Is it possible to left join two tables and have the right table supply each row no more than once?

haventchecked Source

Given this table structure:

Table A
ID    AGE    EDUCATION
1     23     3
2     25     6
3     22     5

Table B
ID    AGE    EDUCATION
1     26     4
2     24     6
3     21     3

I want to find all matches between the two tables where the age is within 2 and the education is within 2. However, I do not want to select any row from TableB more than once. Each row in B should be selected 0 or 1 times and each row in A should be selected one or more times (standard left join).

SELECT *
FROM TableA as A LEFT JOIN TableB as B ON 
    abs(A.age - B.age) <= 2 AND 
    abs(A.education - B.education) <= 2

A.ID    A.AGE    A.EDUCATION    B.ID    B.AGE   B.EDUCATION
1       23       3              3       21      3
2       25       6              1       26      4
2       25       6              2       24      6
3       22       5              2       24      6
3       22       5              3       21      3

As you can see, the last two rows in the output have duplicated B.ID of 2 and 3 when compared to the entire result set. I'd like those rows to return as a single null match with A.ID = 3 since they were both matched to previous A values.

Desired output:

(note that for A.ID = 3, there is no match in B because all rows in B have already been joined to rows in A.)

A.ID    A.AGE    A.EDUCATION    B.ID    B.AGE   B.EDUCATION
1       23       3              3       21      3
2       25       6              1       26      4
2       25       6              2       24      6
3       22       5              null    null    null

I can do this with a short program, but I'd like to solve the problem using a SQL query because it is not for me and I will not have the luxury of ever seeing the data or manipulating the environment.
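For reference, the "short program" mentioned above might look roughly like the Python sketch below. It is only an illustration of the intended semantics: the function name, the tuple layout, and the rule that lower A.IDs get first pick of the B rows are assumptions, not something stated in the question.

```python
# Hypothetical sketch: rows are (id, age, education) tuples copied from the question.
TABLE_A = [(1, 23, 3), (2, 25, 6), (3, 22, 5)]
TABLE_B = [(1, 26, 4), (2, 24, 6), (3, 21, 3)]


def left_join_use_b_once(table_a, table_b, tolerance=2):
    """Left join A to B, letting each B row appear in the output at most once."""
    used_b_ids = set()
    result = []
    for a in sorted(table_a):  # assumption: earlier A.IDs get first pick
        # Candidate B rows: within tolerance on both columns and not yet used.
        matches = [
            b for b in table_b
            if b[0] not in used_b_ids
            and abs(a[1] - b[1]) <= tolerance
            and abs(a[2] - b[2]) <= tolerance
        ]
        if not matches:
            # Standard left-join behaviour: keep the A row with a null match.
            result.append(a + (None, None, None))
        for b in matches:
            used_b_ids.add(b[0])
            result.append(a + b)
    return result


rows = left_join_use_b_once(TABLE_A, TABLE_B)
for row in rows:
    print(row)
```

On the sample data this reproduces the desired output, including the null match for A.ID = 3.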

Any ideas? Thanks

sql left-join

Answers

answered 5 years ago Pablo Díaz Ogni #1

Use SELECT DISTINCT

SELECT DISTINCT A.id, A.age, A.education, B.age, B.education 
FROM TableA as A LEFT JOIN TableB as B ON 
    abs(A.age - B.age) <= 2 AND 
    abs(A.education - B.education) <= 2

answered 5 years ago Phaeze #2

To my knowledge something like this is not possible with a simple select statement and joins because you need to know what has already been selected in order to eliminate duplicates.

You can however try something a little more like this:

DECLARE @JoinResults TABLE
(A_ID INT, A_Age INT, A_Education INT, B_ID INT, B_Age INT, B_Education INT)

INSERT INTO @JoinResults (A_ID, A_Age, A_Education)
SELECT ID, AGE, EDUCATION
FROM TableA

DECLARE @i INT
SET @i = 1
--Assume that A_ID is incremental and no values missed
WHILE (@i <= (SELECT MAX(A_ID) FROM @JoinResults))
BEGIN
    --Update only the row for the current A_ID
    UPDATE @JoinResults
    SET B_ID = SQ.ID,
        B_Age = SQ.AGE,
        B_Education = SQ.EDUCATION
    FROM (
        SELECT ID, AGE, EDUCATION
        FROM TableB b
        WHERE (
            abs((SELECT A_Age FROM @JoinResults WHERE A_ID = @i) - AGE) <= 2
            AND abs((SELECT A_Education FROM @JoinResults WHERE A_ID = @i) - EDUCATION) <= 2
        ) AND (SELECT B_ID FROM @JoinResults WHERE B_ID = b.ID) IS NULL
    ) AS SQ
    WHERE A_ID = @i

    SET @i = @i + 1
END

SELECT * FROM @JoinResults

NOTE: I do not currently have access to a database, so this is untested, and I am wary of two potential issues with it:

  1. I am not sure how the update will react if there are no results
  2. I am unsure if the IS NULL check is correct to eliminate the duplicates.

If these issues do arise let me know and I can help troubleshoot.

answered 5 years ago Pawel Veselov #3

As @Joel Coehoorn said earlier, there has to be a mechanism that decides which of the (a,b) pairs sharing the same (b) are filtered out of the output. SQL is not great at letting you select ONE row when several match, so a pivot query needs to be created in which you filter out the records you don't want. In this case, the filtering can be done by reducing all matching IDs of B to the smallest (or the largest, it doesn't really matter). Any function that returns one value from a set would do; min() and max() are just the most convenient to use. Once you have reduced the result to the (a,b) pairs you care about, join against that result to pull in the rest of the table data.

select a.id a_id, a.age a_age, a.education a_e,
b.id b_id, b.age b_age, b.education b_e
from a left join
(
SELECT   
  a.id a_id, min(b.id) b_id from a,b where 
  abs(A.age - B.age) <= 2 AND 
  abs(A.education - B.education) <= 2
  group by a.id
) g on a.id = g.a_id
left join b on b.id = g.b_id;
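As a quick way to see what the min() reduction keeps, the sketch below runs the query above against the sample data from the question. The in-memory SQLite database, the table names a and b, and the added ORDER BY are assumptions made only so the example is runnable; the original answer does not name a database engine.

```python
import sqlite3

# Assumption: SQLite stands in for the unnamed database engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, age INTEGER, education INTEGER);
    CREATE TABLE b (id INTEGER, age INTEGER, education INTEGER);
    INSERT INTO a VALUES (1, 23, 3), (2, 25, 6), (3, 22, 5);
    INSERT INTO b VALUES (1, 26, 4), (2, 24, 6), (3, 21, 3);
""")

# The query from the answer, with an ORDER BY added for a stable row order.
MIN_QUERY = """
    select a.id a_id, a.age a_age, a.education a_e,
           b.id b_id, b.age b_age, b.education b_e
    from a left join
    (
        select a.id a_id, min(b.id) b_id from a, b where
            abs(a.age - b.age) <= 2 and
            abs(a.education - b.education) <= 2
        group by a.id
    ) g on a.id = g.a_id
    left join b on b.id = g.b_id
    order by a.id
"""

min_rows = conn.execute(MIN_QUERY).fetchall()
for row in min_rows:
    print(row)
```

On this data the grouping is per A row, so the result holds one B row for each A.ID (B.ID 3, 1 and 2 respectively), which is not the same shape as the desired output in the question.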

answered 5 years ago ypercubeᵀᴹ #4

In SQL-Server, you can use the CROSS APPLY syntax:

SELECT
    a.id, a.age, a.education, 
    b.id AS b_id, b.age AS b_age, b.education AS b_education
FROM tableB AS b
  CROSS APPLY
    ( SELECT TOP (1) a.*
      FROM tableA AS a
      WHERE ABS(a.age - b.age) <= 2
        AND ABS(a.education - b.education) <= 2
      ORDER BY a.id                                    -- your choice here
    ) AS a ;

Depending on the order you choose in the subquery, different rows from tableA will be selected.

Edit (after your update): However, the above query will not show rows from A that have no matching rows in B, nor rows from A that do have matches but were not selected for any of them.


It could also be done with window functions but Access does not have them. Here is a query that I think will work in Access:

SELECT
    a.id, a.age, a.education,
    s.id AS b_id, s.age AS b_age, s.education AS b_education
FROM tableA AS a
  LEFT JOIN
    ( SELECT
          b.id, b.age, b.education, MIN(a.id) AS a_id
      FROM tableB AS b
        JOIN tableA AS a
          ON  ABS(a.age - b.age) <= 2
          AND ABS(a.education - b.education) <= 2
      GROUP BY b.id, b.age, b.education
    ) AS s
    ON a.id = s.a_id ;

I'm not sure if Access allows such a subquery but if it doesn't, you can define it as a "Query" and then use it in another.
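For engines that do have window functions, the same idea could be sketched with ROW_NUMBER(): number the candidate A rows for each B row and keep only the first. The example below is an untested-on-Access illustration that assumes SQLite 3.25 or later (run through Python's sqlite3 module), the sample data from the question, and the rule that the lowest matching A.ID wins; the aliases a2, rn and s are made up for the example.

```python
import sqlite3

# Assumption: SQLite >= 3.25, which supports window functions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, age INTEGER, education INTEGER);
    CREATE TABLE tableB (id INTEGER, age INTEGER, education INTEGER);
    INSERT INTO tableA VALUES (1, 23, 3), (2, 25, 6), (3, 22, 5);
    INSERT INTO tableB VALUES (1, 26, 4), (2, 24, 6), (3, 21, 3);
""")

WINDOW_QUERY = """
    SELECT
        a.id, a.age, a.education,
        s.b_id, s.b_age, s.b_education
    FROM tableA AS a
      LEFT JOIN
        ( SELECT
              a2.id AS a_id,
              b.id AS b_id, b.age AS b_age, b.education AS b_education,
              -- rn = 1 marks the lowest matching A.ID for each B row
              ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY a2.id) AS rn
          FROM tableB AS b
            JOIN tableA AS a2
              ON  ABS(a2.age - b.age) <= 2
              AND ABS(a2.education - b.education) <= 2
        ) AS s
        ON a.id = s.a_id AND s.rn = 1
    ORDER BY a.id, s.b_id
"""

window_rows = conn.execute(WINDOW_QUERY).fetchall()
for row in window_rows:
    print(row)
```

Changing the ORDER BY inside the OVER clause changes which A row claims each B row, much like the ORDER BY in the CROSS APPLY version.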
