Given this table structure:
Table A

    ID  AGE  EDUCATION
    1   23   3
    2   25   6
    3   22   5

Table B

    ID  AGE  EDUCATION
    1   26   4
    2   24   6
    3   21   3
I want to find all matches between the two tables where the age is within 2 and the education is within 2. However, I do not want to select any row from TableB more than once. Each row in B should be selected 0 or 1 times and each row in A should be selected one or more times (standard left join).
    SELECT *
    FROM TableA as A
    LEFT JOIN TableB as B
      ON abs(A.age - B.age) <= 2
     AND abs(A.education - B.education) <= 2

    A.ID  A.AGE  A.EDUCATION  B.ID  B.AGE  B.EDUCATION
    1     23     3            3     21     3
    2     25     6            1     26     4
    2     25     6            2     24     6
    3     22     5            2     24     6
    3     22     5            3     21     3
As you can see, the last two rows in the output have duplicated B.ID of 2 and 3 when compared to the entire result set. I'd like those rows to return as a single null match with A.ID = 3 since they were both matched to previous A values.
(note that for A.ID = 3, there is no match in B because all rows in B have already been joined to rows in A.)
    A.ID  A.AGE  A.EDUCATION  B.ID  B.AGE  B.EDUCATION
    1     23     3            3     21     3
    2     25     6            1     26     4
    2     25     6            2     24     6
    3     22     5            null  null   null
I can do this with a short program, but I'd like to solve the problem using a SQL query because it is not for me and I will not have the luxury of ever seeing the data or manipulating the environment.
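For reference, the procedural version could look like the minimal Python sketch below. It assumes the rows of A are processed in ID order and that each A row claims every still-unused B row within the tolerance; the names `greedy_match` and `max_diff` are made up for illustration.

```python
# Sample rows from the question: (id, age, education)
table_a = [(1, 23, 3), (2, 25, 6), (3, 22, 5)]
table_b = [(1, 26, 4), (2, 24, 6), (3, 21, 3)]


def greedy_match(rows_a, rows_b, max_diff=2):
    """Left-join A to B so that each B row is claimed by at most one A row."""
    used_b = set()  # ids of B rows already joined to an earlier A row
    result = []
    for a_id, a_age, a_edu in rows_a:
        # B rows within the tolerance that no earlier A row has taken
        matches = [
            b for b in rows_b
            if b[0] not in used_b
            and abs(a_age - b[1]) <= max_diff
            and abs(a_edu - b[2]) <= max_diff
        ]
        if matches:
            for b in matches:
                used_b.add(b[0])
                result.append((a_id, a_age, a_edu) + b)
        else:
            # Keep the A row with a null match, as a left join would
            result.append((a_id, a_age, a_edu, None, None, None))
    return result


rows = greedy_match(table_a, table_b)
for row in rows:
    print(row)
```

On the sample data this yields the four rows shown above, with A.ID = 3 left unmatched.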
Any ideas? Thanks
    SELECT DISTINCT A.id, A.age, A.education, B.age, B.education
    FROM TableA as A
    LEFT JOIN TableB as B
      ON abs(A.age - B.age) <= 2
     AND abs(A.education - B.education) <= 2
To my knowledge something like this is not possible with a simple select statement and joins because you need to know what has already been selected in order to eliminate duplicates.
You can however try something a little more like this:
    DECLARE @JoinResults TABLE (
        A_ID INT, A_Age INT, A_Education INT,
        B_ID INT, B_Age INT, B_Education INT
    );

    INSERT INTO @JoinResults (A_ID, A_Age, A_Education)
    SELECT ID, AGE, EDUCATION FROM TableA;

    DECLARE @i INT;
    SET @i = 1;

    -- Assume that A_ID is incremental and no values are missed
    WHILE (@i <= (SELECT MAX(A_ID) FROM @JoinResults))
    BEGIN
        -- Give the current A row the first B row that matches and is still unused
        UPDATE @JoinResults
        SET B_ID = SQ.ID, B_Age = SQ.AGE, B_Education = SQ.EDUCATION
        FROM (
            SELECT TOP (1) ID, AGE, EDUCATION
            FROM TableB b
            WHERE abs((SELECT A_Age FROM @JoinResults WHERE A_ID = @i) - AGE) <= 2
              AND abs((SELECT A_Education FROM @JoinResults WHERE A_ID = @i) - EDUCATION) <= 2
              -- Skip B rows already joined to an earlier A row
              AND NOT EXISTS (SELECT 1 FROM @JoinResults jr WHERE jr.B_ID = b.ID)
            ORDER BY ID
        ) AS SQ
        WHERE A_ID = @i;  -- only touch the row for the current A_ID

        SET @i = @i + 1;
    END

    SELECT * FROM @JoinResults;
NOTE: I do not currently have access to a database, so this is untested, and I am wary of 2 potential issues with it
If these issues do arise let me know and I can help troubleshoot.
As @Joel Coehoorn said earlier, there has to be a mechanism that decides which (a,b) pairs with the same (b) are filtered out of the output. SQL does not make it easy to select ONE row when multiple rows match, so you need to create a pivot query in which you filter out the records you don't want. In this case, the filtering can be done by reducing all the matching IDs of B to the smallest (or the largest, it doesn't really matter). Any function that returns one value from a set will do; min() and max() are simply the most convenient. Once you have reduced the result to the (a,b) pairs you care about, you join against that result to pull out the rest of the table data.
    select a.id a_id, a.age a_age, a.education a_e,
           b.id b_id, b.age b_age, b.education b_e
    from a
    left join (
        SELECT a.id a_id, min(b.id) b_id
        from a, b
        where abs(A.age - B.age) <= 2
          AND abs(A.education - B.education) <= 2
        group by a.id
    ) g on a.id = g.a_id
    left join b on b.id = g.b_id;
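To see what this query returns on the question's sample data, here is a small sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. SQLite is only an assumption made for testing (the question does not name a database), and the `ORDER BY` is added so the row order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, age INTEGER, education INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, age INTEGER, education INTEGER);
    INSERT INTO a VALUES (1, 23, 3), (2, 25, 6), (3, 22, 5);
    INSERT INTO b VALUES (1, 26, 4), (2, 24, 6), (3, 21, 3);
""")

query = """
    SELECT a.id, a.age, a.education, b.id, b.age, b.education
    FROM a
    LEFT JOIN (
        -- one row per A id: the smallest matching B id
        SELECT a.id AS a_id, MIN(b.id) AS b_id
        FROM a, b
        WHERE ABS(a.age - b.age) <= 2
          AND ABS(a.education - b.education) <= 2
        GROUP BY a.id
    ) AS g ON a.id = g.a_id
    LEFT JOIN b ON b.id = g.b_id
    ORDER BY a.id
"""
min_rows = conn.execute(query).fetchall()
for row in min_rows:
    print(row)
```

On the sample data each row of A comes back paired with exactly one row of B (A 1 with B 3, A 2 with B 1, A 3 with B 2).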
In SQL Server, you can use the CROSS APPLY syntax:
    SELECT a.id, a.age, a.education,
           b.id AS b_id, b.age AS b_age, b.education AS b_education
    FROM tableB AS b
    CROSS APPLY
      ( SELECT TOP (1) a.*
        FROM tableA AS a
        WHERE ABS(a.age - b.age) <= 2
          AND ABS(a.education - b.education) <= 2
        ORDER BY a.id      -- your choice here
      ) AS a ;
Depending on the order you choose in the subquery, different rows from tableA will be selected.
Edit (after your update): The above query, however, will not show rows from A that have no matching rows in B, or even some rows that do have matches but were not selected.
It could also be done with window functions but Access does not have them. Here is a query that I think will work in Access:
    SELECT a.id, a.age, a.education,
           s.id AS b_id, s.age AS b_age, s.education AS b_education
    FROM tableA AS a
    LEFT JOIN
      ( SELECT b.id, b.age, b.education, MIN(a.id) AS a_id
        FROM tableB AS b
        INNER JOIN tableA AS a
          ON  ABS(a.age - b.age) <= 2
          AND ABS(a.education - b.education) <= 2
        GROUP BY b.id, b.age, b.education
      ) AS s
      ON a.id = s.a_id ;
I'm not sure if Access allows such a subquery but if it doesn't, you can define it as a "Query" and then use it in another.
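For databases that do have window functions, the variant mentioned above might look like the sketch below. It is run here through Python's `sqlite3` module purely so it can be tested, and it assumes SQLite 3.25 or newer (the first version with window functions). `ROW_NUMBER()` plays the role of `MIN(a.id)`: each B row is kept only for the lowest matching A id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, age INTEGER, education INTEGER);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, age INTEGER, education INTEGER);
    INSERT INTO tableA VALUES (1, 23, 3), (2, 25, 6), (3, 22, 5);
    INSERT INTO tableB VALUES (1, 26, 4), (2, 24, 6), (3, 21, 3);
""")

query = """
    SELECT a.id, a.age, a.education,
           s.id AS b_id, s.age AS b_age, s.education AS b_education
    FROM tableA AS a
    LEFT JOIN (
        -- rank the candidate A rows for every B row; rn = 1 is the lowest A id
        SELECT b.id, b.age, b.education, a.id AS a_id,
               ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY a.id) AS rn
        FROM tableB AS b
        JOIN tableA AS a
          ON ABS(a.age - b.age) <= 2
         AND ABS(a.education - b.education) <= 2
    ) AS s
      ON s.a_id = a.id AND s.rn = 1
    ORDER BY a.id, s.id
"""
window_rows = conn.execute(query).fetchall()
for row in window_rows:
    print(row)
```

On the sample data this returns the four rows the question asks for, with nulls for A.ID = 3, because every B row is attached to a single A row and the outer left join keeps all rows of A.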