Select unique records between several tables

  • I am looking  for an example on how to query several tables, around 50 tables, to find all unique records between them. I cant find anything other than a join on 3 tables.

  • This almost worked <g>.

    DECLARE @tableName NVARCHAR(50);
    DECLARE @TemplateSQL NVARCHAR(500)=
    'SELECT *
    INTO #tempData
    FROM dbo.MikeyData n
    EXCEPT
    SELECT *
    FROM #tempData d;';

    DECLARE @InsertSQL NVARCHAR(500);

    DECLARE tables_cursor CURSOR FAST_FORWARD
    FOR
    SELECT ao.name
    FROM sys.all_objects ao
    WHERE ao.name LIKE '%Mikey%'
    AND ao.type_desc = 'USER_TABLE';

    -- make sure the destination table is empty!
    --DROP TABLE IF EXISTS #tempData;

    OPEN tables_cursor;

    FETCH NEXT FROM tables_cursor INTO @tableName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
    -- modify the SQL statement to point to the table
    SET @InsertSQL = REPLACE(@TemplateSQL,'MikeyData',@tableName);

    -- make sure the SQL statement makes sense
    PRINT @InsertSQL;
    PRINT '';

    -- actually execute the SQL statement.
    EXECUTE sp_executesql @InsertSQL;

    FETCH NEXT FROM tables_cursor INTO @tableName;
    END

    CLOSE tables_cursor;
    DEALLOCATE tables_cursor;

    SELECT *
    FROM #tempData;

    When I ran it, this is the error output:

    When I ran it, this is the error output:
    SELECT *
    INTO #tempData
    FROM dbo.MikeyData n
    EXCEPT
    SELECT *
    FROM #tempData d;

    Msg 2770, Level 16, State 1, Line 6
    The SELECT INTO statement cannot have same source and destination tables.

    Not sure how to solve that part, because unless I know the structure of the source tables, how do I return the missing records, since it doesn't seem to like EXCEPT? (Or am I just a dolt, and I don't understand what the error really means?)

  • Unless these are relatively small tables, this is going to be a problem. There isn't a way to do this across tables easily.

    You can insert data from a table into a temp location, then use merge to get data from other tables, but it's a serial, row by row, or table by table, operation.

    Can I ask what kind of data you have in these tables that is spread out and needs uniqueness?

  • a_car11 - Just so I understand the question correctly, you want to get all of the records that ONLY exist in 1 of the 50 tables, correct?  I just want to make sure that you are meaning you want the unique record set and not a distinct record set.

    For simplicity sake, I'm going to assume the 50 different tables all have 1 column called "name".  And to explain what i mean better, I am going to reduce the scope down to 3 tables (as we can extrapolate).

    Table A contains:

    INSERT INTO A (name)
    VALUES ('a'),
    ('b')

    INSERT INTO B (name)
    VALUES ('a'),
    ('c')

    INSERT INTO C (name)
    VALUES ('d'),
    ('e')

    If we are going with a UNIQUE set, I would expect to get the results 'b', 'c', 'd', and 'e'.  If we are going with a DISTINCT set, I would expect to get 'a', 'b', 'c', 'd', 'e'.  I can think of solutions to both, with the DISTINCT set being a much easier solution - UNION all tables (not UNION ALL, but UNION) and you will have a distinct list.  Now for unique, things get more complicated, but I think my approach would be something like doing a UNION ALL on all of the tables first so you end up with the full data set, then do a count(1) OVER (PARTITION BY <all columns>) and finally, SELECT where that COUNT is 1.  If the tables are large OR have a lot of columns, this could be slow and/or messy... or I could be completely misunderstanding the problem?  using my A,B,C example above:

    --DISTINCT
    SELECT name
    FROM A
    UNION
    SELECT name
    FROM B
    UNION
    SELECT name
    FROM C

    --UNIQUE
    WITH cte AS (
    SELECT name, COUNT(1) OVER (PARTITION BY name) AS DuplicateCount
    FROM (SELECT name
    FROM A
    UNION
    SELECT name
    FROM B
    UNION
    SELECT name
    FROM C) data
    )
    SELECT *
    FROM cte
    WHERE DuplciateCount = 1

    Gets messy as the more tables you add in, the longer that query gets.  And the larger those tables are, the slower the query will be.  It also requires all 50 of the tables to have the same structure (or at least a way that you can do a UNION on them)...

    The code above is 100% untested, so I may have syntax errors in there, but I think it gives some direction...

    Mind you, I may be misunderstanding the problem...

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • a_car11 wrote:

    I am looking  for an example on how to query several tables, around 50 tables, to find all unique records between them. I cant find anything other than a join on 3 tables.

    a) What do you expect as the result of this query?

    b) What will you do with the result?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi there!

    I know my question was some how abstract but trying to simplifying it:

    a) What do you expect as the result of this query?

    I have one table, lets called it master, that has several records with IDs, account numbers and other data in it. Now I have about 40 other tables that has IDs, account numbers and other data as well. They are related from the IDS from the master table to the rest of the tables.

    b) What will you do with the result?

    What I am trying to do is to find some records the are in the master as well  in all the other 40 tables. If  ID, lets say 0011, account 3333 is is the master table, it should also be in all the others.

    WITH d_master AS (

    SELECT DISTINCT id, account_number

    ROW_NUMBER() OVER (ORDER BY id desc) id_numb FROM master

    ), t02 AS (

    SELECT DISTINCT id as id2, account_number as acc2,

    ROW_NUMBER() OVER (ORDER BY id desc) id_numb FROM t_02

    )

    , t03 AS (

    SELECT DISTINCT id as id3, account_number as acc3,

    ROW_NUMBER() OVER (ORDER BY id desc) id_numb FROM t_03

    )

    , t04 AS (

    SELECT DISTINCT id as id4, account_number as acc4

    ROW_NUMBER() OVER (ORDER BY id desc) id_numb FROM t_04

    )

    , t05 AS (

    SELECT DISTINCT id as id5, account_number as acc5,

    ROW_NUMBER() OVER (ORDER BY id desc) id_numb FROM t_05

    )

    , t06 AS (

    SELECT DISTINCT id as id6, account_number as acc6,

    ROW_NUMBER() OVER (ORDER BY id desc) id_numb FROM t_06

    )

    , t07 AS (

    SELECT DISTINCT id as id7, account_number as acc7,

    ROW_NUMBER() OVER (ORDER BY id desc) id_numb FROM t_07

    )

    -- ... add more tables here up to around 35 more

    SELECT id, account_number

    FROM d_master

    inner JOIN t02

    ON d_master.id_numb = t02.id_numb

    inner JOIN t03

    ON t02.id_numb = t03.id_numb

    inner JOIN t04

    ON t03.id_numb = t04.id_numb

    inner JOIN t05

    ON t04.id_numb = t05.id_numb

    inner JOIN t06

    ON t05.id_numb = t06.id_numb

    inner JOIN t07

    ON t06.id_numb = 707.id_numb

    -- ... more joins here for all the rest of the table, about 35 more.

    This is the approach I am taking but got stuck on this:

    Where this query above should return no data if any of these JOIN(s) would fail by not matching any results, like I tested with one "JOIN" only,  on these two tables:

    SELECT DISTINCT a.id, b.id

    FROM t03 a

    inner join t04 b

    on a.id = b.id

    And for the time being , they have nothing in common, but the whole query still returns data back, I wonder why, when an inner join is being applied.

    That's  why I am trying to see if there is a better way to accomplish  this task!

    I hope I didn't  confuse this even more!

    Thanks for looking!

  • You could do something like this,

     

    CREATE TABLE #ALL_TABLE_KEYS(TABLE_NAME varchar(500),  varchar(30))

    SELECT 'INSERT INTO #ALL_TABLE_KEYS SELECT ''' + TABLE_SCHEMA + '.' + TABLE_NAME + ''', <YOUR KEY> FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME + ';' FROM OFS.INFORMATION_SCHEMA.COLUMNS
    WHERE COLUMN_NAME = '<YOUR KEY>'
    GROUP BY 'INSERT INTO #ALL_TABLE_KEYS SELECT ''' + TABLE_SCHEMA + '.' + TABLE_NAME + ''', <YOUR KEY> FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME

    Now run those inserts, then you can join your master table back to the temp table and and check number of tables that the key is in.

  • Another approach would be to go back to your query where you have all of those CTE's and UNION all of the data from the non-master table in your second CTE and INNER JOIN the 2 CTE's?  Mind you, I don't think you need the first CTE, just use the master table directly.

    I'm thinking something like:

    WTIH t_data AS 
    (SELECT DISTINCT id as id, account_number as acc
    FROM t_02
    UNION
    SELECT DISTINCT id as id2, account_number as acc2,
    FROM t_03
    UNION
    SELECT DISTINCT id as id2, account_number as acc2,
    FROM t_04
    -- and so on until the last table
    )
    SELECT ID, Account_Number
    FROM master
    JOIN t_data ON t_data.id = master.id

    Also, looking at your query, the reason that you got results when you expected none is because of that ROW_NUMBER() thing you did.  If ROW_NUMBER() returns at LEAST 1 value, it is going to be 1, so as long as each table returns at least 1 value, you are going to have things matching up.  The table could have completely different "ID" values values (for example, master could have a min ID of 5 while t_02 has a min ID of 50, and ROW_NUMBER() for both of these would be 1) and you are joining it on the ROW_NUMBER(), not on the ID.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Hi the idea is good, but running into some issues with your query;

    WTIH t_data AS

    (SELECT DISTINCT id as id, account_number as acc

    FROM t_02

    UNION

    SELECT DISTINCT id as id2, account_number as acc2

    FROM t_03

    UNION

    SELECT DISTINCT id as id3, account_number as acc3

    FROM t_04

    -- and so on until the last table

    )

    SELECT ID, Account_Number

    FROM master

    JOIN t_data ON t_data.id = master.id

    I get "Ambiguous column name 'id'."

    Should this line be like:

    SELECT DISTINCT id as id1, account_number as acc1

    instead of:

    SELECT DISTINCT id as id, account_number as acc

    So:

    WITH t_data AS

    (SELECT DISTINCT id as id1, account_number as acc1

    FROM t_02

    UNION

    SELECT DISTINCT id as id2, account_number as acc2

    FROM t_03

    UNION

    SELECT DISTINCT id as id3, account_number as acc3

    FROM t_04

    -- and so on until the last table

    )

    SELECT id, account_number

    FROM master

    JOIN t_data ON t_data.id1 = master.id

    Wondering on the results, cause I am only getting the join results from the first SELECT in the main query off  t_data?

    SELECT DISTINCT id as id1, account_number as acc1

    Thanks

     

  • Is there a pattern to the actual name of the tables and are all of the tables in the same database?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm not sure if I understand what you want. Do you want to find all rows in the master table that have rows in all the other 50 related table?

    If so this will do it:

    SELECT *
    FROM dbo.Master m
    WHERE EXISTS(SELECT * FROM dbo.Table1 t WHERE t.MasterId = m.Id)
    AND EXISTS(SELECT * FROM dbo.Table2 t WHERE t.MasterId = m.Id)
    AND EXISTS(SELECT * FROM dbo.Table3 t WHERE t.MasterId = m.Id)
    AND EXISTS(SELECT * FROM dbo.Table4 t WHERE t.MasterId = m.Id)
    ...
    AND EXISTS(SELECT * FROM dbo.Table49 t WHERE t.MasterId = m.Id)
    AND EXISTS(SELECT * FROM dbo.Table50 t WHERE t.MasterId = m.Id)

     

  • This was removed by the editor as SPAM

  • I'm assuming that your tables are NOT named T01. T02, T03, etc and so I'm asking the following question again...

    Jeff Moden wrote:

    Is there a pattern to the actual name of the tables and are all of the tables in the same database?

    If they are, this is not such a difficult task but I need to know what the pattern of table names is first.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, they are not, I  have one table called master, the others are named like acc_ and some numbers and letters after that. All I am trying to do is to find one account number in the master table that is also in all the others, about 50 tables. Appreciate your response.

  • a_car11 wrote:

    Hi, they are not, I  have one table called master, the others are named like acc_ and some numbers and letters after that. All I am trying to do is to find one account number in the master table that is also in all the others, about 50 tables. Appreciate your response.

    Other than the table named "master", are all tables that "named like acc_ and some numbers and letters after that" a part of what you described in this post or are there other tables that look like acc_% that should be excluded?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply