updating several tables and deleting the original rows

  • So I know there must be a better way to do this, because I am most definitely tired of doing it manually. 🙂

    I have been given the task of cleaning up a table full of duplicates where all column data is the same but the primary key is different. However, before I can delete the dups I must update several tables with one of the primary keys (doesn't matter which one, the TOP one is fine) and then delete the rest in the original table. The setup is like this:

    @MainTable

    ID - PK

    Col1

    Col2

    Col3

    @Table1

    TID - PK

    ID - FK

    @Table2

    T2ID - PK

    ID - FK

    @Table3

    T3ID - PK

    ID - FK

    So essentially I need to update any duplicate ID in @Table1, @Table2, and @Table3 with just one ID from @MainTable. Then delete the duplicate ID from @MainTable. Sample Data:

    declare @MainTable table(ID int, col1 int, col2 int, col3 int);

    declare @Table1 table(TID int, ID int);

    declare @Table2 table(T2ID int, ID int);

    declare @Table3 table(T3ID int, ID int);

    insert into @MainTable(ID,col1,col2,col3)

    select 1,22,33,44

    union all select 2,22,33,44

    union all select 4,55,66,77

    union all select 5,55,66,77

    union all select 6,55,66,77

    union all select 7,88,99,100

    union all select 8,88,99,100

    union all select 9,88,99,100

    union all select 10,88,99,100;

    insert into @Table1(TID,ID)

    select 1,2

    union all select 2,5

    union all select 3,6

    union all select 4,8

    union all select 5,9;

    insert into @Table2(T2ID,ID)

    select 1,2

    union all select 2,4

    union all select 3,5

    union all select 4,8

    union all select 5,9;

    insert into @Table3(T3ID,ID)

    select 1,2

    union all select 2,5

    union all select 3,6

    union all select 4,8

    union all select 5,10;

    --to review data

    select * from @MainTable;

    select * from @Table1;

    select * from @Table2;

    select * from @Table3;

    --update tables

    update @Table1

    set ID = 1

    where ID in (1,2);

    update @Table2

    set ID = 1

    where ID in (1,2);

    update @Table3

    set ID = 1

    where in (1,2);

    Delete from @MainTable

    where ID = 2;

    --next group of IDs

    update @Table1

    set ID = 4

    where ID in (4,5,6);

    update @Table2

    set ID = 4

    where ID in (4,5,6);

    update @Table3

    set ID = 4

    where ID in (4,5,6);

    delete from @MainTable

    where ID in (5,6);

    ...and so on and so forth. The result would be all tables updated with the IDs below and the @MainTable looking like this:

    ID col1 col2 col3

    1 22 33 44

    4 55 66 77

    7 88 99 100

    I am hoping to not have to use a cursor :crying:

    Thanks for any help in advance!

    -stephen

  • Before I post my solution: You did an EXCELLENT JOB describing the scenario and providing ready to use sample data! There's nothing better to ask for! GREAT!!

    And here's what I came up with:

    ;WITH cte AS

    (

    SELECT col1,col2,col3,MIN(id) min_id -- find the first id per group of identical columns

    FROM @MainTable

    GROUP BY col1,col2,col3

    )

    SELECT m.*,cte.min_id

    INTO tmp -- build a staging table to hold the current and the min id where current <> min

    FROM @MainTable m

    INNER JOIN cte ON m.col1=cte.col1 AND m.col2=cte.col2 AND m.col3=cte.col3

    AND m.id<>cte.min_id

    SELECT *

    FROM tmp

    -- update all values in all three tables

    UPDATE @Table1

    SET ID = tmp.min_id

    FROM @Table1 t1 INNER JOIN tmp ON t1.id=tmp.id

    UPDATE @Table2

    SET ID = tmp.min_id

    FROM @Table2 t2 INNER JOIN tmp ON t2.id=tmp.id

    UPDATE @Table3

    SET ID = tmp.min_id

    FROM @Table3 t3 INNER JOIN tmp ON t3.id=tmp.id

    -- delete from the original table

    DELETE m FROM

    @MainTable m

    INNER JOIN tmp ON m.id=tmp.id

    SELECT * FROM @MainTable;

    SELECT * FROM @Table1;

    SELECT * FROM @Table2;

    SELECT * FROM @Table3;

    -- cleanup

    DROP TABLE tmp

    Depending on the number of rows you need to deal with it might be a good idea to add an index to the temp table on the id column.

    If you have any question regarding the code, please ask.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • This was exactly what I needed! Your sytax and comments were very clear, and this will definitely help me in the future.

    I have never used a CTE before, but after doing a little research I definitely see the benefits. At best, I could be considered a junior database developer, so my knowledge of SQL Server is in its infancy. I have been using SQL Server for about 10 months now, and am hoping to get into SSAS and the rest of the Business Intelligence package.

    If you don't mind me asking, do you have any suggestions on furthering my SQL server knowledge? It has been nice learning on the job, but I really want to take it to the next level (SSAS 😀 of course). I have bought books, but I have a hard time grasping the information (go figure, technical books that are boring!).

    Thanks again for the help,

    -Stephen

  • The first thing would be to get a SQL Server Express edition (version similar to what you use at work) and install it at home or wherever you're planning to study together with the AdventureWorks database.

    If the books you're reading include sample code, try to run it in your test environment and understand how it works. If you need sample codes to study on, there's one great resource: SQLServerCentral.com 😀

    Follow some of the threads you're interested in and try to solve the "puzzles" by yourself (before looking at the solutions posted). Compare your solution and the ones posted and try to find why those behave like they do. If you have questions, just ask (include either the sample code you're dealing with or a link to a related thread/post, so we know what you're working on).

    Regarding SSAS: haven't used it (yet). Maybe others will post some resources.

    In General, you could go to the main page of SQLServerCentral and have a look at the Stairways series. There might be some helpful articles.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks for the advice!

    I am going to try some of the free webinars from Pragmatic Works. I learn more effectively through demos and hands-on.

    Also, I will start to participate on the site more often. It has always been useful and seems to be very helpful and friendly.

    -stephen

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply