Finding Available Keys When Merging Two Tables

  • Comments posted to this topic are about the item Finding Available Keys When Merging Two Tables

  • Hello,

    I have to say that while the procedure described leads to correct result, it seems to be lacking certain ingenuity so often found in the articles here. In my opinion, this is how one would do the task without taking too much time to think about various possibilities. Nothing wrong with that, if you just need to do it once, but I wouldn't publish it in an article.

    To be more precise, the article does several actions where there could be just one (e.g., insert all, delete those that are in first table, delete those that are in second table). Why not insert just those unused outright in the first place?

    When filling the table with data, various loops and even import from Excel are mentioned - but not a hint that there is way to do it in set-based SQL.

    The article does not contain INSERT INTO statements, that would enable readers to create the same environment easily. Also, it just mentions how to find available keys, not how to apply the result and get new ID's assigned correctly.

    This is how I would go about removing duplicates in a similar case (for simplicity, I used temporary tables instead of your permanent tables in 2 databases). You will observe, that I have used a table "numbers" which is not defined in the script. This is a one-column table, also known as "Tally" table, and you can find a lot about it and its uses here on the forums (also how to create and fill it, therefore I won't repeat that, for the sake of simplicity).

    IF object_id('tempdb..#emp_one') IS NOT NULL DROP TABLE #emp_one

    IF object_id('tempdb..#emp_two') IS NOT NULL DROP TABLE #emp_two

    IF object_id('tempdb..#emp_free_no') IS NOT NULL DROP TABLE #emp_free_no

    IF object_id('tempdb..#duplicates') IS NOT NULL DROP TABLE #duplicates

    GO

    /*create test tables*/

    CREATE TABLE #emp_one(emp_no VARCHAR(4))

    CREATE TABLE #emp_two(emp_no VARCHAR(4))

    /*insert sample data*/

    INSERT INTO #emp_one

    SELECT '0001'

    UNION

    SELECT '0003'

    UNION

    SELECT '0007'

    UNION

    SELECT '0008'

    UNION

    SELECT '0009'

    INSERT INTO #emp_two

    SELECT '0001'

    UNION

    SELECT '0004'

    UNION

    SELECT '0005'

    UNION

    SELECT '0006'

    UNION

    SELECT '0007'

    /*prepare table emp_two for renumbering - add column that will hold the new employee number*/

    ALTER TABLE #emp_two ADD new_emp_no VARCHAR(4)

    GO

    /*table of available (unused) employee numbers*/

    CREATE TABLE #emp_free_no(id INT IDENTITY, emp_no VARCHAR(4))

    INSERT INTO #emp_free_no (emp_no)

    SELECT Q.emp_no

    FROM

    (SELECT RIGHT('000'+ CAST(number as VARCHAR(4)),4) as emp_no

    FROM numbers

    WHERE number > 0 and number < 10000) as Q

    LEFT JOIN #emp_one e1 ON e1.emp_no=Q.emp_no

    LEFT JOIN #emp_two e2 ON e2.emp_no=Q.emp_no

    WHERE e1.emp_no IS NULL AND e2.emp_no IS NULL

    /*table of employee numbers that need to be replaced - exist in both tables

    these numbers will remain in table emp_one and will be changed in table emp_two*/

    CREATE TABLE #duplicates (dupid INT IDENTITY, emp_no VARCHAR(4))

    INSERT INTO #duplicates

    SELECT e2.emp_no

    FROM #emp_two e2

    JOIN #emp_one e1 ON e1.emp_no=e2.emp_no

    /*write the new employee number*/

    UPDATE e2

    SET new_emp_no=ISNULL(f.emp_no,e2.emp_no)

    FROM #emp_two e2

    LEFT JOIN #duplicates d ON d.emp_no = e2.emp_no

    LEFT JOIN #emp_free_no f ON f.id=d.dupid

    /*show result*/

    SELECT * FROM #emp_two

    You can now move the rows from table emp_two to table emp_one, and use new_emp_no as key when inserting into table emp_one.

  • Vladan,

    Thanks for taking the time to create a different approach. I will research the "Number" table solution, as I am finding that I am using something similar in my latest scripts.

    I also want to point out something very clever in your loading script:

    SELECT RIGHT('000'+ CAST(number as VARCHAR(4)),4) as emp_no...

    One reason I was using Excel to generate my set of test records was that I could create a formula to preface the 000's on the keys. I will be using the statement you have above from here on out to do the same thing!

    You noted that there is no mention of how to apply the new keys. You address this in your summary of inserting into a new table and using the new employee id from table two, which is a good solution. In my case, I was taking the new unused ids and using them in a conversion program written in Unibasic on a Unidata database. The Unidata tables were the source tables, and then DTS was bringing a copy of the tables to SQL Server, so the data had to be corrected in the underlying Unidata tables, but it was easier to identify the unused keys in SQL Server.

  • I have to agree with Vladan on this one.

    The approach pointed out in the article is rather cumbersome, even for a one time thing.

    Whenever one needs to compare data between tables, the first thing that should come to mind is some form of join between the tables in question.

    metawizard2 nice reply to Vladan, it's always nice to see people who can take criticism and learn and grow from it.

  • I think that this illustrates one of the fundamental problems that procedural / OO programmers have working with SQL: thinking in sets is not intuitive when you are used to thinking in rows.

    The approach in the article comes from a very "Row" place ( or as Jeff Moden refers to as "RBAR" - row by agonizing row http://www.sqlservercentral.com/articles/TSQL/62867/ ).

    After playing around with Vladan's solution, I can see how it is a set solution, but I am still having problems "thinking in sets".

    Does anyone have any advice on making that transition to sets?

  • I am primarily a developer but love playing in SQL as well, so I can associate to some degree with you.

    For me the more I played in Query Analyzer trying to consolidate my code the more I began thinking in set based terms. I love less code, the more consolidated it is the more I like it. Most of the time consolidating the code makes it more set based ( not all the time though ).

    Subscribing to newletters like SSC have helped to immerse myself in the technology. I am still very guilty of making mistakes and writing poor SQL code, but I get better with each mistake.

    Learn to write selects, the more complicated the better, that taught me most of the syntax and method of thinking that I needed. Then later when I needed to manipulate the data further I was able to take my basic selects and tweak them to perform updates, deletes, and anything else I needed.

  • Good approach, I think... my question would be, how do you find and change all the references to such a table? Now THAT would make for an interesting article. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It has been a long journey who is nearing its end. I will tell the tale in an upcoming article! 😉

  • Kudos to the author. First because the author found a solution to a problem. See to get a solution sometimes is a problem, but here the author got a solution which may not be the best. Now the author can do research to find a more smarter way of doing it.

    Second the author shared with us the solution.

    Third the author is very sporting to accept criticism.

    Thanks......:)

  • metawizard2,

    now that you explained your background, I can understand better why you chose the way you described... and I'm glad you started to explore the set-based approach. If you'll work with SQL Server, you will find it opens lots of possibilities how to manage data efficiently. As long as you work with a few thousand rows, the difference is not crucial - but once you have tables with hundreds of thousands or millions of rows, row-by-row processing is absolute disaster. Converting a procedure to set-based logic can cut the execution time to a small fraction of the original time (like from several hours to a few minutes).

    Good luck in your learning process, I'm sure you'll find almost everything you need on this site - and if you miss or don't understand something, just post your question and we will be happy to assist you! Looking forward to your articles and posts,

    Vladan

  • 😉 Thanks for all the great feedback. You have made this first article a really great experience!

  • Good approach, I think... my question would be, how do you find and change all the references to such a table?

    Trust you to think of that one :rolleyes:

    Now THAT would make for an interesting article

    It will, when you write it Jeff :hehe:

    Far away is close at hand in the images of elsewhere.
    Anon.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply