
Finding Available Keys When Merging Two Tables
Posted Saturday, May 24, 2008 2:08 PM
Valued Member

Comments posted to this topic are about the item Finding Available Keys When Merging Two Tables


Post #506263
Posted Monday, May 26, 2008 4:08 AM
SSCommitted

Hello,
I have to say that while the procedure described leads to a correct result, it seems to lack the ingenuity so often found in the articles here. In my opinion, this is how one would do the task without taking much time to think about the various possibilities. Nothing wrong with that if you just need to do it once, but I wouldn't publish it in an article.

To be more precise, the article performs several actions where one would do (e.g., insert all, delete those that are in the first table, delete those that are in the second table). Why not insert just the unused keys outright in the first place?
When filling the table with data, various loops and even an import from Excel are mentioned - but there is no hint that there is a way to do it in set-based SQL.
The article does not contain INSERT INTO statements that would let readers recreate the same environment easily. Also, it only mentions how to find available keys, not how to apply the result and get the new IDs assigned correctly.

This is how I would go about removing duplicates in a similar case (for simplicity, I used temporary tables instead of your permanent tables in two databases). You will notice that I have used a table "numbers" which is not defined in the script. This is a one-column table, also known as a "Tally" table, and you can find a lot about it and its uses here on the forums (including how to create and fill it, so I won't repeat that here).
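For readers who want to follow along, here is one well-known way to build such a numbers table (a sketch, not the only method - several variants are described in articles on this site):

```sql
/* Build a numbers (Tally) table holding 1..10000, set-based:
   a cross join of a system table with itself yields plenty of rows,
   and IDENTITY numbers them as they are inserted. */
SELECT TOP 10000 IDENTITY(INT, 1, 1) AS number
INTO numbers
FROM master.dbo.syscolumns sc1
CROSS JOIN master.dbo.syscolumns sc2

/* a clustered primary key keeps lookups against it fast */
ALTER TABLE numbers
ADD CONSTRAINT PK_numbers PRIMARY KEY CLUSTERED (number)
```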

IF object_id('tempdb..#emp_one') IS NOT NULL DROP TABLE #emp_one
IF object_id('tempdb..#emp_two') IS NOT NULL DROP TABLE #emp_two
IF object_id('tempdb..#emp_free_no') IS NOT NULL DROP TABLE #emp_free_no
IF object_id('tempdb..#duplicates') IS NOT NULL DROP TABLE #duplicates
GO
/*create test tables*/
CREATE TABLE #emp_one(emp_no VARCHAR(4))
CREATE TABLE #emp_two(emp_no VARCHAR(4))

/*insert sample data*/
INSERT INTO #emp_one
SELECT '0001'
UNION
SELECT '0003'
UNION
SELECT '0007'
UNION
SELECT '0008'
UNION
SELECT '0009'

INSERT INTO #emp_two
SELECT '0001'
UNION
SELECT '0004'
UNION
SELECT '0005'
UNION
SELECT '0006'
UNION
SELECT '0007'

/*prepare table emp_two for renumbering - add column that will hold the new employee number*/
ALTER TABLE #emp_two ADD new_emp_no VARCHAR(4)
GO

/*table of available (unused) employee numbers*/
CREATE TABLE #emp_free_no(id INT IDENTITY, emp_no VARCHAR(4))
INSERT INTO #emp_free_no (emp_no)
SELECT Q.emp_no
FROM
(SELECT RIGHT('000'+ CAST(number as VARCHAR(4)),4) as emp_no
FROM numbers
WHERE number > 0 and number < 10000) as Q
LEFT JOIN #emp_one e1 ON e1.emp_no=Q.emp_no
LEFT JOIN #emp_two e2 ON e2.emp_no=Q.emp_no
WHERE e1.emp_no IS NULL AND e2.emp_no IS NULL

/*table of employee numbers that need to be replaced - exist in both tables
these numbers will remain in table emp_one and will be changed in table emp_two*/
CREATE TABLE #duplicates (dupid INT IDENTITY, emp_no VARCHAR(4))
INSERT INTO #duplicates (emp_no)
SELECT e2.emp_no
FROM #emp_two e2
JOIN #emp_one e1 ON e1.emp_no=e2.emp_no

/*write the new employee number*/
UPDATE e2
SET new_emp_no=ISNULL(f.emp_no,e2.emp_no)
FROM #emp_two e2
LEFT JOIN #duplicates d ON d.emp_no = e2.emp_no
LEFT JOIN #emp_free_no f ON f.id=d.dupid

/*show result*/
SELECT * FROM #emp_two

You can now move the rows from table emp_two to table emp_one, using new_emp_no as the key when inserting into table emp_one.
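That final step might look something like this (a sketch against the temporary tables above):

```sql
/* Move the renumbered rows from emp_two into emp_one,
   using new_emp_no as the key value in the target table */
INSERT INTO #emp_one (emp_no)
SELECT new_emp_no
FROM #emp_two
```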



Post #506434
Posted Monday, May 26, 2008 5:55 AM
Valued Member

Vladan,

Thanks for taking the time to create a different approach. I will research the "numbers" table solution, as I find I am already using something similar in my latest scripts.

I also want to point out something very clever in your loading script:

SELECT RIGHT('000'+ CAST(number as VARCHAR(4)),4) as emp_no...


One reason I was using Excel to generate my set of test records was that I could create a formula to prefix the leading zeros onto the keys. I will be using the statement above from here on out to do the same thing!
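For anyone else picking up the trick, the expression pads any value out to a fixed width of four characters:

```sql
/* Prepend '000', then keep only the rightmost four characters */
SELECT RIGHT('000' + CAST(7    AS VARCHAR(4)), 4)  -- '0007'
SELECT RIGHT('000' + CAST(123  AS VARCHAR(4)), 4)  -- '0123'
SELECT RIGHT('000' + CAST(1234 AS VARCHAR(4)), 4)  -- '1234'
```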

You noted that there is no mention of how to apply the new keys. You address this in your summary - inserting into a new table and using the new employee id from table two - which is a good solution. In my case, I was taking the newly found unused ids and using them in a conversion program written in Unibasic on a Unidata database. The Unidata tables were the source tables, and DTS was bringing a copy of the tables to SQL Server, so the data had to be corrected in the underlying Unidata tables - but it was easier to identify the unused keys in SQL Server.




Post #506463
Posted Monday, May 26, 2008 10:17 AM
Right there with Babe

I have to agree with Vladan on this one.

The approach pointed out in the article is rather cumbersome, even for a one-time task.

Whenever one needs to compare data between tables, the first thing that should come to mind is some form of join between the tables in question.

metawizard2, nice reply to Vladan; it's always good to see people who can take criticism and learn and grow from it.
Post #506527
Posted Monday, May 26, 2008 10:41 AM
Valued Member

I think that this illustrates one of the fundamental problems that procedural / OO programmers have working with SQL: thinking in sets is not intuitive when you are used to thinking in rows.

The approach in the article comes from a very "row" place (or what Jeff Moden refers to as "RBAR" - row by agonizing row: http://www.sqlservercentral.com/articles/TSQL/62867/ ).

After playing around with Vladan's solution, I can see how it is a set solution, but I am still having problems "thinking in sets".

Does anyone have any advice on making that transition to sets?



Post #506536
Posted Monday, May 26, 2008 11:33 AM
Right there with Babe

I am primarily a developer but love playing in SQL as well, so I can relate to you to some degree.

For me, the more I played in Query Analyzer trying to consolidate my code, the more I began thinking in set-based terms. I love less code; the more consolidated it is, the more I like it. Most of the time, consolidating the code makes it more set-based (though not always).

Subscribing to newsletters like SSC has helped me immerse myself in the technology. I am still very guilty of making mistakes and writing poor SQL code, but I get better with each mistake.

Learn to write SELECTs - the more complicated the better. That taught me most of the syntax and the way of thinking that I needed. Later, when I needed to manipulate the data further, I was able to take my basic SELECTs and tweak them to perform updates, deletes, and anything else I needed.
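To illustrate that progression with the tables from earlier in this thread (a contrived sketch): start with a SELECT that finds the rows, then reshape the same join into an UPDATE.

```sql
/* Step 1: a SELECT that shows which rows would be affected -
   here, employee numbers that exist in both tables */
SELECT e2.emp_no
FROM #emp_two e2
JOIN #emp_one e1 ON e1.emp_no = e2.emp_no

/* Step 2: the same join, reshaped into an UPDATE
   (flagging the duplicates - a contrived example) */
UPDATE e2
SET new_emp_no = 'DUP'
FROM #emp_two e2
JOIN #emp_one e1 ON e1.emp_no = e2.emp_no
```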
Post #506551
Posted Monday, May 26, 2008 4:48 PM


SSC-Dedicated
Good approach, I think... my question would be, how do you find and change all the references to such a table? Now THAT would make for an interesting article. :)
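In case it helps anyone exploring Jeff's question: on SQL Server 2005 and later, the catalog views can at least enumerate the foreign keys that reference a table (a sketch; it won't catch references buried in application code or dynamic SQL):

```sql
/* List all foreign keys that reference a given table */
SELECT OBJECT_NAME(fk.parent_object_id) AS referencing_table,
       fk.name                          AS foreign_key_name
FROM sys.foreign_keys fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.emp_one')
```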

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #506602
Posted Monday, May 26, 2008 6:59 PM
Valued Member

It has been a long journey that is nearing its end. I will tell the tale in an upcoming article! ;)


Post #506628
Posted Monday, May 26, 2008 11:01 PM
SSCertifiable

Kudos to the author. First, because the author found a solution to a problem. Sometimes just getting a solution at all is the hard part; here the author got a solution that may not be the best, and can now do research to find a smarter way of doing it.

Second, the author shared the solution with us.
Third, the author is very sporting to accept criticism.

Thanks......:)



Post #506663
Posted Tuesday, May 27, 2008 1:19 AM
SSCommitted

metawizard2,
now that you have explained your background, I can better understand why you chose the approach you described... and I'm glad you have started to explore the set-based approach. If you work with SQL Server, you will find it opens up lots of possibilities for managing data efficiently. As long as you work with a few thousand rows, the difference is not crucial - but once you have tables with hundreds of thousands or millions of rows, row-by-row processing is an absolute disaster. Converting a procedure to set-based logic can cut the execution time to a small fraction of the original (from several hours down to a few minutes, for example).
Good luck in your learning process. I'm sure you'll find almost everything you need on this site - and if you miss or don't understand something, just post your question and we will be happy to assist you! Looking forward to your articles and posts,
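To make the contrast concrete, here is a deliberately contrived pair doing the same trivial work row-by-row and then set-based (a sketch on a throwaway temp table, not from the article):

```sql
/* Row-by-row (RBAR): a cursor touches one row at a time */
CREATE TABLE #emp (emp_no VARCHAR(4), new_emp_no VARCHAR(4))
INSERT INTO #emp (emp_no) SELECT '0001' UNION SELECT '0002'

DECLARE @emp VARCHAR(4)
DECLARE emp_cur CURSOR FOR SELECT emp_no FROM #emp
OPEN emp_cur
FETCH NEXT FROM emp_cur INTO @emp
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE #emp SET new_emp_no = @emp WHERE emp_no = @emp
    FETCH NEXT FROM emp_cur INTO @emp
END
CLOSE emp_cur
DEALLOCATE emp_cur

/* Set-based: one statement does the same work in a single pass */
UPDATE #emp SET new_emp_no = emp_no
```

On two rows both finish instantly; on millions of rows the cursor version is the "several hours" case.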
Vladan



Post #506716