December 15, 2009 at 9:03 am
Hi Guys, long time reader, first time poster
I have a question regarding duplicate entries.
I have a basic table with 7 columns that I am having to more into a new table with a unique index on 4 of the 7 columns.
I have around 50,000 rows that due to the new index are considered 'duplicate'.
My question is what would be the easiest way to either move the unique rows to a new table?
Thanks
Dean
December 15, 2009 at 9:26 am
This should get you started, Dean:
DROP TABLE #MySampleTable
CREATE TABLE #MySampleTable (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT, col6 INT, col7 INT)
INSERT INTO #MySampleTable (col1, col2, col3, col4, col5, col6, col7)
SELECT 1, 2, 3, 4, 1, 1, 1 UNION ALL
SELECT 1, 2, 3, 4, 2, 2, 2 UNION ALL
SELECT 1, 2, 3, 4, 3, 3, 3 UNION ALL
SELECT 1, 2, 3, 5, 1, 1, 1 UNION ALL
SELECT 1, 2, 3, 5, 2, 2, 2 UNION ALL
SELECT 1, 2, 3, 6, 1, 1, 1 UNION ALL
SELECT 1, 2, 3, 6, 2, 2, 2 UNION ALL
SELECT 1, 2, 3, 7, 1, 1, 1
SELECT *
INTO #MyNewTable
FROM (
SELECT *, MyRow = ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY col1, col2, col3, col4, col5, col6, col7)
FROM #MySampleTable
) d
WHERE MyRow = 1
SELECT * FROM #MyNewTable
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 15, 2009 at 9:29 am
It would be most helpful for you to post :
1. Table(s) schema.
2. Sample data.
Thanks.
December 15, 2009 at 1:22 pm
Thanks for the replies guys.
I will get the schema and sample data tomorrow and add to this forum.
Cheers.
Dean
December 15, 2009 at 1:39 pm
Something for you to think about.
If two rows have duplicate values for those 4 columns, but different values for the other three, which row would you want to keep and which would you want to discard?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 15, 2009 at 1:54 pm
Thats one of the issues
One of the columns is a time stamp so I would want to keep the latest entry.
Basically this is a WorkflowStatus table that has status code, then four column relating to the data itself and finally a 'Time' and a 'UpdateBy' column which records the users name.
Dean.
December 15, 2009 at 2:18 pm
deanrjohnson (12/15/2009)
Thats one of the issues
Actually doing it is not hard, if there's a simple rule as to which row to keep. I'm sure once the schema's posted there'll be a solution within the hour.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 7 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy