Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Delete Duplicate values


Delete Duplicate values

Author
Message
abmore
abmore
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1171 Visits: 931
Comments posted to this topic are about the item Delete Duplicate values
IN_Sandeep
IN_Sandeep
Old Hand
Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)Old Hand (383 reputation)

Group: General Forum Members
Points: 383 Visits: 1255
Abhijit,

just give a short explanation of your topic, so it will be more clear to all.

Cheers!

Sandy.

--
Kenney Hill
Kenney Hill
Mr or Mrs. 500
Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)

Group: General Forum Members
Points: 526 Visits: 233
I agree that you need to write something to go with your script.

And I believe that your script is slightly flawed, it looks like it will only delete the first duplicate record of each duplicate set and will leave all the rest intact (your test data only had 2 duplicates in any set, as far as I noticed, so it worked fine for that data set). I would suggest making the following minor change, which would keep the first record of each duplicate set and delete all others (the first one in theory anyway, SQL gives no guarantee on order of duplicate records, and how would you know anyway? Wink ).

DELETE FROM cteDV WHERE RID <> 1

You could also saved a keystroke and said '> 1', but both will do the same thing in this case.

For a quick demo using asterisk in the select statement of your CTE works but I would suggest spelling out the columns so that the code doesn't break should the source table structure change.

Over all a nice little script for getting rid of annoying duplicate records, thank you for sharing it with us.
Abhijit More
Abhijit More
SSC Eights!
SSC Eights! (951 reputation)SSC Eights! (951 reputation)SSC Eights! (951 reputation)SSC Eights! (951 reputation)SSC Eights! (951 reputation)SSC Eights! (951 reputation)SSC Eights! (951 reputation)SSC Eights! (951 reputation)

Group: General Forum Members
Points: 951 Visits: 767
Kenney,

I appreciate your suggestion. thank you Smile.

Abhijit - http://abhijitmore.wordpress.com
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45273 Visits: 39934
Just to give physical representation of Kenny's suggestions...

--===== Create a temporary demonstration table and populate it with duplicated info
-- This is not part of the solution... it's just creating a demo.
-- This DROP is just to support reruns for test purposes
IF OBJECT_ID('tempdb..#DuplicateVals') IS NOT NULL
DROP TABLE #DuplicateVals

CREATE TABLE #DuplicateVals
(
Col1 VARCHAR(3),
Col2 VARCHAR(3),
Col3 VARCHAR(3)
)

INSERT INTO #DuplicateVals
(Col1,Col2,Col3)
SELECT 'aa1','aa1','aa1' UNION ALL --More than 1 duplicate from here...
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL --... to here
SELECT 'aa2','aa2','aa2' UNION ALL
SELECT 'aa2','aa2','aa2' UNION ALL
SELECT 'aa3','aa3','aa3' UNION ALL --No duplicates. Will it survive?
SELECT 'aa4','aa4','aa4' UNION ALL
SELECT 'aa4','aa4','aa4' UNION ALL
SELECT 'aa5','aa5','aa5' UNION ALL --No duplicates. Will it survive?
SELECT 'aa6','aa6','aa6' UNION ALL --More than 1 duplicate from here...
SELECT 'aa6','aa6','aa6' UNION ALL
SELECT 'aa6','aa6','aa6' --... to here

--===== Sanity check: Display the contents of the test table BEFORE the deletion
SELECT * FROM #DuplicateVals

--===== Delete duplicates without any temporal information present.
-- Note that there isn't actually any reason to return any columns from the table in the CTE...
-- Just the RowNum will suffice.
;WITH cteDV AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY Col1, Col2, Col3 ) AS RowNum
FROM #DuplicateVals
)
DELETE FROM cteDV WHERE RowNum > 1

--===== Sanity check: Display the contents of the test table AFTER the deletion
SELECT * FROM #DuplicateVals



... the details are in the code but I wanted to point out that you don't really need to return any of the table's columns in the CTE for this method to work... just the RowNum.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
kwiggans
kwiggans
SSC-Enthusiastic
SSC-Enthusiastic (110 reputation)SSC-Enthusiastic (110 reputation)SSC-Enthusiastic (110 reputation)SSC-Enthusiastic (110 reputation)SSC-Enthusiastic (110 reputation)SSC-Enthusiastic (110 reputation)SSC-Enthusiastic (110 reputation)SSC-Enthusiastic (110 reputation)

Group: General Forum Members
Points: 110 Visits: 229
Great comments everyone.
This code introduced me to the WITH statement. perhaps I had seen it before, but I apparently put it out of my mind.
From what I've read, this common table expression acts like a view inside a query.
I've never been able to get this straight, but since it acts like a view, is that why deleting items from the cteDV will delete items from the base temp table, or is this something specific to the WITH statement? Unsure
I got my information from the following site: http://www.valentina-db.com/dokuwiki/doku.php?id=paradigma:public:en:releases:2.5:statement_with_option

Keith Wiggans

Carla Wilson-484785
Carla Wilson-484785
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1715 Visits: 1949
Very clever! I never thought to delete from a CTE!
changbluesky
changbluesky
SSC Rookie
SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)

Group: General Forum Members
Points: 43 Visits: 150
Vert Cool~~
But I still cannot understand why deleting items from the cteDV will delete items from the base temporary table, or is this something specific to the CTE statemen.
So amazing and so confusing.

Can anyone give me the answer? Many thanks
Kenney Hill
Kenney Hill
Mr or Mrs. 500
Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)Mr or Mrs. 500 (526 reputation)

Group: General Forum Members
Points: 526 Visits: 233
Cade.Bu (10/10/2008)
Vert Cool~~
But I still cannot understand why deleting items from the cteDV will delete items from the base temporary table, or is this something specific to the CTE statement.
So amazing and so confusing.

Can anyone give me the answer? Many thanks


Cade.Bu,

It's like kwiggans said, a CTE is like a View:

kwiggans (10/10/2008)
Great comments everyone.
This code introduced me to the WITH statement. perhaps I had seen it before, but I apparently put it out of my mind.
From what I've read, this common table expression acts like a view inside a query.
I've never been able to get this straight, but since it acts like a view, is that why deleting items from the cteDV will delete items from the base temp table, or is this something specific to the WITH statement? Unsure


Since I'm different from most (or is that contrary? Hehe ), I like to think of CTEs as derived tables that I can alias then refer to as many times in my query as I want simply by using the alias. Actually, an argument can be made that a View is simply a derived table that you can access with out defining it for each query, you only define it the one time. But I digress, back to the issue.

So, a CTE works pretty much like a local view, and follows the same rules as a view in order for it to be updateable. Which means, an updateable CTE can be used just like an updateable view, you can perform Select, Insert or Delete queries on them.

Kwiggans,

The With statement is just the mechanism used to define the CTE. It's the fact that the CTE acts like a View that allows you to delete records through it.

I hope I helped more than I befuddled,
Kenney


Alternate Method:
This example can be done without the use of a CTE as well. All you have to do is to replace the CTE code and the delete statement with this piece of code that uses a derived table in a Delete statement to achieve the same goal:


DELETE NumberedDups
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY Col1, Col2, Col3 ) AS RowNum
FROM #DuplicateVals) AS NumberedDups
WHERE RowNum > 1



Performance Note:
An article I've read stated and showed that a CTE is faster than using a temp table, and I've found this to be true up to a point. The temp table is slower mostly due to the hard drive access need to create it in the first place, but this is just a onetime hit. The problem with the CTE is that SQL runs it each time you reference it in your query, so if your CTE takes 3 seconds to run and you use it 5 times in your query then you're looking at up to 15 seconds (or less due to caching) just for the CTE to be ran all 5 times and then you still have to add the time it takes for the rest of your query to run. But if you used a temp table then after the first hit of creating the temp table you start gaining time because it takes a lot less time to retrieve the records from a temp table than it does to reproduce the result set each time VIA a CTE (or any other query mechanism).

What does this mean? It means that if your CTEs are slow you will want to think twice about using them if you need to reference them multiple times in the same query batch.


Extra Study Idea:
You might want to look into a cool feature of CTEs. They can refer to themselves from within their own With statement (this is a recursive CTE). Books on Line has a little piece on this functionality where they show you how to get a list of someone(s) supervisor and their supervisor and their supervisor, etc. I think it's well worth a look since it opens up a path to do complicated things in a less complicated way.
changbluesky
changbluesky
SSC Rookie
SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)

Group: General Forum Members
Points: 43 Visits: 150
Hi Kenney,

I see..Thank you very much for your explanation.Smile

Cade
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search