Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Delete Duplicate values Expand / Collapse
Author
Message
Posted Tuesday, July 22, 2008 11:08 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 1:55 AM
Points: 1,145, Visits: 846
Comments posted to this topic are about the item Delete Duplicate values
Post #538957
Posted Tuesday, July 22, 2008 11:13 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Monday, March 07, 2011 3:15 AM
Points: 375, Visits: 1,255
Abhijit,

just give a short explanation of your topic, so it will be more clear to all.

Cheers!

Sandy.


--
Post #538967
Posted Thursday, October 09, 2008 11:03 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Yesterday @ 11:37 AM
Points: 313, Visits: 128
I agree that you need to write something to go with your script.

And I believe that your script is slightly flawed, it looks like it will only delete the first duplicate record of each duplicate set and will leave all the rest intact (your test data only had 2 duplicates in any set, as far as I noticed, so it worked fine for that data set). I would suggest making the following minor change, which would keep the first record of each duplicate set and delete all others (the first one in theory anyway, SQL gives no guarantee on order of duplicate records, and how would you know anyway? ;) ).

DELETE FROM cteDV WHERE RID <> 1

You could also saved a keystroke and said '> 1', but both will do the same thing in this case.

For a quick demo using asterisk in the select statement of your CTE works but I would suggest spelling out the columns so that the code doesn't break should the source table structure change.

Over all a nice little script for getting rid of annoying duplicate records, thank you for sharing it with us.
Post #583741
Posted Friday, October 10, 2008 2:53 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 1:55 AM
Points: 911, Visits: 682
Kenney,

I appreciate your suggestion. thank you :).


Abhijit - http://abhijitmore.wordpress.com
Post #583820
Posted Friday, October 10, 2008 5:54 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:25 AM
Points: 35,959, Visits: 30,253
Just to give physical representation of Kenny's suggestions...

--===== Create a temporary demonstration table and populate it with duplicated info
-- This is not part of the solution... it's just creating a demo.
-- This DROP is just to support reruns for test purposes
IF OBJECT_ID('tempdb..#DuplicateVals') IS NOT NULL
DROP TABLE #DuplicateVals

CREATE TABLE #DuplicateVals
(
Col1 VARCHAR(3),
Col2 VARCHAR(3),
Col3 VARCHAR(3)
)

INSERT INTO #DuplicateVals
(Col1,Col2,Col3)
SELECT 'aa1','aa1','aa1' UNION ALL --More than 1 duplicate from here...
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL
SELECT 'aa1','aa1','aa1' UNION ALL --... to here
SELECT 'aa2','aa2','aa2' UNION ALL
SELECT 'aa2','aa2','aa2' UNION ALL
SELECT 'aa3','aa3','aa3' UNION ALL --No duplicates. Will it survive?
SELECT 'aa4','aa4','aa4' UNION ALL
SELECT 'aa4','aa4','aa4' UNION ALL
SELECT 'aa5','aa5','aa5' UNION ALL --No duplicates. Will it survive?
SELECT 'aa6','aa6','aa6' UNION ALL --More than 1 duplicate from here...
SELECT 'aa6','aa6','aa6' UNION ALL
SELECT 'aa6','aa6','aa6' --... to here

--===== Sanity check: Display the contents of the test table BEFORE the deletion
SELECT * FROM #DuplicateVals

--===== Delete duplicates without any temporal information present.
-- Note that there isn't actually any reason to return any columns from the table in the CTE...
-- Just the RowNum will suffice.
;WITH cteDV AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY Col1, Col2, Col3 ) AS RowNum
FROM #DuplicateVals
)
DELETE FROM cteDV WHERE RowNum > 1

--===== Sanity check: Display the contents of the test table AFTER the deletion
SELECT * FROM #DuplicateVals

... the details are in the code but I wanted to point out that you don't really need to return any of the table's columns in the CTE for this method to work... just the RowNum.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #583886
Posted Friday, October 10, 2008 6:36 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, November 25, 2013 5:33 PM
Points: 107, Visits: 207
Great comments everyone.
This code introduced me to the WITH statement. perhaps I had seen it before, but I apparently put it out of my mind.
From what I've read, this common table expression acts like a view inside a query.
I've never been able to get this straight, but since it acts like a view, is that why deleting items from the cteDV will delete items from the base temp table, or is this something specific to the WITH statement?
I got my information from the following site: http://www.valentina-db.com/dokuwiki/doku.php?id=paradigma:public:en:releases:2.5:statement_with_option


Keith Wiggans
Post #583909
Posted Friday, October 10, 2008 7:37 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, April 15, 2014 3:30 PM
Points: 1,525, Visits: 1,833
Very clever! I never thought to delete from a CTE!
Post #583942
Posted Friday, October 10, 2008 8:10 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, November 01, 2010 4:19 AM
Points: 39, Visits: 150
Vert Cool~~
But I still cannot understand why deleting items from the cteDV will delete items from the base temporary table, or is this something specific to the CTE statemen.
So amazing and so confusing.

Can anyone give me the answer? Many thanks
Post #584331
Posted Friday, October 10, 2008 11:57 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Yesterday @ 11:37 AM
Points: 313, Visits: 128
Cade.Bu (10/10/2008)
Vert Cool~~
But I still cannot understand why deleting items from the cteDV will delete items from the base temporary table, or is this something specific to the CTE statement.
So amazing and so confusing.

Can anyone give me the answer? Many thanks


Cade.Bu,

It's like kwiggans said, a CTE is like a View:

kwiggans (10/10/2008)
Great comments everyone.
This code introduced me to the WITH statement. perhaps I had seen it before, but I apparently put it out of my mind.
From what I've read, this common table expression acts like a view inside a query.
I've never been able to get this straight, but since it acts like a view, is that why deleting items from the cteDV will delete items from the base temp table, or is this something specific to the WITH statement?


Since I'm different from most (or is that contrary? ), I like to think of CTEs as derived tables that I can alias then refer to as many times in my query as I want simply by using the alias. Actually, an argument can be made that a View is simply a derived table that you can access with out defining it for each query, you only define it the one time. But I digress, back to the issue.

So, a CTE works pretty much like a local view, and follows the same rules as a view in order for it to be updateable. Which means, an updateable CTE can be used just like an updateable view, you can perform Select, Insert or Delete queries on them.

Kwiggans,

The With statement is just the mechanism used to define the CTE. It's the fact that the CTE acts like a View that allows you to delete records through it.

I hope I helped more than I befuddled,
Kenney


Alternate Method:
This example can be done without the use of a CTE as well. All you have to do is to replace the CTE code and the delete statement with this piece of code that uses a derived table in a Delete statement to achieve the same goal:

     DELETE NumberedDups
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY Col1, Col2, Col3 ) AS RowNum
FROM #DuplicateVals) AS NumberedDups
WHERE RowNum > 1

Performance Note:
An article I've read stated and showed that a CTE is faster than using a temp table, and I've found this to be true up to a point. The temp table is slower mostly due to the hard drive access need to create it in the first place, but this is just a onetime hit. The problem with the CTE is that SQL runs it each time you reference it in your query, so if your CTE takes 3 seconds to run and you use it 5 times in your query then you're looking at up to 15 seconds (or less due to caching) just for the CTE to be ran all 5 times and then you still have to add the time it takes for the rest of your query to run. But if you used a temp table then after the first hit of creating the temp table you start gaining time because it takes a lot less time to retrieve the records from a temp table than it does to reproduce the result set each time VIA a CTE (or any other query mechanism).

What does this mean? It means that if your CTEs are slow you will want to think twice about using them if you need to reference them multiple times in the same query batch.


Extra Study Idea:
You might want to look into a cool feature of CTEs. They can refer to themselves from within their own With statement (this is a recursive CTE). Books on Line has a little piece on this functionality where they show you how to get a list of someone(s) supervisor and their supervisor and their supervisor, etc. I think it's well worth a look since it opens up a path to do complicated things in a less complicated way.
Post #584344
Posted Saturday, October 11, 2008 1:38 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, November 01, 2010 4:19 AM
Points: 39, Visits: 150
Hi Kenney,

I see..Thank you very much for your explanation.:)

Cade
Post #584351
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse