Just For Fun: An Impossible Delete
Posted Tuesday, August 05, 2008 3:13 AM
Ten Centuries


Group: General Forum Members
Last Login: Tuesday, April 15, 2014 11:04 AM
Points: 1,384, Visits: 1,240
Wow, I am in awe (assuming it works :) ). I would venture a guess that this is, indeed, the most efficient solution.

One fix required here is to separate the identifier from the name in the hash, otherwise "MyName1" with identifier 6 could be confused with "MyName" with identifier 16.
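
For example, using CHECKSUM purely for illustration (the article's actual hash expression may differ):

-- With no delimiter, both rows hash the same concatenated string:
SELECT CHECKSUM('MyName1' + '6')  AS NoDelimiter,     -- hashes 'MyName16'
       CHECKSUM('MyName' + '16')  AS AlsoNoDelimiter  -- also hashes 'MyName16'

-- With a delimiter that cannot appear in the data, they stay distinct:
SELECT CHECKSUM('MyName1' + '|' + '6')  AS WithDelimiter,     -- 'MyName1|6'
       CHECKSUM('MyName' + '|' + '16')  AS AlsoWithDelimiter  -- 'MyName|16'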



http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #546586
Posted Tuesday, August 05, 2008 3:21 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Today @ 8:40 AM
Points: 561, Visits: 2,418
Aiee! Yes, I tapped it in a bit of a hurry. Any delimiter between the values will help the hash. My only worry is that it found one more row to delete than the original problem-setter said there would be, but it seems to be a genuine candidate for deletion.

Also, it sometimes needs an extra '' in the expression to get it to work in SQL Server 2000, due to a bug in SQL Server!

If you turn the logic round, it selects all the unique rows without a GROUP BY. I wonder if it is quicker than DISTINCT or GROUP BY?
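
A sketch of what I mean, with invented names (the Flag column is the one written by the variable-based update described further down the thread):

-- Sketch only: rows not marked 'd' by the update are the distinct ones
SELECT Name
FROM SomeTable
WHERE Flag IS NULL
-- versus the conventional: SELECT DISTINCT Name FROM SomeTable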




Best wishes,

Phil Factor
Simple Talk
Post #546590
Posted Tuesday, August 05, 2008 4:55 AM
Forum Newbie


Group: General Forum Members
Last Login: Tuesday, May 05, 2009 3:06 PM
Points: 8, Visits: 87
This was a great article and I really enjoyed reading it. I'm particularly grateful for the excellent explanatory text which meant I could actually follow most of what was going on despite being fairly new to SQL.

At the risk of being overly picky there is one bit I have to query:

"Likewise, the F's are stored as "01000110" in upper-case and "01000110" in lower (if you look closely, you can tell which bit makes it upper or lower case)."

Am I simply not looking closely enough or are these values actually identical? Sneaky typo perhaps?
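
Checking the standard ASCII values myself suggests the lower-case one should have been "01100110":

-- Quick check (standard ASCII values):
SELECT ASCII('F') AS UpperF,  -- 70  = 01000110
       ASCII('f') AS LowerF   -- 102 = 01100110: only the 0x20 bit differs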

Anyway thanks for the great article.

All the best,
John
Post #546623
Posted Tuesday, August 05, 2008 5:20 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Thursday, October 04, 2012 9:20 AM
Points: 583, Visits: 1,060
Been waiting for it....:D

Phil, could you give a more verbose explanation of your code, please?



Hiding under a desk from SSIS Implementation Work
Post #546631
Posted Tuesday, August 05, 2008 5:37 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 8:11 AM
Points: 36,015, Visits: 30,302
Tao Klerks (8/5/2008)
Hmm, my curiosity is piqued... The solution presented is quite interesting, but I believe the purported interest in bits and bytes is a little exaggerated, as the solution is still quite expensive: it rewrites an entire column in a single update statement, and rewrites/deletes the bulk of the data if there are many duplicates, needing up to almost double the original storage space to complete. So I'm quite curious:

What would be the most efficient way of addressing this problem? (Using the least memory and/or the least I/O, especially on a very large data set; also consider whether the data can still be accessed during the update process.)

This should be quite easy to test, given a few hours to create a large (GB-sized?) data set and try out a few different approaches...

It seems a problem that many more people are likely to face nowadays - powerful hardware, very robust systems that can remain online as you do things like add columns, but extremely large amounts of data (many gigabytes) and no maintenance time or downtime available!
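
For anyone who wants to have a go, here is one quick way to knock up a duplicate-heavy test table (a sketch only; the names and sizes are made up):

-- Build a large heap full of duplicates to experiment on. Raise TOP for
-- more volume; lower the modulus for more duplication.
SELECT TOP (1000000)
       ABS(CHECKSUM(NEWID())) % 1000          AS SomeNumber,
       CHAR(65 + ABS(CHECKSUM(NEWID())) % 26) AS SomeLetter
INTO   BigDupes
FROM   sys.all_columns a
CROSS JOIN sys.all_columns b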


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #546636
Posted Tuesday, August 05, 2008 5:40 AM


SSCrazy Eights


Group: General Forum Members
Last Login: Friday, March 28, 2014 2:25 PM
Points: 9,902, Visits: 9,479
Jeff Moden (8/4/2008)
What an amazing article! Covers most everything from womb-to-tomb for such an "impossible" delete. I remember being amazed when Barry first posted his solution on the original thread. I'm even more amazed that he wove in some of the techniques used in the old days and took the time to explain how each byte of memory was cherished, which made these types of methods absolutely necessary.

Well done, Barry!

Thanks, Jeff. High praise from the master himself.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #546637
Posted Tuesday, August 05, 2008 5:44 AM


SSCrazy Eights


Group: General Forum Members
Last Login: Friday, March 28, 2014 2:25 PM
Points: 9,902, Visits: 9,479
Emre Erkan (8/5/2008)
Nice article. I learned a lot. But for this problem there may be a shorter/simpler solution...


In all fairness, I should point out that sometime after I posted my solution, Chris Morris posted what I believe to be the optimal solution. It is far shorter and simpler than mine, with none of my convoluted 30-year-old manipulations: http://www.sqlservercentral.com/Forums/FindPost526653.aspx. Of course, I couldn't very well write an article about that!


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #546639
Posted Tuesday, August 05, 2008 5:45 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Today @ 8:40 AM
Points: 561, Visits: 2,418
Yes, Shaun.

The Sybase and SQL Server Update command allows the use of variables in the SET statement. In the case of SQL Server 2005, this is....
SET
    { column_name = { expression | DEFAULT | NULL }
      | { udt_column_name.{ { property_name = expression
                              | field_name = expression }
                            | method_name ( argument [ ,...n ] ) }
        }
      | column_name { .WRITE ( expression , @Offset , @Length ) }
      | @variable = expression
      | @variable = column = expression [ ,...n ]
    } [ ,...n ]

All I'm doing is taking advantage of the order of update to flag each row that is the same as the one before (the table must be in the physical order of the columns that you are comparing). I'm just writing a 'd' (meaning delete) in every row that duplicates the one before, and then deleting each row that has that 'd'. I used the hash just to make the code more compact, but the logic should have been expanded to make it more readable: I should have saved each significant column to a separate variable and compared them separately with an AND.

Note a couple of things.

Updates seem to be done in the order of the clustered index; in the example, they were done in the order we inserted the rows into the table variable. Within each row, all updates that assign to a variable are done first, in order left to right, followed by the column assignments, left to right. You have to be careful, though, as there is no definition of this behaviour in BOL as far as I can find, but it seems to work in all versions.

It is very fast. I am confident I can win the speed race against the other solutions as long as the table is in the correct physical order!
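
Here is a stripped-down sketch of the idea, with invented names rather than the real code, and subject to the undocumented ordering caveat above:

-- Sketch only: relies on the (undocumented) behaviour that variable
-- assignments run first, left to right, then the column assignment.
DECLARE @Sample TABLE (Name VARCHAR(50), Flag CHAR(1))
INSERT INTO @Sample (Name)
SELECT 'Adam'  UNION ALL SELECT 'Adam' UNION ALL
SELECT 'Barry' UNION ALL SELECT 'Barry' UNION ALL SELECT 'Chris'

DECLARE @Last VARCHAR(50), @Flag CHAR(1)

UPDATE @Sample
SET @Flag = CASE WHEN Name = @Last THEN 'd' ELSE NULL END, -- same as the row before?
    @Last = Name,  -- remember this row for the next comparison
    Flag  = @Flag  -- the column is written after the variables

DELETE FROM @Sample WHERE Flag = 'd' -- remove the flagged duplicates

SELECT Name FROM @Sample -- Adam, Barry, Chris: one of each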



Best wishes,

Phil Factor
Simple Talk
Post #546640
Posted Tuesday, August 05, 2008 5:45 AM


SSCrazy Eights


Group: General Forum Members
Last Login: Friday, March 28, 2014 2:25 PM
Points: 9,902, Visits: 9,479
Chirag (8/5/2008)
great article!!

Thanks for the feedback, Chirag!


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #546641
Posted Tuesday, August 05, 2008 5:49 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Today @ 8:40 AM
Points: 561, Visits: 2,418
Oops!
I've just seen Chris Morris's solution in the original forum, which is pretty much the same as mine. He got there first. I'll shut up!



Best wishes,

Phil Factor
Simple Talk
Post #546643