Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Just For Fun: An Impossible Delete


Just For Fun: An Impossible Delete

Author
Message
Tao Klerks
Tao Klerks
Ten Centuries
Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)

Group: General Forum Members
Points: 1425 Visits: 1249
wow, I am in awe (assuming it works Smile ). I would venture a guess that this is, indeed, the most efficient solution.

One fix required here is to separate the identifier from the name in the hash, otherwise "MyName1" with identifier 6 could be confused with "MyName" with identifier 16.

http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Phil Factor
Phil Factor
SSC Eights!
SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)

Group: General Forum Members
Points: 927 Visits: 2953
Aiee! Yes, I tapped it in a bit of a hurry. Any delimiter between the values will help the hash. My only worry is that it found one more row to delete than the original problem-setter said there would be, but it seems to be a genuine candidate for deletion.

Also it sometimes needs an extra '' in the expresssion to get it to work in SQL Server 2000, due to a bug in SQL Server!

If you turn the logic round, it selects all the unique rows without a group-by. I wonder if it is quicker than the DISTINCT or GROUP BY?


Best wishes,

Phil Factor
Simple Talk
JohnG-529958
JohnG-529958
Grasshopper
Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)

Group: General Forum Members
Points: 18 Visits: 87
This was a great article and I really enjoyed reading it. I'm particularly grateful for the excellent explanatory text which meant I could actually follow most of what was going on despite being fairly new to SQL.

At the risk of being overly picky there is one bit I have to query:

"Likewise, the F's are stored as "01000110" in upper-case and "01000110" in lower (if you look closely, you can tell which bit makes it upper or lower case)."

Am I simply not looking closely enough or are these values actually identical? Sneaky typo perhaps?

Anyway thanks for the great article.

All the best,
John
Shaun McGuile
Shaun McGuile
Mr or Mrs. 500
Mr or Mrs. 500 (595 reputation)Mr or Mrs. 500 (595 reputation)Mr or Mrs. 500 (595 reputation)Mr or Mrs. 500 (595 reputation)Mr or Mrs. 500 (595 reputation)Mr or Mrs. 500 (595 reputation)Mr or Mrs. 500 (595 reputation)Mr or Mrs. 500 (595 reputation)

Group: General Forum Members
Points: 595 Visits: 1060
Been waiting for it....BigGrin

Phil could you give a more verbose explanation of your code please?

Hiding under a desk from SSIS Implemenation Work Crazy
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)

Group: General Forum Members
Points: 51813 Visits: 40308
Tao Klerks (8/5/2008)
Hmm, my curiousity is piqued... The solution presented is quite interesting, but I believe the purported interest in bits and bytes is a little exaggerated, as the solution is still quite expensive (rewrites an entire column in a single update statement, and rewrites/deletes the bulk of the data if there are many duplicates - needing up to almost double the original storage space to complete). So I'm quite curious:

What would the most efficient way of addressing this problem be? (using least memory and / or using least I/O, especially on a very large data set; also consider whether the data can still be accessed during the update process)

This should be quite easy to test, given a few hours to create a large (GB-sized?) data set and try out a few different approaches...

It seems a problem that many more people are likely to face nowadays - powerful hardware, very robust systems that can remain online as you do things like add columns, but extremely large amounts of data (many gigabytes) and no maintenance time / downtime available!


--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
RBarryYoung
RBarryYoung
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10096 Visits: 9517
Jeff Moden (8/4/2008)
What an amazing article! Covers most everything from womb-to-tomb for such an "impossible" delete. I remember being amazed when Barry first posted his solution on the original thread. I'm even more amazed that he wove in some of the techniques used in the old days and took the time to explain how each byte of memory was cherished which made these types of methods absolutely necessary.

Well done, Barry!

Thanks, Jeff. High praise from the master himself. Blush

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
RBarryYoung
RBarryYoung
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10096 Visits: 9517
Emre Erkan (8/5/2008)
Nice Article. I learned a lot. But for this problem there may be a shorter/ simpler solution...


In all fairness, I should point out that sometime after I posted my solution, Chris Morris posted what I believe to be the optimal solution. It is far shorter and simpler than mine, with none of my convoluted 30 year-old manipulations: http://www.sqlservercentral.com/Forums/FindPost526653.aspx. Of course, I couldn't very well write an articale about that!

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Phil Factor
Phil Factor
SSC Eights!
SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)

Group: General Forum Members
Points: 927 Visits: 2953
Yes. Shaun

The Sybase and SQL Server Update command allows the use of variables in the SET statement. In the case of SQL Server 2005, this is....
SET
{ column_name = { expression | DEFAULT | NULL }
| { udt_column_name.{ { property_name = expression
| field_name = expression }
| method_name ( argument [ ,...n ] )
}
}
| column_name { .WRITE ( expression , @Offset , @Length ) }
| @variable = expression
| @variable = column = expression [ ,...n ]
} [ ,...n ]

All I'm doing is talking advantage of the order of update in order to flag each row that is different from the one before (the table must be in the physical order of the columns that you are comparing). I'm just writing a 'd' (meaning delete) in every row that is different to the one before and then deleting each row that has that 'd'. I used the hash just to make the code more compact, but the logic should have been expanded to make it more readable. I should have saved each significant column to a separate variable, and compared them separately with an AND.

Note a couple of things.

Updates seem to be done in the order of the clustered index. In the example, it was done in the order we inserted them into the table variable. with each row, all updates that assign to a variable are done first, in order left to right, followed by the column assignments, left to right. You have to be careful, though as there is no definition of this behaviour in BOL as far as I can find, but it seems to work in all versions.

It is very fast. I am confident I can win the speed race against the other solutions as long as the table is in the correct physical order!


Best wishes,

Phil Factor
Simple Talk
RBarryYoung
RBarryYoung
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10096 Visits: 9517
Chirag (8/5/2008)
great article!!

Thanks for the Feedback, Chriag!

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Phil Factor
Phil Factor
SSC Eights!
SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)SSC Eights! (927 reputation)

Group: General Forum Members
Points: 927 Visits: 2953
Oops
I've just seen Chris Morris solution in the original forum, which is pretty much the same as mine. He got there first. I'll shut up!


Best wishes,

Phil Factor
Simple Talk
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search