Just For Fun: An Impossible Delete

  • Jeff Moden (8/4/2008)


    What an amazing article! Covers most everything from womb-to-tomb for such an "impossible" delete. I remember being amazed when Barry first posted his solution on the original thread. I'm even more amazed that he wove in some of the techniques used in the old days and took the time to explain how each byte of memory was cherished which made these types of methods absolutely necessary.

    Well done, Barry!

    Thanks, Jeff. High praise from the master himself. :blush:

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Emre Erkan (8/5/2008)


    Nice Article. I learned a lot. But for this problem there may be a shorter/ simpler solution...

    In all fairness, I should point out that sometime after I posted my solution, Chris Morris posted what I believe to be the optimal solution. It is far shorter and simpler than mine, with none of my convoluted 30 year-old manipulations: http://www.sqlservercentral.com/Forums/FindPost526653.aspx. Of course, I couldn't very well write an articale about that!

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Yes. Shaun

    The Sybase and SQL Server Update command allows the use of variables in the SET statement. In the case of SQL Server 2005, this is....

    SET

    { column_name = { expression | DEFAULT | NULL }

    | { udt_column_name.{ { property_name = expression

    | field_name = expression }

    | method_name ( argument [ ,...n ] )

    }

    }

    | column_name { .WRITE ( expression , @Offset , @Length ) }

    | @variable = expression

    | @variable = column = expression [ ,...n ]

    } [ ,...n ]

    All I'm doing is talking advantage of the order of update in order to flag each row that is different from the one before (the table must be in the physical order of the columns that you are comparing). I'm just writing a 'd' (meaning delete) in every row that is different to the one before and then deleting each row that has that 'd'. I used the hash just to make the code more compact, but the logic should have been expanded to make it more readable. I should have saved each significant column to a separate variable, and compared them separately with an AND.

    Note a couple of things.

    Updates seem to be done in the order of the clustered index. In the example, it was done in the order we inserted them into the table variable. with each row, all updates that assign to a variable are done first, in order left to right, followed by the column assignments, left to right. You have to be careful, though as there is no definition of this behaviour in BOL as far as I can find, but it seems to work in all versions.

    It is very fast. I am confident I can win the speed race against the other solutions as long as the table is in the correct physical order!

    Best wishes,
    Phil Factor

  • Chirag (8/5/2008)


    great article!!

    Thanks for the Feedback, Chriag!

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Oops

    I've just seen Chris Morris solution in the original forum, which is pretty much the same as mine. He got there first. I'll shut up!

    Best wishes,
    Phil Factor

  • Atif Sheikh (8/5/2008)


    GREAT ARTICLE...

    Alot of learning points in it...

    Thanks, Atif.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks Phil

    love the eBook and the SimpleTalk articles, always a good read.:)

    The one about the ex-riot policemane turned DBA should have been quoted on the Stored Procs vs ORM thread. 😀

    Hiding under a desk from SSIS Implemenation Work :crazy:

  • Tao Klerks (8/5/2008)


    Hmm, my curiousity is piqued... The solution presented is quite interesting, but I believe the purported interest in bits and bytes is a little exaggerated, as the solution is still quite expensive (rewrites an entire column in a single update statement, and rewrites/deletes the bulk of the data if there are many duplicates - needing up to almost double the original storage space to complete).

    No argument, Tao. Understand that up until that point, most of us, including myself at first, felt that it was not even possible given the restrictions. My solution is certainly not the most efficient, just the first one to "crack the nut" so to speak.

    So I'm quite curious:

    What would the most efficient way of addressing this problem be? (using least memory and / or using least I/O, especially on a very large data set; also consider whether the data can still be accessed during the update process)

    Great question. My nomination would be for Chris Morris's entry, here: http://www.sqlservercentral.com/Forums/FindPost526653.aspx

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Samuel Vella (8/5/2008)


    I think this calls for Monty Pythons The Four Yorkshiremen skit:

    http://www.phespirit.info/montypython/four_yorkshiremen.htm

    Heh. Thanks, Samuel, that's one of my favorite. Some of the other fossils and I do that bit for the "kids" we work with every now and then:

    "You had bytes? You were lucky! It was a good day for us if we even smelled a bit!" 😛

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Brilliant article! Reminds me somehow of one of my colleagues who wrote the following in C:

    int x = 1;

    int y = 2;

    x ^= y ^= x ^= y;

    just to swap two integers around without wasting another 16 bits (or was it even 8 back then) on the stack.

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • Phil Factor (8/5/2008)


    Surely this is the simplest solution? ...

    Pretty good, Phil. I think that that is almost the same as Chris Morris's.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • OK, so now I'm in awe of Chris Morris too... 🙂

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • JohnG (8/5/2008)


    This was a great article and I really enjoyed reading it. I'm particularly grateful for the excellent explanatory text which meant I could actually follow most of what was going on despite being fairly new to SQL.

    At the risk of being overly picky there is one bit I have to query:

    "Likewise, the F's are stored as "01000110" in upper-case and "01000110" in lower (if you look closely, you can tell which bit makes it upper or lower case)."

    Am I simply not looking closely enough or are these values actually identical? Sneaky typo perhaps?

    Ouch! Yeah, you got me. (Dang, how did that one slip past?) I beleive that the lower-case "f" should be binary "01100110".

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jan Van der Eecken (8/5/2008)


    Brilliant article! Reminds me somehow of one of my colleagues who wrote the following in C:

    int x = 1;

    int y = 2;

    x ^= y ^= x ^= y;

    just to swap two integers around without wasting another 16 bits (or was it even 8 back then) on the stack.

    Thanks, Jan. Yeah, I know that old chestnut. In fact I've been trying to figure out how to work it into a SQL QotD for the last 6 weeks. And now you've gone and given it away! 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • A really simple option which is able to take advantage that "age" is a number...

    Create a table "test" with the following data as shown:

    select * from test

    NameAgeSex

    ABC 24M

    ABC 24M

    DEF 24M

    DEF 24M

    GHI 26F

    GHI 26F

    GHI 26F

    PQRS 25F

    insert test

    select distinct name, age*1000, sex

    from test

    delete from test where age<1000

    update test

    set age=age/1000

    select * from test

    NameAgeSex

    ABC 24M

    DEF 24M

    GHI 26F

    PQRS 25F

Viewing 15 posts - 16 through 30 (of 156 total)

You must be logged in to reply to this topic. Login to reply