Just For Fun: An Impossible Delete

  • Me too. Fair enough.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • If one were to pose a question along the lines of the teacher's lesson from John's post people would still try to 'do the math, draw the picture and write the poem'.

    The point would be lost on certain people as it is already, they still would learn nothing.

    It is clear that a number of posters did not understand the question/article and jumped to a conclusion. How do you educate these people?

    Should they be ignored or told 'no you are wrong, please read the problem as posted fully and understand it before posting next time'?

    --Shaun

    Hiding under a desk from SSIS Implemenation Work :crazy:

  • I agree they need to be told and I very much appreciate you going to bat on this subject... but folks, especially folks like myself, can and should do it without any caustic edge whatsoever. It's sometimes a hard thing to do especially when you see it day in and day out, both here and at work, but John is correct. If folks like me are gonna do things, we need to do them right. I've seen what happens on other forums when even tiny flames become the rule rather than the exception. I should have left it at "go back and read" instead of rubbing it in later.

    Barry was actually NOT out of line... I was the only one that used any inflammatory words or innuendo. He was absolutely correct that it would make for an interesting study.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In the spur of the moment, we all occasionally forget that people who post articles, or contribute to forums, are our guests, and therefore deserve courtesy. At the same time, most of us enjoy the spontaneity of the replies, though it can mean the occasional flashes of exasperation, even when directed at us! It is just a matter of getting the balance right. I suspect it is better to err on the side of politeness. It is very easy to unintentionally hurt someone's feelings, and I'd hate to discourage contributions.

    Best wishes,
    Phil Factor

  • Well said. All o' youse guys. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Phil Factor (8/5/2008)


    Surely this is the simplest solution? (be warned of a nasty catch when creating these 'quirky updates'. the order of execution of the update is -variable assignments first then column assignments-both left to right!)

    Very nice solution.

    I Just prefer little change, it's more readable

    [font="Courier New"]--create the sample table

    DECLARE @Sample TABLE (TheName VARCHAR(4),Identifier INT, sex VARCHAR(1))

    INSERT INTO @Sample (TheName,Identifier,Sex)

    SELECT 'ABC', 24, 'M' UNION ALL

    SELECT 'ABC', 24,'M' UNION ALL

    SELECT 'LMN', 27, 'M' UNION ALL

    SELECT 'LMN', 27, 'M' UNION ALL

    SELECT 'LMN', 27, 'M' UNION ALL

    SELECT 'PQRS', 25, 'F' UNION ALL

    SELECT 'XYZ', 24, 'M' UNION ALL

    SELECT 'XYZ', 25, 'M'

    DECLARE @hash VARCHAR(80), @sex VARCHAR(1)

    UPDATE @sample

    SET

    @sex=CASE WHEN COALESCE(@hash,'')

    <>TheName+CONVERT(VARCHAR(5),Identifier)+sex

    THEN 'd' ELSE SEX END,

    @hash= TheName+CONVERT(VARCHAR(5),Identifier)+sex,

    Sex = @sex

    DELETE FROM @sample WHERE sex='d'

    SELECT * FROM @sample[/font]



    See, understand, learn, try, use efficient
    © Dr.Plch

  • I haven't read through all 12 pages of post admittedly but don't think this has been suggested...

    The rules said no new columns but didn't say anything about altering existing columns, should you not have one with a datatype long enough to accept the following. Or of course you could concatenate it onto the int columns assuming it would not exceed int - the existing datatypes would drive whether you needed to alter the table. So worst case scenario:

    DROP TABLE Play

    CREATE TABLE Play(TheName VARCHAR(4),Identifier INT, sex VARCHAR(1))

    INSERT INTO Play (TheName,Identifier,Sex)

    SELECT 'ABC', 24, 'M' UNION ALL

    SELECT 'ABC', 24,'M' UNION ALL

    SELECT 'LMN', 27, 'M' UNION ALL

    SELECT 'LMN', 27, 'M' UNION ALL

    SELECT 'LMN', 27, 'M' UNION ALL

    SELECT 'PQRS', 25, 'F' UNION ALL

    SELECT 'XYZ', 24, 'M' UNION ALL

    SELECT 'XYZ', 25, 'M'

    ALTER TABLE Play

    ALTER COLUMN Sex varchar(20)

    UPDATE Play

    SET Sex = Sex + convert(char,abs(checksum(newid())))

    DELETE a

    FROM Play a

    WHERE substring(Sex,2,19) =

    (SELECT min(substring(Sex,2,19))

    FROM Play b

    WHERE a.TheName = b.TheName

    AND a.Identifier = b.Identifier

    AND left(a.sex,1) = left(b.sex,1))

    UPDATE Play

    SET Sex = left(Sex,1)

    ALTER TABLE Play

    ALTER COLUMN Sex varchar(1)

  • Good brain training but are the restrictions as mentioned realistic? Which company would definitely not want to use a temp table or table variable?

  • -- Create table

    CREATE TABLE dbo.xSource (Name VARCHAR(50), Age TINYINT, Sex CHAR(1))

    INSERT INTO xSource VALUES ('Stephen', 43, 'M')

    INSERT INTO xSource VALUES ('Stephen', 43, 'M')

    INSERT INTO xSource VALUES ('Stephen', 43, 'M')

    INSERT INTO xSource VALUES ('Aron', 18, 'M')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Ira', 24, 'F')

    INSERT INTO xSource VALUES ('Joe', 49, 'M')

    INSERT INTO xSource VALUES ('Joe', 49, 'M')

    INSERT INTO xSource VALUES ('John', 30, 'F')

    INSERT INTO xSource VALUES ('John', 30, 'F')

    INSERT INTO xSource VALUES ('Eva', 30, 'F')

    INSERT INTO xSource VALUES ('GB', 44, 'M')

    INSERT INTO xSource VALUES ('GB', 44, 'M')

    SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age

    -- Step 1 Get list of only DUPLICATE rows

    INSERT INTO xSource (Name, Age, Sex)

    SELECT '~'+Name, Age, Sex FROM xSource GROUP BY Name, Age, Sex Having COUNT(*) > 1

    -- Part 2 Delete DUPLICATE rows

    DELETE xSource

    FROM xSource DLT

    INNER JOIN (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST

    ON DLT.Name = LST.Name

    AND DLT.Age = LST.Age

    AND DLT.Sex = LST.SEX

    -- Step 3 Repopulate Distinct Purged rows

    INSERT INTO xSource (Name, Age, Sex)

    SELECT Name, Age, Sex

    FROM (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST

    -- Step 4 Cleanup

    DELETE xSource

    WHERE SUBSTRING(Name,1,1)='~'

    SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age

  • Thanks for the memories. A time when storage and memory was rare and expensive and you had to really know how the guts of the operating system, disk system, and memory allocation worked. Some of the old techniques are still useful today.

    Great article and a great solution. 😎

  • Stephen.Richardson (10/30/2009)


    -- Create table

    CREATE TABLE dbo.xSource (Name VARCHAR(50), Age TINYINT, Sex CHAR(1))

    INSERT INTO xSource VALUES ('Stephen', 43, 'M')

    INSERT INTO xSource VALUES ('Stephen', 43, 'M')

    INSERT INTO xSource VALUES ('Stephen', 43, 'M')

    INSERT INTO xSource VALUES ('Aron', 18, 'M')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Sharon', 38, 'F')

    INSERT INTO xSource VALUES ('Ira', 24, 'F')

    INSERT INTO xSource VALUES ('Joe', 49, 'M')

    INSERT INTO xSource VALUES ('Joe', 49, 'M')

    INSERT INTO xSource VALUES ('John', 30, 'F')

    INSERT INTO xSource VALUES ('John', 30, 'F')

    INSERT INTO xSource VALUES ('Eva', 30, 'F')

    INSERT INTO xSource VALUES ('GB', 44, 'M')

    INSERT INTO xSource VALUES ('GB', 44, 'M')

    SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age

    -- Step 1 Get list of only DUPLICATE rows

    INSERT INTO xSource (Name, Age, Sex)

    SELECT '~'+Name, Age, Sex FROM xSource GROUP BY Name, Age, Sex Having COUNT(*) > 1

    -- Part 2 Delete DUPLICATE rows

    DELETE xSource

    FROM xSource DLT

    INNER JOIN (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST

    ON DLT.Name = LST.Name

    AND DLT.Age = LST.Age

    AND DLT.Sex = LST.SEX

    -- Step 3 Repopulate Distinct Purged rows

    INSERT INTO xSource (Name, Age, Sex)

    SELECT Name, Age, Sex

    FROM (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST

    -- Step 4 Cleanup

    DELETE xSource

    WHERE SUBSTRING(Name,1,1)='~'

    SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age

    Question: Why, in step 3 and 4, do you insert and delete rather than perform an UPDATE xSource Name = SUBSTRING(Name,2,49) WHERE SUBSTRING(Name,1,1)='~'?

    Is doing steps 3 and 4 more efficient than the 'replace in place'?

    <Caveat: I haven't had my coffee yet, so my coding may be slightly off, but I hope my intent is clear>

  • Funny how this article just came out today... I was actually on a similar thread last week and someone came up with a great solution...

    My personal opinion the article is interesting but ... hmmm...

    How about these solutions from this thread...

    http://www.sqlservercentral.com/Forums/Topic793765-145-1.aspx

    There's an undocumented identity key for every row of any table that can be used... in one simple delete statement you can delete the duplicate rows...

    I was amazed...

    Check it out.

    Thanks

  • The orginal request was to purge the system of "duplicate" rows. Some rows are duplicated and some are not. Additionaly some names lie Terry may be either a male or female therefore you could have a 30 year old Terry male and female therefore not duplicate rows. If you do an update you change all rows matching criteria (I did not try update top 1). Personaly I like security, NO MISTAKES. In production the absolute worst word in the the human lanquage is "oooopppssss", or "I think".

  • jghali (10/30/2009)


    Funny how this article just came out today... I was actually on a similar thread last week and someone came up with a great solution...

    My personal opinion the article is interesting but ... hmmm...

    How about these solutions from this thread...

    http://www.sqlservercentral.com/Forums/Topic793765-145-1.aspx

    There's an undocumented identity key for every row of any table that can be used... in one simple delete statement you can delete the duplicate rows...

    I was amazed...

    Check it out.

    Thanks

    Please read the thread its not about deleting duplicates its about keeping them 😀

    Hiding under a desk from SSIS Implemenation Work :crazy:

  • Your off-hand comment about naming the field "Gender" rather than "Sex" piqued my curiosity. Why? As I understand the words, sex means biological differences: chromosomes and sexual organs. Gender refers to the characteristics a society or culture delineates as masculine or feminine. There's some blurring of the definitions, but I think they generally hold. So why do you prefer Gender for the column label?

Viewing 15 posts - 106 through 120 (of 156 total)

You must be logged in to reply to this topic. Login to reply