Delete Duplicates

  • Comments posted to this topic are about the item Delete Duplicates


  • Grrrreat question, Excellent answer, 'nuff said:P:P:P:P:P:P

    :-PManie Verster
    South Africa

    I can do all things through Christ who strengthens me. - Holy Bible
    I am a man of fixed and unbending principles, the first of which is to be flexible at all times. - Everett Mckinley Dirkson (Well, I am trying. - Manie Verster)

  • I'm wrong here but this is not correct question;

    1 & 2 are the same ... with same condition! So the correct answer is 2 but the 1 is the same everything ... too stupid!

    3 is different!

    also with three options I do not have valid parse for T-SQL code!

    how can you explain it!?

    :hehe::w00t: :P:P:P:P:P


  • OHHH nooooooooooooooooo

    I work just from select statement not "with numbered"


    nononononoonononononononono I anaylized very well what happend but without this piece of code!



  • whats happened to sql - it used to be so simple 😉

  • dougi - are you using sql 2000 by any chance?

  • I agree, this was an excellent question!

  • Thanks for an Excellent question really rather simply once one understands how partitioning works........... learned a lot from it ...

    If everything seems to be going well, you have obviously overlooked something.


    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Gives me more stuff to figure out, thanks for the question.

  • A semicolon at the beginning of the statement is valid

    I'm seeing a lot of this lately. I.e., statements such as:

    ;WITH ...

    Note that the semicolon is NOT part of the WITH statement syntax. This issue only arises if the prior statement did not end with a semicolon (which it should). So the leading semicolon is terminating the prior statement.

    SQL Server is starting to enforce proper statement termination. IMO, long overdue. Get into the habit of properly terminating ALL of your T-SQL statements. Even Microsoft recommends using it. From BOL:

    Many code examples use a semicolon ( ; ) as a Transact-SQL statement terminator. Although the semicolon is not required, using it is considered a good practice.

    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • I agree with the others, great question!


    Please take a number. Now serving emergency 1,203,894

  • This anwer works for the given data but wouldn't work if we inserted a row like:

    insert into #new(id,keyvalue) values (1,'bb')

    The stated answer assumes a duplicate is defined by the ID column and would delete the above row even thought it's unique in the table. Since no PK is defined on the table I think we must assume a duplicate is defined by multiple rows with identical values for every column. So if we include both ID and keyvalue columns in the over() clause we will delete only non-unique rows:

    with numbered as


    SELECT rowno=row_number() over (partition by ID, keyvalue order by ID, keyvalue),ID,keyvalue

    from #new


    delete from numbered where rowno>1

  • J is correct.

    As for SQL, the old one will work too for this particular case:

    declare @I int

    select @I = (select count(convert(varchar(2),ID)+Keyvalue) from #new

    group by (convert(varchar(2),ID)+Keyvalue)

    having count(convert(varchar(2),ID)+Keyvalue) >1)

    set @I = @I-1

    set rowcount @I

    delete #New where convert(varchar(2),ID)+Keyvalue


    (select convert(varchar(2),ID)+Keyvalue from #new

    group by (convert(varchar(2),ID)+Keyvalue)

    having count(convert(varchar(2),ID)+Keyvalue) >1)

    Regards,Yelena Varsha

  • So if we include both ID and keyvalue columns in the over() clause we will delete only non-unique rows:

    The angle you derive with the "Partition by" clause is literally a "partition" of a particular cluster of fields defined uniquely. If you prefer, then also, use only the "order by" portion of the clause and create a numeric cluster("PK" ID ) that is dynamic [rather than static].

    Select rowno=Row_Number() over(Order by mychoiceofID,2ndchoice,etcchoice), * from mytable.


  • Excellent question. I learned something useful.

    One application of OVER that seems to have potential (note hedging 😉 ) is that it allows applying aggregate function to more than one different grouping in the same query. For instance,

    select customerID, productID, orderDate, orderAmount,

    CustomerTotal = sum(orderAmount) over (partition by customerID),

    ProductTotal = sum(orderAmount) over (partition by productID)

    from Orders

    Any thoughts on performance?

    Edited for clarity.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply