Duplicates

  • Comments posted to this topic are about the item Duplicates

  • Thank you, Steve, really good one. (more to see such questions)

    (at first... it did not made any sense to me.. all I was saying to myself was "this qtod is completely wrong"... and then after 45 minutes of analysis with patience, removing the error sql statements, narrow-downed to two, then pen and paper came in to the picture, listed the output to my knowledge and finally executed the statements to make sure....:-D well what I can SAY!!! "my analysis is getting really better" also my knowledge on understanding "English" is getting better :cool:)

    Thank you again.

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • This was removed by the editor as SPAM

  • Nice question, thanks Steve.

    Easy one as I use this almost daily 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • You want to select all but the newest record, so,

    why can't you say that ?

  • I got it right because I use this type of query quite a lot, but I would argue that none of the queries will help you because they only return the Editor Names, and deleting by LName alone will delete all the records - you need the other columns as well

  • h.tobisch (2/4/2015)


    You want to select all but the newest record, so,j

    why can't you say that ?

    +1

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • matthew.flower (2/4/2015)


    ... I would argue that none of the queries will help you because they only return the Editor Names, and deleting by LName alone will delete all the records - you need the other columns as well

    +1

  • I use the same approach frequently, but I had to read them because there was no CTE. Thanks for this one.

  • Great question. My answer was incorrect, but I learned something that will come in handy in the future.

  • Good question.

    A

    SET DATEFORMAT MDY;

    in the beginning would be helpful.

  • The explanation is not correct. We do not want all rows except for Jones on 2008-04-01. We want all where cnt > 1. This will remove 1 of the other names (Smith, Jones, Knight). So our result set should be Jones, Jones, Smith (see below). The question and explanation refer to "after the date", yet no date is specified.

    Original Data Set

    'Jones', 'Editor', '1/1/2003'

    'Jones', 'Editor', '2/11/2005'

    'Jones', 'Editor', '4/1/2008'

    'Smith', 'Editor', '6/1/2004'

    'Smith', 'Editor', '11/16/2005'

    'Knight', 'Editor', '1/11/2007'

    Result Set from Query

    'Jones', 'Editor', '1/1/2003'

    'Jones', 'Editor', '2/11/2005'

    'Smith', 'Editor', '6/1/2004'

    The Question:

    I want to remove the duplicates in this table, based on choosing the newest record. I am trying to just find the latest editor in this table and want to query for all records past the latest editor that I will remove. Which query will help me?

    The question itself is contradictory, how can you remove everything past the newest (latest) editor? Isn't that the latest date? The only part of the question statement that is applied to the answer is the first sentence on choosing the newest record for each lname!

    Process of elimination resulted in only a single answer:

    Option 1: does not select the newest or oldest, simply adds a row number to the existing result set.

    Option 2: Aggregate will always show a cnt of 1 OR invalid aggregate (I don't have SQL to test, but probably an invalid aggregate)

    Option 3: HAVING Clause with no Aggregate

    Option 4: HAVING Clause with no Aggregate

    Option 5: Shows all rows where date modified is not the latest for the Job Title/LName combination.

    Most of these queries are invalid. The data we want returned is all the rows except the one with Jones for 2008-04-01. These results would be the row with Knight, both Smith rows, and the two Jones rows before this date.

    The last query is the one that does this. You cannot add a HAVING clause here without an aggregate. An easy way to limit results here is with a WHERE clause, as the ROW_NUMBER cannot be in a WHERE clause.

  • venoym (2/4/2015)


    The explanation is not correct. We do not want all rows except for Jones on 2008-04-01. We want all where cnt > 1. This will remove 1 of the other names (Smith, Jones, Knight). So our result set should be Jones, Jones, Smith (see below). The question and explanation refer to "after the date", yet no date is specified.

    Original Data Set

    'Jones', 'Editor', '1/1/2003'

    'Jones', 'Editor', '2/11/2005'

    'Jones', 'Editor', '4/1/2008'

    'Smith', 'Editor', '6/1/2004'

    'Smith', 'Editor', '11/16/2005'

    'Knight', 'Editor', '1/11/2007'

    Result Set from Query

    'Jones', 'Editor', '1/1/2003'

    'Jones', 'Editor', '2/11/2005'

    'Smith', 'Editor', '6/1/2004'

    The Question:

    I want to remove the duplicates in this table, based on choosing the newest record. I am trying to just find the latest editor in this table and want to query for all records past the latest editor that I will remove. Which query will help me?

    The question itself is contradictory, how can you remove everything past the newest (latest) editor? Isn't that the latest date? The only part of the question statement that is applied to the answer is the first sentence on choosing the newest record for each lname!

    Process of elimination resulted in only a single answer:

    Option 1: does not select the newest or oldest, simply adds a row number to the existing result set.

    Option 2: Aggregate will always show a cnt of 1 OR invalid aggregate (I don't have SQL to test, but probably an invalid aggregate)

    Option 3: HAVING Clause with no Aggregate

    Option 4: HAVING Clause with no Aggregate

    Option 5: Shows all rows where date modified is not the latest for the Job Title/LName combination.

    Most of these queries are invalid. The data we want returned is all the rows except the one with Jones for 2008-04-01. These results would be the row with Knight, both Smith rows, and the two Jones rows before this date.

    The last query is the one that does this. You cannot add a HAVING clause here without an aggregate. An easy way to limit results here is with a WHERE clause, as the ROW_NUMBER cannot be in a WHERE clause.

    I wondered about this as well...

  • Koen Verbeeck (2/4/2015)


    Nice question, thanks Steve.

    Easy one as I use this almost daily 🙂

    +1 Thanks for the question Steve.



    Everything is awesome!

  • Raghavendra Mudugal (2/4/2015)


    also my knowledge on understanding "English" is getting better :cool:)

    Congratulations 😀

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply