Need to remove duplicate records and get distinct values

  • thanks d_sysuk for that using rank and over.. that works beautifully!

    i work with 2005 so just need to bother with that for now...

  • So...you want a query to give you results from a table, and your primarily interested on how to do this using 'OVER'? Which version of SQL Server are you using?

    yes i am interested to know more about 'OVER'. i m working with version 2005

  • Jeff Moden (7/16/2008)


    karthikeyan (7/16/2008)


    Jeff,

    how about my code ?

    What size column do you thing NAME = '' will make? Also, look at the update... which NAME shall it use to do the update?

    And, just and FYI.... the basic form of update that you have used is an undocumented form that, under just the right conditions, will cause a 30 second query to slam a 4 CPU box into the wall for hours. Hard to duplicate the conditions for that, but I've made repairs for that several times. Here's the basis of what I'm saying...

    --===== Wrong way to do a "Joined" update

    UPDATE TableA

    SET SomeTableAColumn = b.SomeTableBColumn

    FROM TableB b

    WHERE TableA.SomeOtherColumn = b.SomeOtherColumn

    --===== Right way to do a "Joined" update

    UPDATE TableA

    SET SomeTableAColumn = b.SomeTableBColumn

    FROM TableB b

    INNER JOIN TableA a

    ON a.SomeOtherColumn = b.SomeOtherColumn

    In order for it to be "legal", TableA must also be in the FROM clause or the huge performance problem could take place. General rule to follow on joined updates is that the object of the update must also be in the FROM clause.

    The hard part of this code is that it's very difficult to prove... I've not been able to make a test case using repeatable generated data that causes the problem every time so it can be studied. I only know that when I fix these types of updates that are real performance problems, the problem goes away.

    Karthik,

    Are you going to post a fix to your proevious code?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Sorry for the delay !

    create table T

    (

    ID int,

    Name varchar(50)

    )

    insert into T

    select 1,'John'

    union all

    select 2,'Mary'

    union all

    select 1,'Jerry'

    union all

    select 2,'Karthik'

    union all

    select 3,'Kumar'

    union all

    select 3,'Raghu'

    select distinct ID, Name = CONVERT(varchar(50),'')

    into #t1

    from T

    Here, i have fixed the Name column datatype to Varchar(50) instead of assigning ''.

    Because

    1) when we give Name ='' , the length will be considered as 1.

    2) when we give Name = convert(varchar(50),'') , the length will be considered as 50. i.e we assigned the value explicitly.

    karthik

  • In order for it to be "legal", TableA must also be in the FROM clause or the huge performance problem could take place. General rule to follow on joined updates is that the object of the update must also be in the FROM clause.

    The hard part of this code is that it's very difficult to prove... I've not been able to make a test case using repeatable generated data that causes the problem every time so it can be studied. I only know that when I fix these types of updates that are real performance problems, the problem goes away

    .

    As you suggested , i have modfied my UPDATE statement to

    update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    karthik

  • What size column do you thing NAME = '' will make?

    Jeff,

    Thanks for highlighting this area.

    I have executed my previous code.

    select distinct ID, Name =''

    into #t1

    from T

    update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    I got the below output:

    ID Name

    ----------- ----

    1 J

    2 M

    3 K

    karthik

  • After modifying Name column as Name = convert(varchar(50),'') , i have executed my modified code.

    select distinct ID, Name =Convert(varchar(50),'')

    into #t1

    from T

    update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    I got the below output:

    ID Name

    ----------- ----

    1 John

    2 Mary

    3 Kumar

    karthik

  • karthikeyan (7/17/2008)


    In order for it to be "legal", TableA must also be in the FROM clause or the huge performance problem could take place. General rule to follow on joined updates is that the object of the update must also be in the FROM clause.

    As you suggested , i have modfied my UPDATE statement to

    update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    Nearly correct, Karthik, but not quite...

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • As you suggested, i have tested both the scripts with SET STATISTICAL TIME ON.

    update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    Execution Time 0.

    SQL Server cpu time: 0 ms. SQL Server elapsed time: 10 ms.

    (3 rows affected)

    update #t1

    set #t1.Name = T.Name

    from T , #t1

    where #t1.ID =T.ID

    Parse and Compile Time 0.

    SQL Server cpu time: 0 ms.

    Execution Time 0.

    SQL Server cpu time: 0 ms. SQL Server elapsed time: 13 ms.

    (3 rows affected)

    But , sometime first method takes 13 ms , second takes 10 ms, so i am confused here , which one is performance wise good.

    karthik

  • Nearly correct, Karthik, but not quite...

    I think i have followed the new join method.

    Can you highlight where i did mistake ?

    karthik

  • Of course!

    update #t1

    set #t1.Name = T.Name

    from #t1 inner join T

    on #t1.ID =T.ID

    In order for it to be "legal", TableA must also be in the FROM clause or the huge performance problem could take place. General rule to follow on joined updates is that the object of the update must also be in the FROM clause.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris !

    I am not offending you, i am asking this question politely.

    Jeff's Sample code:

    --===== Right way to do a "Joined" update

    UPDATE TableA

    SET SomeTableAColumn = b.SomeTableBColumn

    FROM TableB b

    INNER JOIN TableA a

    ON a.SomeOtherColumn = b.SomeOtherColumn

    My query:

    update #t1

    set #t1.Name = T.Name

    from T

    inner join #t1

    on #t1.ID =T.ID

    Your code:

    update #t1

    set #t1.Name = T.Name

    from #t1 inner join T

    on #t1.ID =T.ID

    Parse and Compile Time 0.

    SQL Server cpu time: 0 ms.

    Execution Time 0.

    SQL Server cpu time: 0 ms. SQL Server elapsed time: 13 ms.

    (3 rows affected)

    I think you have changed the table order from the FROM clause.

    Again i am asking politely,which query will perform good ?

    karthik

  • I have noticed one interest thing from all the three methods ( My old method, new method , Chris's method ), i got the same output.

    ID Name

    ----------- ----

    1 John

    2 Mary

    3 Kumar

    why the result is not changing? what could be the reason ?

    karthik

  • I have executed all these three methods more than 10 times, But i got the same result at all time.

    1) update #t1

    set #t1.Name = T.Name

    from T , #t1

    where #t1.ID =T.ID

    2) update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    3) update #t1

    set #t1.Name = T.Name

    from #t1 inner join T

    on #t1.ID =T.ID

    ID Name

    ----------- ----

    1 John

    2 Mary

    3 Kumar

    How ?

    karthik

  • Hi Chris and Jeff,

    I am leaving for the day. Will continue tomorrow.Please post your replies.

    karthik

Viewing 15 posts - 31 through 45 (of 58 total)

You must be logged in to reply to this topic. Login to reply