Need to remove duplicate records and get distinct values

  • karthikeyan (7/17/2008)


    What size column do you thing NAME = '' will make?

    Jeff,

    Thanks for highlighting this area.

    I have executed my previous code.

    select distinct ID, Name =''

    into #t1

    from T

    update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    I got the below output:

    ID Name

    ----------- ----

    1 J

    2 M

    3 K

    When I executed your code, I got two errors... the first one was about not being able to create a zero length column because of the Name=''.

    The second one was because there was no column alias on the SET NAME = NAME statement.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • select id,name from tablename

    where id in (select id from tablename group by id having count(id)>1)

    OR

    select id,name from tablename group by

    id having count(id) > 1

    OR (For ALL columns Contains Same Values in rows)

    select * from tablename group by

    col1,col2.....coln having count(*) > 1

  • select id,name from tablename

    where id in (select id from tablename group by id having count(id)>1)

    OR

    select id,name from tablename group by

    id having count(id) > 1

    OR (For ALL columns Contains Same Values in rows)

    select * from tablename group by

    col1,col2.....coln having count(*) > 1

    Charu πŸ˜›

  • karthikeyan (7/17/2008)


    Thanks Chris !

    I am not offending you, i am asking this question politely.

    Jeff's Sample code:

    --===== Right way to do a "Joined" update

    UPDATE TableA

    SET SomeTableAColumn = b.SomeTableBColumn

    FROM TableB b

    INNER JOIN TableA a

    ON a.SomeOtherColumn = b.SomeOtherColumn

    My query:

    update #t1

    set #t1.Name = T.Name

    from T

    inner join #t1

    on #t1.ID =T.ID

    Your code:

    update #t1

    set #t1.Name = T.Name

    from #t1 inner join T

    on #t1.ID =T.ID

    Parse and Compile Time 0.

    SQL Server cpu time: 0 ms.

    Execution Time 0.

    SQL Server cpu time: 0 ms. SQL Server elapsed time: 13 ms.

    (3 rows affected)

    I think you have changed the table order from the FROM clause.

    Again i am asking politely,which query will perform good ?

    All 3 of these are fine... it's an inner join...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • karthikeyan (7/17/2008)


    I have executed all these three methods more than 10 times, But i got the same result at all time.

    1) update #t1

    set #t1.Name = T.Name

    from T , #t1

    where #t1.ID =T.ID

    2) update #t1

    set #t1.Name = T.Name

    from T inner join #t1

    on #t1.ID =T.ID

    3) update #t1

    set #t1.Name = T.Name

    from #t1 inner join T

    on #t1.ID =T.ID

    ID Name

    ----------- ----

    1 John

    2 Mary

    3 Kumar

    How ?

    They're all a form of inner joins... they make no difference to the Server. Execution plan for all 3 will be identical. That's how.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've got three pennies in my pocket. I've counted them more than ten times now, facing different directions, but each time I get the same result. Three pennies.

    I'm baffled.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I noticed that we didnt really analyse all solutions that are best for performance

    (Just looked at the solutions that were all the same)

    - I'll recall from the thread: [From the first part of the request]

    1) We have a solution with a temp table

    2) We have a solution with a select id, max(value) from <>

    ..

    may have been other solutions, but I havn't yet seen the difference between the temp table / the group by solution yet.

    As I mentioned earlier mileage probably varies, based on indexing, table size, resetting of cache, as perf. testing is an art.

  • d_sysuk (7/16/2008)


    Okay so the posters had part I published several times, with various answers... using min/max, a subquery, a temp-table.

    The perfm. analysis can be made easily by Sam/others just run them across (expect the group by will be higher perf than temp tables, subquery join - but thats just an untested hypothesis - also depends on indexing of the subtable etc...[e.g. ID - clustered, value - non-clustered, or would a covering index be better here.. etc, I'm sure it will all affect the results])

    Anyone up for programming PART II of the request from Sam?

    I certainly know how to do it, want to see who comes up with a nice solution [Mine would involve two loops].

    [SQL Server 2005 solution is neat and tidy using rank function].

    I already posted a link to a couple of possible solutions... if you'd like to post your double loop solution, I'll give two of the other solutions that work in SQL Server 2000 a shot and post them along with performance testing of the 3 methods...

    I also posted a million row data generator for this earlier hoping that Karthik would post some test code to answer his own earlier performance question. I'll use that as the basis for testing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well I posted a SQL 2005 solution to the question (which the poster seems to fit fine, as this is really a sql 2005 solution not sql7/2000 - yes posted in wrong forum). I cannot see the point in posting a hard second part solution when sql 2005 is the target platform here.. shame I saw nobody else solve Part II !

    Anyways, from my earlier posts - here it is consilidated:

    PART I

    select id, min(value) value

    from

    (select '1' id,'br' value

    union

    select '1','er'

    union

    select '2','rt'

    union

    select '2','uj'

    union

    select '3','iu'

    union

    select '3','ol') data

    group by id

    PART II:

    select distinct id,value, rank() over (partition by id order by value)

    from

    (select '1' id,'br' value

    union

    select '1','er'

    union

    select '2','rt'

    union

    select '2','uj'

    union

    select '3','iu'

    union

    select '3','ol') data

    Output:

    id value (No column name)

    1 br 1

    1 er 2

    2 rt 1

    2 uj 2

    3 iu 1

    3 ol 2

  • d_sysuk (7/21/2008)


    shame I saw nobody else solve Part II !

    No need to gloat... No sense in someone else solving it... you already did it! πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Performed a quick perf. analysis, as suspected.. here are the results:

    Query 1: [K's]

    select distinct ID, Name = CONVERT(varchar(50),'')

    into #t1

    from T1

    update #t1

    set #t1.Name = T1.Name

    from T1 inner join #t1

    on #t1.ID =T1.ID

    select * from #t1

    -------------------------------------------------------

    Query 2:

    select id, min(name) value

    from

    T1

    group by id

    --------------------------------------------------------

    Query 3:

    select id, max(name) value

    from

    T1

    group by id

    --------------------------------------------------------

    All results on T1 (1 Million row record set, clustered index on id (asc),name (asc), - max value of ID : 10,000 ; name unique - ratio of dup IDs: 100:1)

    Swapping the clustered PK to id (asc), name (desc) - the results from query 2 / 3 swap - showing the index makes a difference on performance to which operator should be chosen (min/max) - approx 13% diff [test results not included here].

    All runs had set statistics io/ time on.

    All runs had the following run before each query was tested to reset the cache (we reset the cache for each query, and ran each query individually):

    DBCC DROPCLEANBUFFERS

    dbcc FREEPROCCACHE

    dbcc FREESYSTEMCACHE ( 'ALL' )

    dbcc FREESESSIONCACHE

    Query 1: [Using temp table]

    -- First Part: Select distinct part:

    SQL Server parse and compile time:

    CPU time = 15 ms, elapsed time = 611 ms.

    Table 'T1'. Scan count 1, logical reads 3190, physical reads 1, read-ahead reads 3185, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 297 ms, elapsed time = 1069 ms.

    -- Second Part: Update part:

    SQL Server parse and compile time:

    CPU time = 15 ms, elapsed time = 68 ms.

    Table '#t1_________________________________________________________________________________________________________________0000000000FF'. Scan count 1, logical reads 27068, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'T1'. Scan count 1, logical reads 3190, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 985 ms, elapsed time = 1589 ms.

    --Third Part: Display output:

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#t1_________________________________________________________________________________________________________________0000000000FF'. Scan count 1, logical reads 8583, physical reads 0, read-ahead reads 8, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 16 ms, elapsed time = 138 ms.

    Query 2: [min version]

    Table 'T1'. Scan count 1, logical reads 3190, physical reads 1, read-ahead reads 3185, lob logical reads 0,

    lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 422 ms, elapsed time = 980 ms.

    Query 3: [max version]

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 163 ms.

    Table 'T1'. Scan count 1, logical reads 3190, physical reads 1, read-ahead reads 3185, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 469 ms, elapsed time = 1231 ms.

  • d_sysuk (7/16/2008)

    --------------------------------------------------------------------------------

    Okay so the posters had part I published several times, with various answers... using min/max, a subquery, a temp-table.

    The perfm. analysis can be made easily by Sam/others just run them across (expect the group by will be higher perf than temp tables, subquery join - but thats just an untested hypothesis - also depends on indexing of the subtable etc...[e.g. ID - clustered, value - non-clustered, or would a covering index be better here.. etc, I'm sure it will all affect the results])

    Anyone up for programming PART II of the request from Sam?

    I certainly know how to do it, want to see who comes up with a nice solution [Mine would involve two loops].

    [SQL Server 2005 solution is neat and tidy using rank function].

    I already posted a link to a couple of possible solutions... if you'd like to post your double loop solution, I'll give two of the other solutions that work in SQL Server 2000 a shot and post them along with performance testing of the 3 methods...

    I also posted a million row data generator for this earlier hoping that Karthik would post some test code to answer his own earlier performance question. I'll use that as the basis for testing.

    Jeff,

    Sorry ! i was little busy last week. I will come up with test code within couple of days.

    karthik

  • Use DISTINCT..!

  • kannan (7/30/2008)


    Use DISTINCT..!

    Sure... let's see your code example... πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 46 through 58 (of 58 total)

You must be logged in to reply to this topic. Login to reply