Need to remove duplicate records and get distinct values

  • just insert rest of values into table too.

  • Dugi (7/16/2008)


    Hi Jeff!

    Your code here just selecting and ordering asc the data inside the table it doesn't do any elimination of duplicate records!

    Can you explain little bit this!

    Dugi

    Heh... Sure! I shouldn't write code before coffee! I screwed up... I'm deleting the post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dugi (7/16/2008)


    Hi Jeff!

    Your code here just selecting and ordering asc the data inside the table it doesn't do any elimination of duplicate records!

    Can you explain little bit this!

    Dugi

    I believe that Telammica's code will do the trick... l appologize for my bit of bad code. More coffee is on the way! :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... now that I've had a bit of coffee... 😛

    In the original problem posted, San didn't care about which name was returned so just use the MIN or Max name for each ID posted.

    SELECT ID,MAX(Name) AS Name

    FROM dbo.JBMTest

    GROUP BY ID

    You can do the ranking in SQL Server 2000 a couple of ways... some are a lot faster than others...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    how about my code ?

    karthik

  • which one is good when we think performance ?

    karthik

  • Because i saw more than 4 answer for this post so far.

    karthik

  • karthikeyan (7/16/2008)


    Jeff,

    how about my code ?

    What size column do you thing NAME = '' will make? Also, look at the update... which NAME shall it use to do the update?

    And, just and FYI.... the basic form of update that you have used is an undocumented form that, under just the right conditions, will cause a 30 second query to slam a 4 CPU box into the wall for hours. Hard to duplicate the conditions for that, but I've made repairs for that several times. Here's the basis of what I'm saying...

    --===== Wrong way to do a "Joined" update

    UPDATE TableA

    SET SomeTableAColumn = b.SomeTableBColumn

    FROM TableB b

    WHERE TableA.SomeOtherColumn = b.SomeOtherColumn

    --===== Right way to do a "Joined" update

    UPDATE TableA

    SET SomeTableAColumn = b.SomeTableBColumn

    FROM TableB b

    INNER JOIN TableA a

    ON a.SomeOtherColumn = b.SomeOtherColumn

    In order for it to be "legal", TableA must also be in the FROM clause or the huge performance problem could take place. General rule to follow on joined updates is that the object of the update must also be in the FROM clause.

    The hard part of this code is that it's very difficult to prove... I've not been able to make a test case using repeatable generated data that causes the problem every time so it can be studied. I only know that when I fix these types of updates that are real performance problems, the problem goes away.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • karthikeyan (7/16/2008)


    which one is good when we think performance ?

    Try it... let us know...

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "ID" has a range of 1 to 50,000 non-unique numbers

    -- Column "Name" has a range of "AA" to "ZZ" non-unique 2 character stringsontains 12 random hex characters (ie, 0-9,A-F)

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    ID = ABS(CHECKSUM(NEWID()))%50000+1,

    Name = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    -- Takes about 1 second to execute.

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (SomeID)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • karthikeyan (7/15/2008)


    use distinct keyword

    Interesting solution - care to elaborate?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • San (7/15/2008)


    Hi

    I am sorry if someone has already posted this question and am asking again.

    This problem is eating my head, I have a table

    Table 1 -

    1br

    1er

    2rt

    3yh

    2uj

    3iu

    3ol

    Now I want only 1 row for each unique id, that row could be any row but id shouldn't repeat.

    1br

    2rt

    3yh

    or

    1er

    2uj

    3ol

    Thanks

    San, do you have any more information regarding this puzzle? For instance, are you aiming to change an existing table to meet your criteria, or to return a satisfactory result set from an existing table? It's not yet clear from your posts.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Okay so the posters had part I published several times, with various answers... using min/max, a subquery, a temp-table.

    The perfm. analysis can be made easily by Sam/others just run them across (expect the group by will be higher perf than temp tables, subquery join - but thats just an untested hypothesis - also depends on indexing of the subtable etc...[e.g. ID - clustered, value - non-clustered, or would a covering index be better here.. etc, I'm sure it will all affect the results])

    Anyone up for programming PART II of the request from Sam?

    I certainly know how to do it, want to see who comes up with a nice solution [Mine would involve two loops].

    [SQL Server 2005 solution is neat and tidy using rank function].

  • San, do you have any more information regarding this puzzle? For instance, are you aiming to change an existing table to meet your criteria, or to return a satisfactory result set from an existing table? It's not yet clear from your posts.

    Well initially I wanted just 1 row for each id, could be any row.

    But yes now I want to control as to say i want the 2nd / 3rd... datavalue for each id and if an id doesnt have a 2nd/3rd value then the last value it has...

    i hope you getting me..

    also can someone provide me more links on 'Over' ? not the msdn ones..

  • San (7/16/2008)


    San, do you have any more information regarding this puzzle? For instance, are you aiming to change an existing table to meet your criteria, or to return a satisfactory result set from an existing table? It's not yet clear from your posts.

    Well initially I wanted just 1 row for each id, could be any row.

    But yes now I want to control as to say i want the 2nd / 3rd... datavalue for each id and if an id doesnt have a 2nd/3rd value then the last value it has...

    i hope you getting me..

    also can someone provide me more links on 'Over' ? not the msdn ones..

    So...you want a query to give you results from a table, and your primarily interested on how to do this using 'OVER'? Which version of SQL Server are you using?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • This is the SQL Server 2005 version:

    select distinct id,value, rank() over (partition by id order by value)

    from

    (select '1' id,'br' value

    union

    select '1','er'

    union

    select '2','rt'

    union

    select '2','uj'

    union

    select '3','iu'

    union

    select '3','ol') data

    Output:

    idvalue(No column name)

    1br1

    1er2

    2rt1

    2uj2

    3iu1

    3ol2

    As said the SQL 2000 version, is a bit messy [waiting to check requirements - there is no OVER clause in SQL 2000 or RANK function].

Viewing 15 posts - 16 through 30 (of 58 total)

You must be logged in to reply to this topic. Login to reply