making records unique

  • Hi everyone, I've a table with 15 fields with 70,000 rows of data. I now realise I need to make each record unique. I could have gone down the road of using if exists ... statement in my Store Procedure to stop duplicates, but I'd have to test each field to see if they all make a duplicate row. I did think about making all the fields a primary key. The problem with that is I've allowed Nulls in some of the fields, I did think of updating the Null fields with 'N/A' but I can't remember the SQL statment that does it. Can someone please help me resolve this with a suggestion.

  • If I make a new table with the same structure, is there a way I could copy table1 into the new table using select into but substituting NULL with 'N/A'

  • Its not a good idea to include all fields in a primary key as it will affect the performance....

    Did you try DISTINCT to see how many records you are getting after using it or it is returning same 70000 rows after using DISTINCT?

    please let us know

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • mick burden (9/12/2013)


    If I make a new table with the same structure, is there a way I could copy table1 into the new table using select into but substituting NULL with 'N/A'

    Can you please post the table structure and tell us for which fields you are getting repeatitive values?

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • mick burden (9/12/2013)


    Hi everyone, I've a table with 15 fields with 70,000 rows of data. I now realise I need to make each record unique.

    Easy.

    I could have gone down the road of using if exists ... statement in my Store Procedure to stop duplicates, but I'd have to test each field to see if they all make a duplicate row.

    Still easy

    I did think about making all the fields a primary key. The problem with that is I've allowed Nulls in some of the fields, I did think of updating the Null fields with 'N/A' but I can't remember the SQL statment that does it. Can someone please help me resolve this with a suggestion.

    Probably not the best solution.

    What do you really want to do? Remove the duplicate rows? Fix the sproc so that duplicate rows are not output? Both?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yes - you can do a select into, and do a case when login per field - as its a one off pain at first but then its done

    ________________________________________________________________________________________________
    Regards
    Steve
    SQL 2008 DBA/DBD - MCTS/MCITP

    Please don't trust me, test the solutions I give you before using them.

  • I've tried Distinct as requested and there's only about 200 duplicates

  • Yes - you can do a select into, and do a case when login per field - as its a one off pain at first but then its done

    how does that work?

  • Here is my script for creating the table

    if object_id('PostCodesAndAddressesBt','U')is null

    CREATE TABLE PostCodesAndAddressesBt

    (ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    Postcode varchar(10) ,

    BuildingName varchar(200),

    housenumber varchar(20) ,

    SubBuilding varchar(100) ,

    County varchar(100),

    Town varchar(100) ,

    LocalName varchar(100) ,

    PostOutCode varchar(100),

    PostInCode varchar(100),

    Throughname varchar(100),

    Throughdesc varchar(100),

    poboxno varchar(100),

    BusinessName varchar(200),

    locality varchar(200))

  • I'd like to see both senarios to see what's best

  • Can you post a few rows please Mick, showing some dupes?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • What results do you get from this query?

    ;WITH Deleter AS (

    SELECT

    ID,

    rn = ROW_NUMBER() OVER(ORDER BY Postcode,BuildingName,housenumber,SubBuilding,County,Town,

    LocalName,PostOutCode,PostInCode,Throughname,Throughdesc,poboxno,BusinessName,locality)

    FROM PostCodesAndAddressesBt

    )

    SELECT *

    FROM Deleter

    WHERE rn > 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I get to columns of integers, I know the first column is ID numbers, not sure what the other integers are, can you tell me what that statement does please?

  • ChrisM@Work (9/12/2013)


    What results do you get from this query?

    ;WITH Deleter AS (

    SELECT

    ID,

    rn = ROW_NUMBER() OVER(ORDER BY Postcode,BuildingName,housenumber,SubBuilding,County,Town,

    LocalName,PostOutCode,PostInCode,Throughname,Throughdesc,poboxno,BusinessName,locality)

    FROM PostCodesAndAddressesBt

    )

    SELECT *

    FROM Deleter

    WHERE rn > 1

    I think he meant this, Mick:

    ;WITH Deleter AS (

    SELECT

    ID,

    rn = ROW_NUMBER() OVER(PARTITION BY Postcode,BuildingName,housenumber,SubBuilding,County,Town,

    LocalName,PostOutCode,PostInCode,Throughname,Throughdesc,poboxno,BusinessName,locality)

    FROM PostCodesAndAddressesBt

    )

    SELECT *

    FROM Deleter

    WHERE rn > 1


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • I've run the sql snippet and I'm getting the following error

    Msg 4112, Level 15, State 1, Line 4

    The ranking function "ROW_NUMBER" must have an ORDER BY clause.

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply