Sql Duplicate Values

  • Hu gurus,

    I am inserting values through xml. The following is the stored procedure i am using.

    i do not want to insert a duplicate id or name in the db table. (assume the table is empty)

    when i try to insert the duplicate id or name, then it is inserting the values. how to avoid inserting duplicate values in table.

    Note : please try this with empty rows in table.

    CREATE TABLE [Employee](

    [Id] int primary key,

    [FirstName] nvarchar(100)

    )

    procedure:

    --------------

    CREATE PROCEDURE SP_Insert_MultipleRows

    @xml XML,

    @retValue nvarchar(100) OUTPUT

    As

    BEGIN

    SET @retValue='Failed'

    INSERT INTO [Employee](

    [Id],

    [FirstName]

    )

    SELECT ID,

    FirstName

    FROM

    (

    SELECT DISTINCT

    COALESCE([Table].[Column].value('ID[1]', 'int'),0) as 'ID',

    [Table].[Column].value('FirstName[1]', 'Nvarchar(100)') as 'FirstName'

    FROM @xml.nodes('/Customers/customer') as [Table]([Column])

    )t

    WHERE not EXISTS ( SELECT 1

    FROM Employee

    WHERE ID = t.ID OR firstName = t.FirstName)

    END

    sp execute

    -----------

    Declare @retValue1 varchar(50);

    Declare @XmlStr XML;

    SET @XmlStr= N'<Customers>

    <customer>

    <ID>1</ID>

    <FirstName>fname1</FirstName>

    <LastName>Lname1</LastName>

    <Company>DEF</Company>

    </customer>

    <customer>

    <ID>1</ID>

    <FirstName>fname2</FirstName>

    <LastName>Lname2</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>3</ID>

    <FirstName>fname3</FirstName>

    <LastName>Lname3</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>4</ID>

    <FirstName>fname3</FirstName>

    <LastName>Lname3</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>5</ID>

    <FirstName>fname5</FirstName>

    <LastName>Lname5</LastName>

    <Company>ABC</Company>

    </customer>

    </Customers>';

    EXEC [SP_Insert_MultipleRows] @xml=@XmlStr,@retValue=@retValue1 OUTPUT

    print @retValue1

    In xml node 1,2 id are duplicated, and in node 3,4 FirstName is duplicated, so only the node 5 should be inserted.

    in the sstored proc, i am checking the condition with NOT exists (but still duplicate values are inserted). please help me with this.

    output:

    -------------

    idfirstName

    1fname1

    1fname2

    3fname3

    4fname3

    5fname5

    required result:

    -------------------

    id firstName

    5 fname5

  • Apparently, the data quality of the XML you've been provided to work with is at an all time low. It seems to indicate the company has no clue what the proper name for EmployeeID X actually is nor what the ID for some given employee name is. It seems totally arbitrary to reject any duplicates either by ID or by Name.

    Would you actually want your name to be "deleted" from the mix because of this mistake?

    My recommendation would be to take the XML back to the people that created the problem and tell them the extreme danger they've managed to create in the XML. You are, after all, messing with employee data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Must say that I agree with Jeff here, this seems to be very fragile and error prone, ie. what if two customers have the same name?

    😎

    The deduplication part is really the easy part, here is a quick example

    DECLARE @XmlStr XML = N'<Customers>

    <customer>

    <ID>1</ID>

    <FirstName>fname1</FirstName>

    <LastName>Lname1</LastName>

    <Company>DEF</Company>

    </customer>

    <customer>

    <ID>1</ID>

    <FirstName>fname2</FirstName>

    <LastName>Lname2</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>3</ID>

    <FirstName>fname3</FirstName>

    <LastName>Lname3</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>4</ID>

    <FirstName>fname3</FirstName>

    <LastName>Lname3</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>5</ID>

    <FirstName>fname5</FirstName>

    <LastName>Lname5</LastName>

    <Company>ABC</Company>

    </customer>

    </Customers>';

    ;WITH BASE_DATA AS

    (

    SELECT

    CUSTOMER.DATA.value('(ID/text())[1]' ,'INT' ) AS ID

    ,CUSTOMER.DATA.value('(FirstName/text())[1]' ,'NVARCHAR(100)') AS FirstName

    ,CUSTOMER.DATA.value('(LastName/text())[1]' ,'NVARCHAR(100)') AS LastName

    ,CUSTOMER.DATA.value('(Company/text())[1]' ,'NVARCHAR(100)') AS Company

    FROM @XmlStr.nodes('Customers/customer') AS CUSTOMER(DATA)

    )

    ,DUPES_FLAGGED AS

    (

    SELECT

    -- Add one of these for each duplication rule

    -- Duplicate ID values

    COUNT(*) OVER

    (

    PARTITION BY BD.ID

    ) - 1

    +

    -- Duplicate FirstName & LastName values

    COUNT(*) OVER

    (

    PARTITION BY BD.FirstName

    ,BD.LastName

    ) - 1 AS IS_DUPLICATE

    ,BD.ID

    ,BD.FirstName

    ,BD.LastName

    ,BD.Company

    FROM BASE_DATA BD

    )

    SELECT

    DF.ID

    ,DF.FirstName

    ,DF.LastName

    ,DF.Company

    FROM DUPES_FLAGGED DF

    WHERE DF.IS_DUPLICATE = 0;

    ;

    Output

    ID FirstName LastName Company

    --- ----------- ----------- --------

    5 fname5 Lname5 ABC

  • Eirikur Eiriksson (6/27/2016)


    Must say that I agree with Jeff here, this seems to be very fragile and error prone, ie. what if two customers have the same name?

    😎

    The deduplication part is really the easy part, here is a quick example

    DECLARE @XmlStr XML = N'<Customers>

    <customer>

    <ID>1</ID>

    <FirstName>fname1</FirstName>

    <LastName>Lname1</LastName>

    <Company>DEF</Company>

    </customer>

    <customer>

    <ID>1</ID>

    <FirstName>fname2</FirstName>

    <LastName>Lname2</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>3</ID>

    <FirstName>fname3</FirstName>

    <LastName>Lname3</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>4</ID>

    <FirstName>fname3</FirstName>

    <LastName>Lname3</LastName>

    <Company>ABC</Company>

    </customer>

    <customer>

    <ID>5</ID>

    <FirstName>fname5</FirstName>

    <LastName>Lname5</LastName>

    <Company>ABC</Company>

    </customer>

    </Customers>';

    ;WITH BASE_DATA AS

    (

    SELECT

    CUSTOMER.DATA.value('(ID/text())[1]' ,'INT' ) AS ID

    ,CUSTOMER.DATA.value('(FirstName/text())[1]' ,'NVARCHAR(100)') AS FirstName

    ,CUSTOMER.DATA.value('(LastName/text())[1]' ,'NVARCHAR(100)') AS LastName

    ,CUSTOMER.DATA.value('(Company/text())[1]' ,'NVARCHAR(100)') AS Company

    FROM @XmlStr.nodes('Customers/customer') AS CUSTOMER(DATA)

    )

    ,DUPES_FLAGGED AS

    (

    SELECT

    -- Add one of these for each duplication rule

    -- Duplicate ID values

    COUNT(*) OVER

    (

    PARTITION BY BD.ID

    ) - 1

    +

    -- Duplicate FirstName & LastName values

    COUNT(*) OVER

    (

    PARTITION BY BD.FirstName

    ,BD.LastName

    ) - 1 AS IS_DUPLICATE

    ,BD.ID

    ,BD.FirstName

    ,BD.LastName

    ,BD.Company

    FROM BASE_DATA BD

    )

    SELECT

    DF.ID

    ,DF.FirstName

    ,DF.LastName

    ,DF.Company

    FROM DUPES_FLAGGED DF

    WHERE DF.IS_DUPLICATE = 0;

    ;

    Output

    ID FirstName LastName Company

    --- ----------- ----------- --------

    5 fname5 Lname5 ABC

    Nice code, as always, Eirikur, but I'm curious why you would help someone do something that you agree they should not do. :blink: I hope the company the OP works for doesn't suffer a problem because of it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/27/2016)


    Nice code, as always, Eirikur, but I'm curious why you would help someone do something that you agree they should not do. :blink: I hope the company the OP works for doesn't suffer a problem because of it.

    What the OP had posted was not sufficient for enforcing even the broken/wrong business rules, chances are that OP's company would suffer even more without the assistance. At least now the OP has been warned and provided with a proper way of enforcing the logic.

    😎

  • Warned but armed.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply