Baffled by duplicates despite selecting distinct values

  • I am getting the below error when trying to select distincts rows from a table into a temp table.

    : Violation of UNIQUE KEY constraint 'UQ__#Context__________5427A9BF'. Cannot insert duplicate key in object 'dbo.#Context'.

    So I create this below table and an index on it

    CREATE TABLE #Context(OrderId UNIQUEIDENTIFIER NOT NULL UNIQUE)

    go

    CREATE INDEX idx_tempCustomers ON #Context (OrderId)

    go

    then i insert rows in the temp table

    INSERT INTO #Context( OrderId )

    SELECT DISTINCT tinclude.OrderId FROM ( SELECT v.OrderId FROM dbo.mdv_CustomerFilterCustomerTransDate v WITH (NOLOCK)

    WHERE v.TerritoryCode IN (3) AND v.CustomerId = 649 AND ( v.CustomerTransDate >= '03/01/2015' AND v.CustomerTransDate < '04/01/2015')) AS tinclude

    go

    This is giving me the PK Violation error i mentioned above.

    Also note that mdv_CustomerFilterCustomerTransDate is a view as below

    SELECTOrderId

    ,TerritoryCode

    ,ClientId

    ,CustomerTransDate

    FROM dbo.vix_aggregate_CoreCustomerData

    Again vix_aggregate_CoreCustomerData is a view as below

    SELECTCustomerId

    , OrderId

    ,ClientId

    ,TerritoryCode

    ,DataForm

    ,CustomerTypeAS [LOB]

    ,CreateDate

    ,StampAS [BridgeStamp]

    ,[Date]AS [SubmitDate]

    ,IOCustomerIndAS [ServiceType]

    ,PayerInd

    ,OrderStatus

    ,TotalAmt

    ,FinancialClass

    ,FacilitySubId

    ,EditShortName

    ,TransmitFlag

    ,PrintCustAtCLS

    ,RelTrans

    ,TransmissionFlag

    ,StmtFromDate

    ,StmtThruDate

    ,TimesRebilled

    ,DateLastRebilled

    ,SecBilledDate

    ,UserId

    ,DateLastMod

    ,LinkToPrevForm

    ,CustomerTransDate

    ,DirectSubmit

    ,FormIndicator

    FROM dbo.tbl_Customers WITH(NOLOCK)

    I went down to the source table from the views tbl_customers, there the actual data is coming from and there are no duplicates there either. used below

    select count(OrderId ) Total, OrderId from tbl_customers with(nolock)

    group by OrderId

    having count(*) > 1

    Orderid is a GUID (uniqueidentifier length 16)

    Any ideas? It's very strange. Running the query right after the error works fine

  • bhattpranav (4/8/2015)


    --INSERT INTO #Context( OrderId )

    SELECT tinclude.OrderId FROM ( SELECT v.OrderId FROM dbo.mdv_CustomerFilterCustomerTransDate v WITH (NOLOCK)

    WHERE v.TerritoryCode IN (3) AND v.CustomerId = 649 AND ( v.CustomerTransDate >= '03/01/2015' AND v.CustomerTransDate < '04/01/2015')) AS tinclude

    go

    Try commenting out the INSERT (as above), then copying/pasting the results pane into Excel where it is easy to identify whether there are duplicates.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Is that a typo in your Insert statement or did you forget the DISTINCT keyword?

    INSERT INTO #Context( OrderId )

    SELECT DISTINCT tinclude.OrderId FROM ( SELECT v.OrderId FROM dbo.mdv_CustomerFilterCustomerTransDate v WITH (NOLOCK)

    WHERE v.TerritoryCode IN (3) AND v.CustomerId = 649 AND ( v.CustomerTransDate >= '03/01/2015' AND v.CustomerTransDate < '04/01/2015')) AS tinclude


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Not Really. you mean a DISTINCT right after INSERT?

    I am inserting into temp table taking values from the select distinct.

  • bhattpranav (4/9/2015)


    Not Really. you mean a DISTINCT right after INSERT?

    I am inserting into temp table taking values from the select distinct.

    But as was pointed out above, you don't have DISTINCT in your query anywhere...

    If you're sure the values really are distinct in the table, you might try taking that NOLOCK hint out and see if that is causing some craziness to come back from your queries.

    The Redneck DBA

  • Your select statement that you posted doesn't guarantee unique values for OrderID.

    Just for fun just try:

    SELECT

    tinclude.OrderId,

    COUNT(tinclude.OrderId)

    FROM

    ( SELECT v.OrderId FROM dbo.mdv_CustomerFilterCustomerTransDate v WITH (NOLOCK)

    WHERE v.TerritoryCode IN (3) AND v.CustomerId = 649 AND ( v.CustomerTransDate >= '03/01/2015' AND v.CustomerTransDate < '04/01/2015')) AS tinclude

    GROUP BY

    tinclude.OrderId

    HAVING

    COUNT(tinclude.OrderId) > 1

    If you get any results back then you are trying to insert duplicate ID's into your table which has a "UNIQUE" constraint. If you add the DISTINCT keyword like I added in my post then you should be fine.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • shouldn't it be >1 for duplicates?

    Corrected Query below. returns no rows

    SELECT

    tinclude.OrderId,

    COUNT(tinclude.OrderId)

    FROM

    ( SELECT v.OrderId FROM dbo.mdv_CustomerFilterCustomerTransDate v WITH (NOLOCK)

    WHERE v.TerritoryCode IN (3) AND v.CustomerId = 649 AND ( v.CustomerTransDate >= '03/01/2015' AND v.CustomerTransDate < '04/01/2015')) AS tinclude

    GROUP BY

    tinclude.OrderId

    HAVING

    COUNT(tinclude.OrderId) > 1

  • Yes, that is correct...see easy to make a typo. 😉

    Hmmm...if it didn't return anything that is strange.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • I tried and it didn't return any rows. I am thinking this may have something to do with the engine? If any knows?

    This code executes several 1000 times a day on prod.

    Or a view calling another view that actually has the actual table where the data is coming from?

  • Well the only thing left that I can think of is I sometimes forget to DROP or DELETE my temp tables when I testing some code. Maybe you ran the query more than once before clearing it. Since you are using a local (#) temp table no one else could be writing to it. Just a thought...


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • bhattpranav (4/9/2015)


    I tried and it didn't return any rows. I am thinking this may have something to do with the engine? If any knows?

    This code executes several 1000 times a day on prod.

    Or a view calling another view that actually has the actual table where the data is coming from?

    Can you try it without the NOLOCK hint just for fun?

    The Redneck DBA

  • dwain.c (4/9/2015)


    bhattpranav (4/8/2015)


    --INSERT INTO #Context( OrderId )

    SELECT tinclude.OrderId FROM ( SELECT v.OrderId FROM dbo.mdv_CustomerFilterCustomerTransDate v WITH (NOLOCK)

    WHERE v.TerritoryCode IN (3) AND v.CustomerId = 649 AND ( v.CustomerTransDate >= '03/01/2015' AND v.CustomerTransDate < '04/01/2015')) AS tinclude

    go

    Try commenting out the INSERT (as above), then copying/pasting the results pane into Excel where it is easy to identify whether there are duplicates.

    So did you try this? I've found that sometimes just seeing what is duplicated leads you quickly to solving the issue.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • That would be a lot of rows to put in an excel and compare or filter on. querying the table directly for duplicates would be so much quicker.

    It's weird that i am not able to duplicate the issue after the error occurs using the same input parameters

  • bhattpranav (4/9/2015)


    That would be a lot of rows to put in an excel and compare or filter on. querying the table directly for duplicates would be so much quicker.

    It's weird that i am not able to duplicate the issue after the error occurs using the same input parameters

    You don't need to put them all in at once. Put them in in chunks of say 10,000 until you've got some duplicates to analyze. Just make sure you sort them first.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply