Receiving duplicate key insert error with DISTINCT clause

  • I have been working on a bug for a couple of days now, which I have determined to be caused by the use of NOLOCK in a query. Here is the query:

    Declare @AbsTab table (org_id int, absr_id int, abs_id int, abs_date datetime primary key (absr_id, abs_id,abstab_id))

    Insert @AbsTab
    select
        a.org_id,
        a.absr_id,
        a.abs_id,
        a.abs_date
    from absence a (nolock)
    where a.abs_date between @StartDate and @EndDate
      and a.abs_deleted = 0
      and a.abs_id not in (select abs_id from @CloseAbs)

    I have determined that the (NOLOCK) is causing me problems because this query is using a covering index that includes updatable fields and we are seeing dirty reads. These dirty reads are causing duplicate key insert errors.

    Msg 2627, Level 14, State 1, Line 117

    Violation of PRIMARY KEY constraint 'PK__#4B31A61E__4C25CA57'. Cannot insert duplicate key in object 'dbo.@AbsTab'.

    OK...I can understand this much and I can accept it, after all this is the sacrifice we have to make when we choose to use isolation level READ UNCOMMITTED.

    My QUESTION: if I put a DISTINCT clause on the SELECT statement, shouldn't this guarantee me unique rows?

    Insert @AbsTab
    select DISTINCT
        a.org_id,
        a.absr_id,
        a.abs_id,
        a.abs_date
    from absence a (nolock)
    where a.abs_date between @StartDate and @EndDate
      and a.abs_deleted = 0
      and a.abs_id not in (select abs_id from @CloseAbs)

    I have tested this and I continue to receive errors.

    I have written some code to trap the results being inserted into this table:

    1105331769404462299842009-03-11 00:00:00.000

    1105331769404462299842009-03-11 00:00:00.000

    As you can see, the values are identical as far as I can tell.

    Can anyone explain why the DISTINCT does not work in this case?

    By the way, I am able to eliminate the error by adding an IDENTITY column to the table variable and making this column part of the PK. This does not get around the duplicate data but it does resolve the DUPLICATE KEY INSERT errors.
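
    For anyone following along, the IDENTITY workaround described above might look roughly like the sketch below. The column name abstab_id is borrowed from the PK list in the original declaration; the IDENTITY(1,1) seed and increment and the two sample rows are assumptions made only for illustration.

    ```sql
    -- Sketch only: table variable with an IDENTITY column appended to the PK.
    -- abstab_id is taken from the original PK list; IDENTITY(1,1) is assumed.
    Declare @AbsTab table (
        abstab_id int identity(1,1),
        org_id    int,
        absr_id   int,
        abs_id    int,
        abs_date  datetime,
        primary key (absr_id, abs_id, abstab_id)
    )

    -- Hypothetical rows with the same (absr_id, abs_id) pair.
    -- The identity value makes each key unique, so both inserts succeed.
    Insert @AbsTab (org_id, absr_id, abs_id, abs_date) values (1, 10, 100, '20090311')
    Insert @AbsTab (org_id, absr_id, abs_id, abs_date) values (1, 10, 100, '20090311')

    select abstab_id, org_id, absr_id, abs_id, abs_date from @AbsTab
    ```

    Because the identity value is generated automatically, the existing Insert ... select keeps supplying only the four data columns. As noted above, this only avoids the key violation; the duplicate data still lands in the table.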

  • Are you sure only **one** insert is running at a time?


    * Noel

  • Not sure exactly what you're asking.

    This is a table variable so all inserts are isolated to this process.

    The plan is not taking advantage of parallelism.

    Let me know if this does not answer your question.

  • Hi Eric

    The NOLOCK may be your problem if there is a lot of data manipulation going on while your statement is running. Maybe have a look at:

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT

    Greets

    Flo
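
    A minimal sketch of what the snapshot suggestion above could involve. MyDatabase is a hypothetical database name and the date values are placeholders; the table and column names are the ones from the original query.

    ```sql
    -- Sketch only: MyDatabase is a hypothetical name.
    -- Snapshot isolation has to be allowed at the database level first.
    ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
    GO

    -- Then the session that runs the query opts in.
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    Declare @StartDate datetime, @EndDate datetime
    Set @StartDate = '20090301'   -- placeholder value
    Set @EndDate   = '20090331'   -- placeholder value

    -- The (nolock) hint is dropped so the query reads a consistent,
    -- versioned view of committed rows instead of uncommitted data.
    select a.org_id, a.absr_id, a.abs_id, a.abs_date
    from absence a
    where a.abs_date between @StartDate and @EndDate
      and a.abs_deleted = 0
    ```

    Row versioning keeps old row versions in tempdb, so it adds overhead on a table with heavy write activity.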

  • When I try to run just your declare:

    (Declare @AbsTab table (org_id int, absr_id int, abs_id int, abs_date datetime primary key (absr_id, abs_id,abstab_id)))

    I get:

    Msg 1911, Level 16, State 1, Line 1

    Column name 'abstab_id' does not exist in the target table or view.

    Msg 1750, Level 16, State 0, Line 1

    Could not create constraint. See previous errors.

    I'm curious, one of the errors you got referred to 'dbo.@AbsTab'. Is that a "hard" table rather than a table variable?

    bc


  • Sorry...my bad.

    Remove abstab_id from the PK.

    That was a remnant from my previous testing...juggling a lot of stuff.

  • @AbsTab is a declared table variable

  • The table that I am querying is too hot a table to turn snapshot isolation on.

    I am more curious as to why DISTINCT does not work here.

  • Never mind, I need some coffee :D


    * Noel

  • Do you see dupes when you run just the Select DISTINCT?

    select DISTINCT
        a.org_id,
        a.absr_id,
        a.abs_id,
        a.abs_date
    from absence a (nolock)
    where a.abs_date between @StartDate and @EndDate
      and a.abs_deleted = 0
      and a.abs_id not in (select abs_id from @CloseAbs)

    I can think of no reason DISTINCT shouldn't do as it's told.

    bc


  • Read Gail's post at http://www.sqlservercentral.com/Forums/Topic673040-65-1.aspx#bm673605; she explains the effects of (nolock) and how you can get duplicate records :).

    Mohit.

    Edit: SPELLING .... *dies*

    ---

    Mohit K. Gupta, MCITP: Database Administrator (2005), Twitter: @SQLCAN
    Microsoft FTE - SQL Server PFE

    * Some time its the search that counts, not the finding...
    * I didn't think so, but if I was wrong, I was wrong. I'd rather do something, and make a mistake than be frightened and be doing nothing.

  • Your answer makes me feel a little better.

    I am not sure that I can run your query exactly as requested.

    This particular query could return up to 90,000 rows.

    I would have to append a...

    GROUP BY abs_id, absr_id

    HAVING COUNT(*) > 1

    ...to this query in order to trap the dups

    I can try this tomorrow during peak loads for our system.
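
    Put together, the duplicate trap might look something like the sketch below. The date values and the empty @CloseAbs table variable are placeholders so the batch stands on its own; everything else comes from the original query.

    ```sql
    -- Sketch only: list the key pairs that the scan returns more than once.
    Declare @StartDate datetime, @EndDate datetime
    Declare @CloseAbs table (abs_id int)   -- placeholder, left empty here
    Set @StartDate = '20090301'            -- placeholder value
    Set @EndDate   = '20090331'            -- placeholder value

    select a.absr_id, a.abs_id, count(*) as dup_count
    from absence a (nolock)
    where a.abs_date between @StartDate and @EndDate
      and a.abs_deleted = 0
      and a.abs_id not in (select abs_id from @CloseAbs)
    group by a.absr_id, a.abs_id
    having count(*) > 1
    ```

    Grouping on just the two PK columns means a pair is reported even when the other columns differ between the two reads. DISTINCT, by contrast, compares all four selected columns, so rows that share a key but differ in org_id or abs_date would both survive it.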

  • I understand the pitfalls of NOLOCK and the possibilities of dirty reads.

    I am confused why 'SELECT DISTINCT' will not resolve the duplicate rows into one distinct row.

  • So I think the only way may be to add an IDENTITY column to your destination table to extend the primary key, and then remove the duplicate values after the insert.

    Greets

    Flo
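
    One possible shape for the cleanup step suggested above, as a sketch only. It assumes SQL Server 2005 or later (for row_number() and CTEs), an IDENTITY column named abstab_id, and uses made-up sample rows in place of the real insert.

    ```sql
    -- Sketch only: insert with an IDENTITY-extended PK, then remove duplicates.
    Declare @AbsTab table (
        abstab_id int identity(1,1),
        org_id    int,
        absr_id   int,
        abs_id    int,
        abs_date  datetime,
        primary key (absr_id, abs_id, abstab_id)
    )

    -- Hypothetical rows standing in for a scan that returned one key twice.
    Insert @AbsTab (org_id, absr_id, abs_id, abs_date) values (1, 10, 100, '20090311')
    Insert @AbsTab (org_id, absr_id, abs_id, abs_date) values (1, 10, 100, '20090311')
    Insert @AbsTab (org_id, absr_id, abs_id, abs_date) values (1, 10, 101, '20090312')

    -- Number the rows within each (absr_id, abs_id) pair and delete
    -- every row after the first, keeping the lowest abstab_id.
    ;with ranked as (
        select abstab_id,
               row_number() over (partition by absr_id, abs_id
                                  order by abstab_id) as rn
        from @AbsTab
    )
    delete from ranked where rn > 1

    select org_id, absr_id, abs_id, abs_date from @AbsTab
    ```

    Which copy to keep is a design choice; ordering by abstab_id simply keeps whichever row was inserted first.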

  • Whenever I've run into this error, it's always been because I missed that there were already records in the table.

    Are you sure there is not a hard table by the same name that was inadvertently created or a mismatch in the owner/schema? Not trying to beat a dead horse, but I've done this myself.

    I'm very interested in the final resolution.

    bc


Viewing 15 posts - 1 through 15 (of 18 total)
