bug report: select distinct isn't quite always

  • This is an intermittent problem. it doesn't happen every time the particular ASP script runs.

    It appears that the adodb recordset that's processed by the "while !EOF" loop for some reason, sometimes,

    decides identical data is _not_ distinct and will duplicate entries -- not the entire file, but somewhere along the way (this last time at "HODA" rec#632)

    it then continues to the last entry (total recs ~2300)

    Here is an example of the governee recordset select:

    SELECT distinct Claims.*, Clients.[Last] AS Last, Payors.Allowed

    FROM Claims

    INNER JOIN Clients ON Claims.ClientID = Clients.ClientID

    INNER JOIN Payors ON Payors.TransactionID = Claims.TransactionID

    INNER JOIN MCAProcs ON mcaProcs.code=Claims.ServiceID

    where Payors.CarrierCode='CLIENT' and Claims.Status='SEEN'

    and MCAProcs.noninvoiceable=0 and Clients.ClientCompletelyTermed=0

    and cast(Claims.servicedate as datetime) >= cast('1/1/2014' as datetime)

    order by Clients.last,Claims.TransactionID

    All the tables are essentially static at run time. updates may be occurring but not anything that'd explain a massive burp like this.

    I know sql2k is, well, last century. Does anyone know if this was ever an issue addressed by anyone? any service pack? Was it a type of issue ever acknowledged?

    I'm as much curious as frustrated. thanks.

  • Hmmm, looks quite unlikely.

    Are you sure you're getting suplicate values? How did you determine that?

    -- Gianluca Sartori

  • Hi,

    I agree. I have nothing concrete on it being the culprit. Its a best analysis opinion after reviewing the script. I'm setting up the script to take that resulting recordset and preserve a timestamped copy of it in the database. Next time the user reports dups, i'll be able to review the data and confirm.

  • What fields are you using to determine that there is a duplicate?

    The query is using Claims.*. Are all of these fields needed?

    Could another field, like a datetime, be causing the dupes?

    Is you data like this:

    Name User Created_Time

    Value1 Mike 01/01/2015 00:00:00

    Value1 Mike 01/01/2015 10:00:00

    Change your query to this and see if you get dupes.

    SELECT COUNT(*),

    Claims.*, <-- Expand this to the actual fields, not *

    Clients.[Last] AS Last,

    Payors.Allowed

    FROM Claims

    INNER JOIN Clients ON Claims.ClientID = Clients.ClientID

    INNER JOIN Payors ON Payors.TransactionID = Claims.TransactionID

    INNER JOIN MCAProcs ON mcaProcs.code=Claims.ServiceID

    where Payors.CarrierCode='CLIENT' and Claims.Status='SEEN'

    and MCAProcs.noninvoiceable=0 and Clients.ClientCompletelyTermed=0

    and cast(Claims.servicedate as datetime) >= cast('1/1/2014' as datetime)

    GROUP BY Claims.*, <-- Expand this to the actual fields, not *

    Clients.[Last] AS Last,

    Payors.Allowed

    HAVING COUNT(*) > 1

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Have you looked at the possibility that some of the data might contain unprintable characters, like CR, LF, TAB, etc.

    These can make the data look identical to the human eye, but not to a computer.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Ok, this is something I haven't done before.

    Since the problem is intermittent, I can't run the ADODB recordset selection stmt twice. One set may or may not contain the dups.

    I need to _copy_ the recordset as it was selected

    I could loop through the recordset and insert each set entry one by one into the "backupTable"

    but that seems a bit time consuming.

    Can something like this be done?

    ' first get the recordset as usual'

    Set rstClientDB = Server.CreateObject("ADODB.Recordset")

    rstClientDB.CursorLocation = 3

    rstClientDB.open "select * from clients" ,cnn, adOpenKeyset, adLockOptimistic

    ' before processing copy it back into the database under an assumed name to preserve it for analysis later as needed.'

    rstCopyofClientDB.Cursorlocation = 3

    rstCopyofClientDB.open "select * from rstClientDB into backupTable"

    I thought maybe a filescripting object (ala fso.copyfile(source, to)) but that's not quite right either.

    thanks, B

  • If you want a static copy of the data, use adOpenStatic when opening the recordset.

    Your issues might come from modifications done by concurrent sessions.

    -- Gianluca Sartori

  • Is there any chance you might be running up against the halloween problem?

    http://en.wikipedia.org/wiki/Halloween_Problem

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • thanks folks for the ideas. Remember the issue is intermittent. I've run the script 3 or 4 times since yesterday and no dups. but i have bak from when the user reported it. that clearly shows the output, if the recordset has dups.

    I did take to heart the idea of eliminating the claims.* , that is a pretty heavy table and only 5 columns are needed from it.

    Much as I'd like to see a concrete solution (or at least a controlled demonstration of dup'ing) I'm hoping that a little shake-up of the recordset grab

    the process is pretty straight forward.

    grab the charges against a client (the suspected recordset)

    loop via movenext

    1)each iteration inserts the charge under consideration into another db table

    2)each iteration creates a recordset of assigned credits offsetting the charge for the specific charge, these are also inserted into another db table

    3)each iteration creates a recordset of unassigned credits for the client. these are also inserted into another db table

    the insertion of unassigned credits occurs only once and is prevented from occuring more than once by a boolean script var

    1) and 2) are the insertions that show duplication, 3) isn't dup'd because of the set boolean

    (its accounting, not linear algebra;-) )

  • I suspect, if its not a bug (which seems all but assured), then its got something to do with db (updates), but there ar no updates occurring in this script.

    I will take the suggestion to adOpenStatic. just as another "shake up"

  • Is this join correct?

    INNER JOIN MCAProcs ON mcaProcs.code=Claims.ServiceID

  • claims.* , that is a pretty heavy table and only 5 columns are needed from it

    This may have the answer to your problem. The SELECT DISTINCT will work on all of the columns in the table, but you are only showing 5 of these to the end user. It is therefore possible that if two records from Claims differ only by the value in a non-displayed column, then the data displayed for the columns shown will be identical.

    The problem is not with the database, it is doing exactly what you asked it to do. The problem is with the query, you are not asking it to do what you want.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply