Odd Behavior by Instance

  • I have multiple SQL Instances Our Developer and Integration Testing run on a virtualized host environment. I have User Acceptance Testing environment that runs on an Active/Passive cluster hosted on a physical server. I have odd behavior for this query (has been altered to not reveal names of objects)

    This works fine in DEV/SIT bit not in UAT

    SELECT

    B.ACCT_ID

    ,C.[ACCT]

    ,C.DISP_ID

    ,'MAT'

    ,C.FSCD

    ,'MATCH'

    ,GETDATE()

    ,C.LAST_NAME

    ,C.FIRST_NME

    ,C.ADDR

    ,GETDATE()

    FROM

    (

    SELECT

    DISTINCT

    A.ACCT AS ACCT_NUM,

    RCVD_DTE,

    COUNT(B.[ACCT]) DispCount

    FROM

    [dbo].TABLEA A

    LEFT JOIN

    TABLEB B

    ON

    A.ACCT = B.ACCT

    WHERE

    B.[FSCD] = 'BIR' AND

    B.[DISP_STTS_CDE] = 'P' AND

    B.[ORG_CDE] = 'LTR' AND

    A.[ACCT] NOT IN (SELECT [ACCT] FROM [dbo].[TABLEB] WHERE [FSCD] IN ('BIQ','BCU'))

    GROUP BY

    A.ACCT,

    RCVD_DTE

    HAVING

    (COUNT(B.[ACCT]) = 1)

    ) A

    INNER JOIN

    [dbo].TABLEA B

    ON

    A.ACCT_NUM = B.[ACCT] AND

    A.[RCVD_DTE] = B.ReceivedDate

    INNER JOIN

    TABLEB C

    ON

    A.ACCT_NUM = C.ACCT AND

    A.RCVD_DTE = C.[RCVD_DTE]

    WHERE

    C.[FSCD] = 'BIR' AND

    C.[DISP_STTS_CDE] = 'P' AND

    C.[ORG_CDE] = 'LTR' AND

    B.[DNA_CODE] = 'DNA' AND

    B.[ReceivedDate] >= @DateChecker

    I did try the NOLOCK hint as well as TRANSACTION ISOLATION LEVEL to no avail. We had to do this for UAT to work for us

    DECLARE @nouse TABLE ([ACCT] varchar(100) PRIMARY KEY)

    INSERT INTO @nouse SELECT [ACCT] FROM [dbo].[TABLEB] WHERE [FSCD] IN ('BIQ','BCU') GROUP BY [ACCT]

    SELECT

    B.ACCT_ID

    ,C.[ACCT]

    ,C.DISP_ID

    ,'MAT'

    ,C.FSCD

    ,'MATCH'

    ,GETDATE()

    ,C.LAST_NAME

    ,C.FIRST_NME

    ,C.ADDR

    ,GETDATE()

    FROM

    (

    SELECT

    DISTINCT

    A.ACCT AS ACCT_NUM,

    RCVD_DTE,

    COUNT(B.[ACCT]) DispCount

    FROM

    [dbo].TABLEA A

    LEFT JOIN

    TABLEB B

    ON

    A.ACCT = B.ACCT

    WHERE

    B.[FSCD] = 'BIR' AND

    B.[DISP_STTS_CDE] = 'P' AND

    B.[ORG_CDE] = 'LTR' AND

    A.[ACCT] NOT IN (SELECT [ACCT] FROM @nouse)

    GROUP BY

    A.ACCT,

    RCVD_DTE

    HAVING

    (COUNT(B.[ACCT]) = 1)

    ) A

    INNER JOIN

    [dbo].TABLEA B

    ON

    A.ACCT_NUM = B.[ACCT] AND

    A.[RCVD_DTE] = B.ReceivedDate

    INNER JOIN

    TABLEB C

    ON

    A.ACCT_NUM = C.ACCT AND

    A.RCVD_DTE = C.[RCVD_DTE]

    WHERE

    C.[FSCD] = 'BIR' AND

    C.[DISP_STTS_CDE] = 'P' AND

    C.[ORG_CDE] = 'LTR' AND

    B.[DNA_CODE] = 'DNA' AND

    B.[ReceivedDate] >= @DateChecker

    Anyone have any thoughts on why this only happens in UAT?

    Also a few other things of note.

    1) I thought maybe due to UAT being larger but here it was smaller (DEV 12 GB, UAT 3 GB).
    2) We noticed this with another database the same issue but this except in that case DEV was 500 GB and UAT was 2 TB and it seems to always be sub query type queries such as a sub query or even use of WITH statements. In each case we found a temporary table variable did solve the issue but no real reason determined it seemed to be this way. And since we were in for performance on 2 TB we didn't think much more about this behavior.

  • Antares686 - Monday, July 24, 2017 1:24 PM

    I have multiple SQL Instances Our Developer and Integration Testing run on a virtualized host environment. I have User Acceptance Testing environment that runs on an Active/Passive cluster hosted on a physical server. I have odd behavior for this query (has been altered to not reveal names of objects)

    This works fine in DEV/SIT bit not in UAT

    SELECT

    B.ACCT_ID

    ,C.[ACCT]

    ,C.DISP_ID

    ,'MAT'

    ,C.FSCD

    ,'MATCH'

    ,GETDATE()

    ,C.LAST_NAME

    ,C.FIRST_NME

    ,C.ADDR

    ,GETDATE()

    FROM

    (

    SELECT

    DISTINCT

    A.ACCT AS ACCT_NUM,

    RCVD_DTE,

    COUNT(B.[ACCT]) DispCount

    FROM

    [dbo].TABLEA A

    LEFT JOIN

    TABLEB B

    ON

    A.ACCT = B.ACCT

    WHERE

    B.[FSCD] = 'BIR' AND

    B.[DISP_STTS_CDE] = 'P' AND

    B.[ORG_CDE] = 'LTR' AND

    A.[ACCT] NOT IN (SELECT [ACCT] FROM [dbo].[TABLEB] WHERE [FSCD] IN ('BIQ','BCU'))

    GROUP BY

    A.ACCT,

    RCVD_DTE

    HAVING

    (COUNT(B.[ACCT]) = 1)

    ) A

    INNER JOIN

    [dbo].TABLEA B

    ON

    A.ACCT_NUM = B.[ACCT] AND

    A.[RCVD_DTE] = B.ReceivedDate

    INNER JOIN

    TABLEB C

    ON

    A.ACCT_NUM = C.ACCT AND

    A.RCVD_DTE = C.[RCVD_DTE]

    WHERE

    C.[FSCD] = 'BIR' AND

    C.[DISP_STTS_CDE] = 'P' AND

    C.[ORG_CDE] = 'LTR' AND

    B.[DNA_CODE] = 'DNA' AND

    B.[ReceivedDate] >= @DateChecker

    I did try the NOLOCK hint as well as TRANSACTION ISOLATION LEVEL to no avail. We had to do this for UAT to work for us

    DECLARE @nouse TABLE ([ACCT] varchar(100) PRIMARY KEY)

    INSERT INTO @nouse SELECT [ACCT] FROM [dbo].[TABLEB] WHERE [FSCD] IN ('BIQ','BCU') GROUP BY [ACCT]

    SELECT

    B.ACCT_ID

    ,C.[ACCT]

    ,C.DISP_ID

    ,'MAT'

    ,C.FSCD

    ,'MATCH'

    ,GETDATE()

    ,C.LAST_NAME

    ,C.FIRST_NME

    ,C.ADDR

    ,GETDATE()

    FROM

    (

    SELECT

    DISTINCT

    A.ACCT AS ACCT_NUM,

    RCVD_DTE,

    COUNT(B.[ACCT]) DispCount

    FROM

    [dbo].TABLEA A

    LEFT JOIN

    TABLEB B

    ON

    A.ACCT = B.ACCT

    WHERE

    B.[FSCD] = 'BIR' AND

    B.[DISP_STTS_CDE] = 'P' AND

    B.[ORG_CDE] = 'LTR' AND

    A.[ACCT] NOT IN (SELECT [ACCT] FROM @nouse)

    GROUP BY

    A.ACCT,

    RCVD_DTE

    HAVING

    (COUNT(B.[ACCT]) = 1)

    ) A

    INNER JOIN

    [dbo].TABLEA B

    ON

    A.ACCT_NUM = B.[ACCT] AND

    A.[RCVD_DTE] = B.ReceivedDate

    INNER JOIN

    TABLEB C

    ON

    A.ACCT_NUM = C.ACCT AND

    A.RCVD_DTE = C.[RCVD_DTE]

    WHERE

    C.[FSCD] = 'BIR' AND

    C.[DISP_STTS_CDE] = 'P' AND

    C.[ORG_CDE] = 'LTR' AND

    B.[DNA_CODE] = 'DNA' AND

    B.[ReceivedDate] >= @DateChecker

    Anyone have any thoughts on why this only happens in UAT?

    Also a few other things of note.

    1) I thought maybe due to UAT being larger but here it was smaller (DEV 12 GB, UAT 3 GB).
    2) We noticed this with another database the same issue but this except in that case DEV was 500 GB and UAT was 2 TB and it seems to always be sub query type queries such as a sub query or even use of WITH statements. In each case we found a temporary table variable did solve the issue but no real reason determined it seemed to be this way. And since we were in for performance on 2 TB we didn't think much more about this behavior.

    You mention 'odd behaviour' and something that isn't working, without (as far as I can see) providing details. Or have I missed something?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Without more information, hard to say.  Since it looks like you can't provide much it may not be worthwhile to ask to see the execution plan or the definitions of the indexes on the tables.  Both of these would be a good place to start your research.  Just a guess, you don't have indexes that maybe beneficial to your queries.  Then again, I could be wrong.

  • Lynn Pettis - Monday, July 24, 2017 2:05 PM

    Without more information, hard to say.  Since it looks like you can't provide much it may not be worthwhile to ask to see the execution plan or the definitions of the indexes on the tables.  Both of these would be a good place to start your research.  Just a guess, you don't have indexes that maybe beneficial to your queries.  Then again, I could be wrong.

    No problem, the issue is the query when moved to our UAT environment from DEV and SIT quits working correctly when using Subqueries. Any idea why this behavior would change. I noted the extra details of DEV/SIT are virtual and UAT is clustered ACTIVE/PASSIVE Cluster. And the idea it could be a size driven issue does not seem to be the case.
    Structures are exactly the same between DEV, SIT and UAT. The query is self blocking as best we can tell but use of NO LOCK and TRANSACTION ISOLATION LEVEL options do not fix it, nor does it show up in blocking list.

  • Still no real detail. 🙂 Unfortunately, to us "quits working correctly" is no better than just saying there's "odd behavior".

    What exactly do you mean?Is the problem that you get incorrect results from the query, that you get some error from the query, or that the query returns correct results but performs less well on some instances than on others?

    From the descriptions you've given, I'm guessing you're talking about a performance problem, but it's all guessing until we know what problem you're trying to describe.

    Cheers!

  • Antares686 - Monday, July 24, 2017 2:20 PM

    Lynn Pettis - Monday, July 24, 2017 2:05 PM

    Without more information, hard to say.  Since it looks like you can't provide much it may not be worthwhile to ask to see the execution plan or the definitions of the indexes on the tables.  Both of these would be a good place to start your research.  Just a guess, you don't have indexes that maybe beneficial to your queries.  Then again, I could be wrong.

    No problem, the issue is the query when moved to our UAT environment from DEV and SIT quits working correctly when using Subqueries. Any idea why this behavior would change. I noted the extra details of DEV/SIT are virtual and UAT is clustered ACTIVE/PASSIVE Cluster. And the idea it could be a size driven issue does not seem to be the case.
    Structures are exactly the same between DEV, SIT and UAT. The query is self blocking as best we can tell but use of NO LOCK and TRANSACTION ISOLATION LEVEL options do not fix it, nor does it show up in blocking list.

    Not a clue.  We can't see what you see.  Could you solve this problem or even provide some kind of insight based solely on what you have posted (ignoring, of course, what you already know since it is your problem).
    As I said, we really need more information.  Have you looked at the execution plans between the different environments?  That is where I would start.

  • Lynn Pettis - Monday, July 24, 2017 2:54 PM

    Not a clue.  We can't see what you see.  Could you solve this problem or even provide some kind of insight based solely on what you have posted (ignoring, of course, what you already know since it is your problem).
    As I said, we really need more information.  Have you looked at the execution plans between the different environments?  That is where I would start.

    There in lies the issue. I gave the info we have and what we know. Unfortunately you cannot get the execution plan at run because it blocks and does nothing, but estimated plan looks the same. We have reindexed and refreshed stats, and this is totally mind boggling.

  • Jacob Wilkins - Monday, July 24, 2017 2:30 PM

    Still no real detail. 🙂 Unfortunately, to us "quits working correctly" is no better than just saying there's "odd behavior".

    What exactly do you mean?Is the problem that you get incorrect results from the query, that you get some error from the query, or that the query returns correct results but performs less well on some instances than on others?

    From the descriptions you've given, I'm guessing you're talking about a performance problem, but it's all guessing until we know what problem you're trying to describe.

    Cheers!

    Well aware and this is why we are stumped by it. DBAs and everyone else, so I was hoping someone would have encountered. It blocks, for no definable reason, there are two code parts in my post, the first is the code for DEV and works in SIT, refuses to work in UAT at all. Until we remove the subquery and replace with a table variable, then all is fine. Not sure how I can explain that blocking is occurring that does not show up in the blocking list and the query which takes about 2 seconds to run normally clocks until the timeout is reached. Don't even see a deadlock victim statement. Literally no definable answer on this we can determine. Guess I might have to resort to getting MS help.

  • Question, can an Account (Acct) have multiple FSCD's and that the following subquery is to exclude all accounts that has an FSCD of 'BIQ' or 'BCU' even if there is an FSCD of 'BIR' for the account?


          AND A.[ACCT] NOT IN
                (
                  SELECT
                    [ACCT]
                  FROM
                    [dbo].[TABLEB]
                  WHERE
                    [FSCD] IN (
                                'BIQ'
                                , 'BCU'
                              )
                )

  • ..the query which takes about 2 seconds to run normally clocks until the timeout is reached.

    Ok, this is what I was looking for.

    A few questions.

    First, from what application are you running the query? I ask because SQL Server itself will let a query run any arbitrary length of time; if you're hitting a timeout then it's an application timeout.

    Second, what evidence have you collected that suggests the issue is some sort of blocking?

    Third, if you run the query from SSMS, and from a separate query window check the wait type, wait time, IO, and CPU for the session, what do you see?

    Are IO and/or CPU steadily increasing, is it sitting with one wait type and a steadily increasing wait time, is it constantly changing wait types with relatively short wait times, etc.?

    Cheers!

  • Jacob Wilkins - Monday, July 24, 2017 3:29 PM

    ..the query which takes about 2 seconds to run normally clocks until the timeout is reached.

    Ok, this is what I was looking for.

    A few questions.

    First, from what application are you running the query? I ask because SQL Server itself will let a query run any arbitrary length of time; if you're hitting a timeout then it's an application timeout.

    Second, what evidence have you collected that suggests the issue is some sort of blocking?

    Third, if you run the query from SSMS, and from a separate query window check the wait type, wait time, IO, and CPU for the session, what do you see?

    Are IO and/or CPU steadily increasing, is it sitting with one wait type and a steadily increasing wait time, is it constantly changing wait types with relatively short wait times, etc.?

    Cheers!

    I agree with Jacob here.  I've seen times where this happens and it turns out the application isn't handling the response from SQL properly.  When I ran profiler on SQL (I know, I should use EE, but profiler is just so easy...), I saw the SQL returned in less than 1 second.  The application just didn't consume the data and said there was a timeout.  If memory serves, it was actually LINQ that was causing our problem.  Our developer changed the way his code worked, and the problem went away.  Although for us, it was that it worked fine in DEV, but when it went to our test system (which was identical from what I could tell), it failed.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Antares686 - Monday, July 24, 2017 2:20 PM

    Lynn Pettis - Monday, July 24, 2017 2:05 PM

    Without more information, hard to say.  Since it looks like you can't provide much it may not be worthwhile to ask to see the execution plan or the definitions of the indexes on the tables.  Both of these would be a good place to start your research.  Just a guess, you don't have indexes that maybe beneficial to your queries.  Then again, I could be wrong.

    No problem, the issue is the query when moved to our UAT environment from DEV and SIT quits working correctly when using Subqueries. Any idea why this behavior would change. I noted the extra details of DEV/SIT are virtual and UAT is clustered ACTIVE/PASSIVE Cluster. And the idea it could be a size driven issue does not seem to be the case.
    Structures are exactly the same between DEV, SIT and UAT. The query is self blocking as best we can tell but use of NO LOCK and TRANSACTION ISOLATION LEVEL options do not fix it, nor does it show up in blocking list.

    Yes.  When was the last time you rebuilt statistics on the slower box?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jacob Wilkins - Monday, July 24, 2017 3:29 PM

    ..the query which takes about 2 seconds to run normally clocks until the timeout is reached.

    Ok, this is what I was looking for.

    A few questions.

    First, from what application are you running the query? I ask because SQL Server itself will let a query run any arbitrary length of time; if you're hitting a timeout then it's an application timeout.

    Second, what evidence have you collected that suggests the issue is some sort of blocking?

    Third, if you run the query from SSMS, and from a separate query window check the wait type, wait time, IO, and CPU for the session, what do you see?

    Are IO and/or CPU steadily increasing, is it sitting with one wait type and a steadily increasing wait time, is it constantly changing wait types with relatively short wait times, etc.?

    Cheers!

    Unfortunately don't have a DBA available right now as ours just retired and they have not assigned us the new one to collect detailed stats. I am waiting for the server stats to load to another app to see if there is any clue during the event times around Memory, CPU and IO. I ran from the application first to find the point it sticks. Then I ran in pieces in SSMS the actual query, narrowing it down specially to the select statement that first stalls and which was posted. It literally just goes to lunch and that's the end unless I stop it, it goes on and on doing nothing.

    All that said I did come across another person reporting a similar incident with specifically the NOT IN clause and a subquery which I will vet today. They discovered that they had to add a NULL check to the subquery to get it to run which I will be checking DEV, SIT and UAT to see if there might be something to that today.

  • Antares686 - Tuesday, July 25, 2017 6:44 AM

    Unfortunately don't have a DBA available right now as ours just retired and they have not assigned us the new one to collect detailed stats. I am waiting for the server stats to load to another app to see if there is any clue during the event times around Memory, CPU and IO. I ran from the application first to find the point it sticks. Then I ran in pieces in SSMS the actual query, narrowing it down specially to the select statement that first stalls and which was posted. It literally just goes to lunch and that's the end unless I stop it, it goes on and on doing nothing.

    All that said I did come across another person reporting a similar incident with specifically the NOT IN clause and a subquery which I will vet today. They discovered that they had to add a NULL check to the subquery to get it to run which I will be checking DEV, SIT and UAT to see if there might be something to that today.

    OK update, so I stripped the code down further to find the point of least code to duplicate


    SELECT DISTINCT A.ACCT AS ACCT_NUM FROM [dbo].TABLEA A LEFT JOIN TABLEB B ON A.ACCT = B.ACCT
    WHERE B.[FSCD] = 'BIR' AND B.[DISP_STTS_CDE] = 'P' AND B.[ORG_CDE] = 'LTR' AND
    A.[ACCT] NOT IN (SELECT [ACCT] FROM [dbo].TABLEB WHERE [FSCD] IN ('BIQ','BCU'))

    It ran for over 20 minutes before I had to release it as it was causing deadlocks on other items. Breaking out the pieces like removing the LEFT JOIN and keep the remaining returns in 16 seconds, removing the DISTINCT and keeping the LEFT JOIN and NOT IN runs just under 1 minute, removing just the NOT IN takes 0 seconds. TABLEA has 627 records, with the LEFT JOIN it becomes 8457 records and the NOT IN produces 299 records itself. In the end it becomes 21 records (base on altered version). Taking the NOT IN and making it a LEFT JOIN with an (NOT IN Table Alias).ACCT IS NULL in the WHERE clause returns in 8 seconds. 

    I did try the NULL checks in the subquery and on other referenced fields to see if that would help and it does not.  

  • Change this:

    AND A.[ACCT] NOT IN
                (
                  SELECT
                    [ACCT]
                  FROM
                    [dbo].[TABLEB]
                  WHERE
                    [FSCD] IN (
                                'BIQ'
                                , 'BCU'
                              )
                )

    to this and see if there is a difference:

    AND NOT EXISTS
                (
                  SELECT
                    [ACCT]
                  FROM
                    [dbo].[TABLEB] [tb]
                  WHERE
                    [tb].[ACCT] = [A].[ACCT] AND
                    [tb].[FSCD] IN (
                                      'BIQ'
                                      , 'BCU'
                                    )
                )

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply