Using BEGIN TRAN for a SELECT statement?

  • Hello All

    I have an overly complex SELECT statement that I am married to at the moment.  The question I have is about some tuning behavior we discovered over the weekend.  We have some unit tests that were failing at random intervals, but when the executed statement was pulled aside for analysis, it was found to run in subsecond time in isolation.  The SELECT statement itself runs inside a stored procedure.  In desperation to get consistent behavior, one of our developers wrapped the SELECT in BEGIN TRAN ... COMMIT TRAN.  This seems to have corrected the issue.  At the moment there does not seem to be any volume of traffic that we can throw at this process that will cause it to fail.
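
    For context, the change was essentially this shape (the procedure name, table, and columns below are made up; the real SELECT is far more involved):

        -- Hypothetical sketch only; dbo.usp_GetDemoData and the SELECT body are stand-ins.
        ALTER PROCEDURE dbo.usp_GetDemoData
        AS
        BEGIN
            BEGIN TRAN;

            SELECT d.Id, d.Payload
            FROM dbo.DemoData AS d        -- placeholder for the overly complex SELECT
            WHERE d.IsActive = 1;

            COMMIT TRAN;
        END;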

    My question is: why would wrapping a SELECT statement in a transaction influence its behavior this way?  I realize that this may be highly theoretical, but I would appreciate your thoughts anyway.

    Thanks in advance for your time.

    Steve

  • Could this be concurrency somehow? A SELECT already runs inside its own implicit transaction, so I'm not sure why adding an explicit one changes things.

  • Could it be a bug in the unit test?  What I mean is: if the unit test is checking the number of active transactions, sometimes it comes back with 1 and sometimes with 0, and the test checks whether it is greater than 0, then putting BEGIN TRAN at the start would solve that problem.
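
    Purely as a hypothetical illustration, that kind of check might look like this:

        -- @@TRANCOUNT is the number of open transactions on the current session.
        -- Inside BEGIN TRAN ... COMMIT TRAN it is at least 1; otherwise it can be 0.
        IF @@TRANCOUNT > 0
            PRINT 'pass';
        ELSE
            PRINT 'fail';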

    What is your unit test testing?  What are the "pass" and "fail" conditions of the unit test?  You said it was failing at random intervals, so the BEGIN TRAN might not have fixed the problem; it may have just reduced the chance of it happening.

    When it "fails", is it the unit test check that fails, or does the stored procedure return an error?  If it is an error, what is the error?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • OK, yes, I did not include that detail.

    When it fails, it fails because of a timeout.  A procedure that normally takes < 1 second suddenly starts taking 30 seconds.  When the "bad SQL" is pulled out and run in isolation, it does indeed run in < 1 sec.

    We have Read Committed Snapshot turned on, so I naively believe that straight SELECT statements should not be blocked.
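
    For reference, RCSI can be confirmed per database with the standard catalog view (the database name below is a stand-in):

        -- is_read_committed_snapshot_on = 1 means RCSI is enabled for the database.
        SELECT name,
               is_read_committed_snapshot_on,
               snapshot_isolation_state_desc
        FROM sys.databases
        WHERE name = N'YourTestDb';   -- hypothetical database name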

    Steve

  • Ok, that is a weird one... I have no good thoughts on why that would be happening nor why adding an explicit transaction would help.

    When you say "randomly failing", is there any predictability to it?  And is there a chance to re-run things yourself to reproduce the "long runs"?  What I am thinking is I'd be curious as to what causes the long runs.  Is it a different execution plan (stale statistics causing a bad plan to be generated, for example)?  Or is there some blocking happening that shouldn't be happening (RCSI should not allow readers to be blocked by writers)?
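
    If you can catch a slow run live, the standard DMVs will show whether the session is blocked and what it is waiting on; a minimal sketch:

        -- Run while the procedure is stuck: shows blockers, wait types, and the statement text.
        SELECT r.session_id,
               r.blocking_session_id,
               r.wait_type,
               r.wait_time,
               t.text
        FROM sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE r.session_id <> @@SPID;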

    Is it pulling data across a linked server, as those estimates could be bad and you may have blocking on the link?  Or cross-database?  Not 100% certain whether that would cause a problem, but it is something to look at IF you are doing cross-database/linked-server data pulls.  I wouldn't expect an explicit transaction to fix those, though.

  • Thanks, Brian.

    Yes, this has been a brain drain for me.  The process basically creates a new database from scratch and adds the "demo" data.

    It does this in the same pattern each time, so I would expect consistent failures at least.  That has not been the experience.  We do not have anything fancy like linked servers or the like.  Just a simple db create, build the version-specific schema, apply template data, run behavior tests.

    It is also worth noting that, even on the latest schema builds, we have not found proof of this occurring in production.

    Steve

  • That one I am completely stumped on then.

    Just to be sure, auto-shrink isn't turned on for any database, right?  My thought here is that something is growing and shrinking while the stored procedure runs, but only sometimes does it run out of space.  If auto-grow is set to a really low number (like 1 MB when it needs 1 GB of space), growing takes forever; then you remove the temporary object so the file shrinks, add more temporary objects, and it regrows.
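
    Both settings are quick to check from the standard catalog views:

        -- Auto-shrink flag plus the growth increment for every database file.
        SELECT d.name AS database_name,
               d.is_auto_shrink_on,
               mf.name AS file_name,
               mf.growth,             -- in 8 KB pages when is_percent_growth = 0
               mf.is_percent_growth
        FROM sys.databases AS d
        JOIN sys.master_files AS mf
            ON mf.database_id = d.database_id;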

    I would expect that if this were the case, it would be slow on every run and would be repeatable.  I would also expect an explicit transaction to make no difference, but if that transaction covered the entire stored procedure, maybe the cleanup, and thus the shrink, wouldn't happen until the stored procedure completed?

    This is all just guessing, though.  I could be thinking about auto-shrink and auto-grow completely wrong here, and I wouldn't be surprised if this guess is a big red herring that should be ignored.  I'm just trying to think of things that could cause a query to run slower.  I wouldn't expect shrinking and growing to add 30+ seconds to a sub-second query, though, even with tiny auto-growth.

    As a thought, what about the other side of the coin: the OS?  When it runs slow, do you notice anything different on the OS side, like CPU spikes or high disk I/O, network I/O, or memory utilization?  I am just thinking MAYBE the OS has some process firing that is hogging resources, and it isn't a "slowness" on the SQL side.  Again, no clue why an explicit transaction would fix that; just tossing some ideas out there.
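
    One way to rule SQL Server in or out is to snapshot sys.dm_os_wait_stats before and after a slow run and compare; a minimal version (the excluded wait types are just a few common idle waits, not an exhaustive filter):

        -- Top waits by accumulated wait time; diff two snapshots taken around a slow run.
        SELECT TOP (10)
               wait_type,
               waiting_tasks_count,
               wait_time_ms,
               signal_wait_time_ms
        FROM sys.dm_os_wait_stats
        WHERE wait_type NOT IN (N'LAZYWRITER_SLEEP', N'SLEEP_TASK', N'BROKER_TASK_STOP')
        ORDER BY wait_time_ms DESC;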

    I am wondering if the explicit transaction didn't actually fix the problem, just made it much less likely to happen?

  • Thanks for your comments, Brian.

    I will be doing additional investigation around these suggestions.  My initial thought was just to make sure I was not missing anything obvious to a more experienced audience.

    Steve

  • Run a trace to capture query plans, etc., against the unit-test database and see why it's sometimes slow and sometimes not. You can script the traces as part of your unit test.

    Or use the new fancy Extended Events for the same purpose.
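
    A minimal sketch of such a session, assuming you only want to catch calls that blow past a few seconds (the session name and the 5-second threshold are arbitrary; capturing full query plans with sqlserver.query_post_execution_showplan is possible but much heavier):

        -- Capture completed RPC calls (e.g. stored procedure executions) slower than 5 seconds.
        CREATE EVENT SESSION slow_unit_test_runs ON SERVER
        ADD EVENT sqlserver.rpc_completed (
            ACTION (sqlserver.sql_text, sqlserver.database_name)
            WHERE duration > 5000000       -- rpc_completed duration is in microseconds
        )
        ADD TARGET package0.event_file (SET filename = N'slow_unit_test_runs');
        GO

        ALTER EVENT SESSION slow_unit_test_runs ON SERVER STATE = START;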

    It could be some db housekeeping that is performed when a new database is created. Do you really need to create a new db? We do a lot of unit tests, and mostly they just drop the data for each run.
