Stored Procedure - unexpected execution plan

  • Scenario:

    1) Table contains over 100 million records.

    2) Stored procedure has one input parameter, @BatchID int, and returns aggregated data from BigTable where BatchID = @BatchID

     

    Problem:

    The stored procedure doesn't use a parallel execution plan.  If I run the stored procedure's SELECT statement in Query Analyzer it returns in 1/5th the time and uses a parallel plan.  The only way I could get the stored procedure to use the same plan was to declare a variable inside the procedure and set it equal to the input parameter (@batchid).

     

    Before Code:

    Create Procedure Test1 (@batchid int)

    as

    Select * from BigTable where batchid = @batchid

     

    After Code:

    Create Procedure Test2 (@batchid int)

    As

    Declare @batchid2 int

    set @batchid2 = @batchid

    Select * from BigTable where batchid = @batchid2

     

    I hope someone can explain this behavior because it has all of us here stumped.  Is it a bug, or just the way the query optimizer treats input parameters?  It caused a huge difference in query performance: the Before Code took over 2 hours to process 4.5 million records, versus 11 minutes for the After Code.  As you can see, this can be a serious problem if we don't understand the root cause.

  • SQL Server caches the plan based on the first set of input parameters you send.  If you sent an atypical set of parameters, the server may have cached a plan optimized for that case, as it appears to have done here.  This feature is called parameter sniffing (it saves the server a lot of compile time).  However, when you copy the input parameter into a local variable, the optimizer can no longer sniff the value, so it has to build the plan from generic cardinality estimates rather than the specific value, which can produce a very different access path.
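
    For anyone hitting this later, here is a minimal sketch of the two statement-level alternatives to the local-variable trick, assuming SQL Server 2005 or later (the proc names Test3/Test4 and the value 42 are placeholders, not from the thread):

    -- Hypothetical variant: OPTION (RECOMPILE) builds a fresh plan for the
    -- actual @batchid value on every execution.
    Create Procedure Test3 (@batchid int)
    as
    Select * from BigTable where batchid = @batchid
    option (recompile)
    GO

    -- Hypothetical variant: OPTIMIZE FOR tells the optimizer which value to
    -- assume at compile time, regardless of the value sniffed on the first
    -- call (42 is only a placeholder batch id).
    Create Procedure Test4 (@batchid int)
    as
    Select * from BigTable where batchid = @batchid
    option (optimize for (@batchid = 42))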

  • Dropping and recreating the proc helps to remove any previous query plans for the proc. Generally it's good practice to drop/recreate as:

    IF EXISTS (SELECT * FROM sys.objects WHERE name = '...' AND type = 'P')

    DROP PROC ...

    GO

    CREATE PROC ..

    ...

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Is this technique any better than simply using sp_recompile?

  • >>Dropping and recreating the proc helps to remove any previous query plans for the proc. Generally it's good practice to drop/recreate as

    >>Is this technique any better than simply using sp_recompile?

    I have one better:  use the WITH RECOMPILE option of the CREATE PROC statement.  NOTE:  if the sproc is very complicated and gets called frequently, there will be some CPU overhead for every recompile that could slow the system a tad.
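
    Against the poster's simplified example, that would look roughly like this (Test1_Recompile is just a placeholder name; the real proc is more involved than a SELECT *):

    -- WITH RECOMPILE forces a fresh plan on every execution of the proc,
    -- at the cost of a compile each time it is called.
    Create Procedure Test1_Recompile (@batchid int)
    with recompile
    as
    Select * from BigTable where batchid = @batchid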

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I guess I should've included the steps we took to identify the issue.  I first noticed the problem when the tables became very large, so I assumed SQL Server had cached an old plan and recompiled the stored procedure.  Unfortunately, the recompile didn't change the query plan, nor did a drop/create.  Like I said, we found the query behaves completely differently outside of the stored procedure.  It runs in 1/5th the time and hits all 8 processors.  The only way we could get the stored procedure to perform as well was to add a variable and set it to the input parameter...  We are still stumped...

  • 1) When you dropped/recreated the sproc (without the variable kludge), did you execute it immediately with an appropriate batchid BEFORE executing the SELECT statement in isolation? 

    2) If possible (be VERY careful with this), can you execute DBCC FREEPROCCACHE to flush ALL query plans from the cache and then execute the sproc?  (There is a sketch of this after the list below.)

    3) Using the estimated execution plan, does the sproc show a parallel plan?  How many rows does it estimate pulling from the main table?  How does this compare to the SELECT by itself?  Also, are you executing the SELECT with a hard-coded value or a variable?

    4) What ARE the query plans chosen by both mechanisms?  Is it just the parallelization that is different?

    5) Is there an index on BatchID?  It may not matter since you are chewing through 4.5M records of a 100M-row table (for the execution you mentioned).  However, for other values of BatchID that return fewer rows, an index could be very helpful in reducing I/O.

    6) Are you seeing any blocking during execution?  Consider a NOLOCK hint on the table if you aren't worried about dirty reads.
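
    A cautious sketch of items 2 and 6 above (the batch id 42 is a placeholder, and FREEPROCCACHE really should not be run on a busy production server):

    -- Item 2: flush ALL cached plans, then execute the sproc so it compiles
    -- fresh for the value you supply.
    DBCC FREEPROCCACHE
    GO
    EXEC Test1 @batchid = 42
    GO

    -- Item 6: allow dirty reads to sidestep blocking, if that is acceptable.
    Select * from BigTable with (nolock) where batchid = 42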

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I just noticed that you have a SELECT * in the query.  Do you REALLY bring back 4.5 MILLION records to the front end??  And EVERY COLUMN for those 4.5M records?  I find it hard to believe that can be even REMOTELY useful!  :-))  Perhaps you are just filling up your box's RAM with QA/SSMS results?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I apologize for not making the example clearer.  Here's the lowdown on the process.

    The purpose of the stored proc is to process/aggregate approximately 4 million transaction header records and 20+ million transaction detail records into the appropriate accounting buckets (i.e., Cash, Checks, Discover, etc.).  I used SELECT * just to fill in the example; the actual query is a somewhat complex CASE statement.  The biggest difference I see in the execution plan is the use of parallelism.  For some reason the stored procedure does not use a parallel execution plan if I don't set a variable equal to the input parameter.  I can reproduce this behavior by adding or removing the variable = input parameter line.
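
    To give other readers something concrete, here is a purely hypothetical sketch of that kind of header/detail bucket aggregation; every table, column, and bucket name below is invented and is not our actual schema:

    Create Procedure ProcessBatch (@batchid int)
    as
    Declare @batchid2 int
    set @batchid2 = @batchid   -- the line whose presence or absence flips the plan

    Select
        h.AccountID,
        Sum(Case When d.TenderType = 'CASH'     Then d.Amount Else 0 End) As CashTotal,
        Sum(Case When d.TenderType = 'CHECK'    Then d.Amount Else 0 End) As CheckTotal,
        Sum(Case When d.TenderType = 'DISCOVER' Then d.Amount Else 0 End) As DiscoverTotal
    From TranHeader h
    Join TranDetail d On d.HeaderID = h.HeaderID
    Where h.BatchID = @batchid2
    Group By h.AccountID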

  • I am still struggling to envision a reason for this behavior (as I am sure you are)!

    1) Was the use of parallelism the ONLY difference in the estimated (or, better yet, actual) query plan, or just the BIGGEST one?  The answer could be important.  You just mentioned a header/detail situation, so I bet you are doing at LEAST one join somewhere, and you have yet to mention that in your posts.  Without further knowledge, I will venture a guess that the sproc is doing a nested loop join but your direct query is doing something much more efficient like a hash or even merge join.  Can you post at least the entire FROM clause of the guts of the sproc?

    2) EVERY READER TAKE NOTE OF THIS: when you are requesting help on performance issues, PLEASE DON'T SIMPLIFY!!  🙂  If you can't post the entire code that is problematic, at least give everything from the FROM clause down.  Include at least the number of rows in all tables referenced, preferably with key field and index information at a minimum.

    3) What is your server's cost threshold for parallelism setting?  (There is a sketch for checking it after this list.)

    4) What are all of the settings you have for SSMS (or whatever execution environment you are using)?  Do you use the same mechanism to actually execute the sproc?

    5) Are your statistics up to date?  Consider updating them and recompiling after that.

    6) Please answer questions 3 and 4 in my previous post fully.  I really do need to know EVERY difference in the query plans to help you.

    I have some tweaks you can do if the above information doesn't allow me to get the sproc working correctly.
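
    For items 3 and 5 in the list above, a quick sketch (BigTable and Test1 are taken from the simplified example earlier in the thread):

    -- Item 3: check the server's cost threshold for parallelism (default is 5).
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC sp_configure 'cost threshold for parallelism'
    GO

    -- Item 5: refresh statistics on the big table and mark the proc so it
    -- gets a new plan on its next execution.
    UPDATE STATISTICS BigTable WITH FULLSCAN
    EXEC sp_recompile 'Test1'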

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I agree, we should have the possibility to post query plans here. But then again, we should also know the indexes, data model, queries, even the whole analysis. It is just not realistic.

    This doesn't explain the different query plan, but there are optimizer hints you could use. MAXDOP lets you set the maximum number of processors to use and overrides the server setting. You could also save the good query plan as XML and use OPTION (USE PLAN N'...'). (See the sketch below.)

    The BIG disadvantage of this is that your query plan gets hard-coded, so if something changes, like data distribution or indexes, your plan cannot automatically adapt.
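
    Roughly, against the simplified example (Test1_MaxDop is a placeholder name, and the USE PLAN literal is deliberately elided because the full XML showplan has to be pasted in):

    -- Override the server-level MAXDOP setting for this statement only;
    -- 8 matches the processor count mentioned earlier in the thread.
    Create Procedure Test1_MaxDop (@batchid int)
    as
    Select * from BigTable where batchid = @batchid
    option (maxdop 8)
    GO

    -- USE PLAN pins a previously captured plan, which is exactly why it cannot
    -- adapt later:
    -- Select * from BigTable where batchid = @batchid
    -- option (use plan N'<full XML showplan goes here>')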

    Just looking at your code, I guess it has something to do with selectivity. The first time Test1 is called it is compiled with the real value passed to it, and the selectivity of that value is estimated from the statistics. If the value is not very selective, the estimated cost of the plan will be high and parallelism will be considered. In procedure Test2 the value of @batchid2 is unknown at compile time, so a default density estimate is used, and apparently that estimate does not trigger parallelism. So I would take a look at the selectivity, update the statistics, and, as mentioned here before, use WITH RECOMPILE.

    Hope this helps,

    Jan

  • It looks like the problem is gone now.  I cannot reproduce the issue to generate an execution plan.  Sorry to have bothered everyone.  Next time I'll be sure to at least include the FROM clause and the important pieces of the execution plans.

