Optimizing Stored Procedures that utilize only local variables (and lots of them)

  • Our proprietary reporting module calls stored procedures that have no parameters but make extensive use of local variables.

    A single query (and many of them are dynamic SQL) may utilize many local variables in the WHERE clause. And some of the stored procs are very complex, containing more than 2000 lines of code.

    I'm just beginning to learn the importance of accurate cardinality estimates, and I'm wondering if anyone can share how they might improve optimization of procs that make extensive use of local variables that can affect the size of the result set.

    For example, would you have the procs set to recompile each time they are run? Would you make use of "OPTIMIZE FOR.." etc.

    I hope this makes sense. Please don't yell at me for not yet posting a code sample. I will be glad to do that when asked. I just want to be sure that whatever I put up is useful.

  • Where do the local variables get their values from?

    You should probably post some code. (Pick the simplest of your procedures).

    It would be hard to recommend (or not recommend) the use of Optimize or Recompile options. Most people would probably say “it depends” and you’ll have to experiment with those in your environment.

    --Vadim R.

  • jshahan (8/21/2012)


    For example, would you have the procs set to recompile each time they are run? Would you make use of "OPTIMIZE FOR.." etc.

    Not by default I wouldn't. I'd look at the procedures (starting with the ones that perform worst) and see if there are any cardinality errors. If there are, I might go for recompile, optimise for or splitting the procedure into sub-procedures. No blanket rule.
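
    For the sub-procedure option, a rough sketch of the shape (the procedure, table and column names below are made up purely for illustration, not from any real system). The point is that the local values become parameters at the boundary, so the inner procedure's plan is built from sniffed values:

    -- Hypothetical inner procedure: the values arrive as parameters,
    -- so the optimizer can sniff them and use the column histogram.
    CREATE PROCEDURE dbo.Report_DateRange_Inner
        @FromDate datetime,
        @ThruDate datetime
    AS
    BEGIN
        SELECT t.ID
        FROM dbo.SomeTable AS t
        WHERE t.SomeDate BETWEEN @FromDate AND @ThruDate;
    END
    GO

    -- Hypothetical outer procedure: works out the values it needs,
    -- then passes them on instead of filtering with local variables.
    CREATE PROCEDURE dbo.Report_DateRange
    AS
    BEGIN
        DECLARE @FromDate datetime, @ThruDate datetime;
        SELECT @FromDate = '20120101', @ThruDate = '20120131';

        EXEC dbo.Report_DateRange_Inner @FromDate, @ThruDate;
    END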

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • jshahan (8/21/2012)


    For example, would you have the procs set to recompile each time they are run? Would you make use of "OPTIMIZE FOR.." etc.

    Using WITH RECOMPILE on the procedure is rarely the best option; it causes every query inside the procedure to compile at least once. OPTION (RECOMPILE) is a much more targeted weapon - it recompiles only the query it is attached to. This option allows SQL Server 2008 to 'see' the values in the local variables at the time, and produce a plan optimized for those values.

    The downside is that compilation is not free, and you might see a significant increase in CPU and memory if the queries are complex and/or called very frequently.

    OPTIMIZE FOR can be useful, if you can identify a suitable value to optimize for. It can be tough to choose such a value, especially one that will remain a good choice for all time as the data changes. As Gail said, I would look to analyse procedures and queries that perform unacceptably now, and use the tool that best suits the individual case.
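
    For illustration only (table, column and variable names here are hypothetical), the two hints look like this on a single statement:

    DECLARE @FromDate datetime, @ThruDate datetime;
    SELECT @FromDate = '20120101', @ThruDate = '20120131';

    -- OPTION (RECOMPILE): the statement recompiles on every execution,
    -- so the optimizer sees the current values of the local variables.
    SELECT t.ID
    FROM dbo.SomeTable AS t
    WHERE t.SomeDate BETWEEN @FromDate AND @ThruDate
    OPTION (RECOMPILE);

    -- OPTIMIZE FOR: the plan is compiled once, using the supplied
    -- representative values rather than treating the variables as unknown.
    SELECT t.ID
    FROM dbo.SomeTable AS t
    WHERE t.SomeDate BETWEEN @FromDate AND @ThruDate
    OPTION (OPTIMIZE FOR (@FromDate = '20120101', @ThruDate = '20120131'));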

  • You make some good points, Joe, but I fear the "proprietary module" allows for limited rewrite & redesign opportunities.

  • -- create a temporary stored procedure
    create procedure #sp_TempReportMaster AS
        create table #iFromDate (iFromDate DateTime)
        insert into #iFromDate values ('JAN 1 2012')

        create table #iThruDate (iThruDate DateTime)
        insert into #iThruDate values ('JAN 31 2012')

        create table #iProdSort1 (ID int)
        insert into #iProdSort1 values (6537)

        exec Rpt_TestProcedure
    GO

    -- the temporary proc calls the stored procedure below
    -- and makes the values available for its local variables
    CREATE PROCEDURE [dbo].[Rpt_TestProcedure]
    AS
        Declare @FromDate datetime,
                @ThruDate datetime,
                @Sort1 nvarchar(4000)

        set @FromDate = (select iFromDate from #iFromDate)
        set @ThruDate = (select iThruDate from #iThruDate)
        set @Sort1 = (select Query from ReportOptionsQuery where OptionCode =
                         (select OptionCode from ReportOptions where id =
                             (select id from #iProdSort1)))

    Do you all know how wonderful it is to know that you’re getting some of the best advice available on the planet regarding a complex subject that you are just beginning to learn? It’s like I asked how to play an E chord and got an overnight response from Clapton and Santana. I thank you all.

    In sequence of your kind responses:

    rVadim

    The local variables usually get their values from a temporary stored procedure created by the report module, which populates temp tables that the report proc then reads. Example provided above. The temporary proc is assembled dynamically and is extremely versatile. Sometimes dozens of values are passed this way.

    Gail and Paul

    Your blogs/columns/forum responses are a big part of my attempt to enhance my skills and are really helping me now. I’m trying to encourage our other developers to make use of them.

    I think, to paraphrase each of your responses, you are advising against a sledgehammer approach in favor of nuanced troubleshooting that identifies precisely where problems actually exist and addresses them appropriately. I need to find out if I can use multiple values for OPTIMIZE FOR should I decide to try it, but I'll check the documentation for that.

    Joe

    I’m sure you know that you are equally famous for your knowledge as you are for your beat downs. I wear my ignorance on my sleeve in these posts and I hope you don’t find cause for blasting me for my questions.

    I didn’t realize I was further revealing my ignorance by mentioning our use of dynamic SQL. I don’t know what is meant by the use of the term “cohesion” in this context. If you have an additional moment to educate me, I’d appreciate it.

    And as an FYI, the background of the primary developer of the product was Paradox.

    And Paul, you are dead on. No way a rewrite is happening at this point in time.

    John

  • jshahan (8/22/2012)


    I think, to paraphrase each of your responses, you are advising against a sledgehammer approach in favor of nuanced troubleshooting that identifies precisely where problems actually exist and addresses them appropriately.

    Yup. I'm not a fan of what I call shotgun query tuning. Find the problems, address the problems and be specific with the solutions.

    I need to find out if I can use multiple values for OPTIMIZE FOR should I decide to try it, but I'll check the documentation for that.

    Not in a single query, no.

    I didn’t realize I was further revealing my ignorance by mentioning our use of dynamic SQL

    Nothing wrong with dynamic SQL used appropriately. It's a tool. Whether you're using it appropriately or not is hard to tell with this little information.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • set @SqlString3 = N'
        left join ClientProgram cpgm on cpgm.EnrollmentKey = ClientEnrollment.ID
            and '''+convert(char,@ThruDate,101)+''' between
                isnull(cpgm.effectivedate,'''+convert(char,@FromDate,101)+''') and
                isnull(cpgm.expirationdate,'''+convert(char,@ThruDate,101)+''')
        left join Program pgm on pgm.id = cpgm.ProgramKey
        left join Program spgm on spgm.id = cpgm.SubProgramKey
        where dbo.ClientEnrollment.Void is null and dbo.ClientEnrollment.Enrollment_Date between
            '''+convert(char(10),@FromDate,101)+''' and '''+convert(char(10),@ThruDate,101)+''''

    if @ProviderCount > 0
        set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select id from #Provider)'

    if @SiteCount > 0
        set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select ProviderKey from ProviderSites where ProviderSites.ProviderSiteKey in (select ID from #Site))'

    if @RBHACount > 0
        set @SqlString3 = @SqlString3 + N' and dbo.RBHA.ID in (select ID from #RBHA)'

    if @RBHAGroupCount > 0
        set @SqlString3 = @SqlString3 + N' and dbo.RBHA.ID in (select RbhaKey from RBHAXrefGroup where RBHAXrefGroup.GroupKey in (select ID from #RBHAGroup))'

    if @EnrollType = 'C'
        set @SqlString3 = @SqlString3 + N' and dbo.ClientEnrollment.CrisisEnrollment = 1'

    if @EnrollType = 'R'
        set @SqlString3 = @SqlString3 + N' and (dbo.ClientEnrollment.CrisisEnrollment = 0
            or dbo.ClientEnrollment.CrisisEnrollment is null)'

    Above is an example of how we typically use dynamic SQL. I'm not an authority on injection but it seems pretty airtight to me.

  • jshahan (8/22/2012)


    if @ProviderCount >0

    set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select id from #Provider)'

    Above is an example of how we typically use dynamic SQL. I'm not an authority on injection but it seems pretty airtight to me.

    That's generally a bad way to handle it. IN can be a very expensive operator.

    You're very close to a catch-all type of thing here... if you need to solve the problem dynamically, I'd consider building your SELECT, FROM and WHERE clauses (and potentially GROUP BY/HAVING) as separate SQL variables.

    Then concatenate those pieces back together.

    Gail has a great blog on this: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

    The above could instead be constructed as a JOIN then, which generally performs better than using the IN operator.
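
    Something along these lines (a sketch using the variable and table names from your snippet; I haven't tested it against your schema):

    DECLARE @Select nvarchar(max), @From nvarchar(max), @Where nvarchar(max);

    SET @Select = N'SELECT e.ID, e.ProviderKey';
    SET @From   = N' FROM dbo.Enrollment AS e';
    SET @Where  = N' WHERE 1 = 1';   -- makes every later condition a simple AND

    IF @ProviderCount > 0
        SET @Where = @Where + N' AND e.ProviderKey IN (SELECT id FROM #Provider)';

    IF @SiteCount > 0
        SET @Where = @Where + N' AND e.ProviderKey IN (SELECT ProviderKey FROM ProviderSites
                                  WHERE ProviderSites.ProviderSiteKey IN (SELECT ID FROM #Site))';

    DECLARE @Sql nvarchar(max);
    SET @Sql = @Select + @From + @Where;
    EXEC sys.sp_executesql @Sql;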



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link
    For tips on how to post your problems

  • where dbo.ClientEnrollment.Void is null and dbo.ClientEnrollment.Enrollment_Date between

    '''+convert(char(10),@FromDate,101)+''' and '''+convert(char(10),@ThruDate,101)+''''

    Nope, this is injectable. Look up sp_executesql, and in particular parameterization. Any time you directly concatenate a variable into the SQL string you execute, without a parameter 'wash' of some kind, you're open to injection.
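
    For the date-range piece above, a parameterized version might look something like this (a sketch, reusing the column names from the earlier post):

    DECLARE @Sql nvarchar(max);

    SET @Sql = N'
        SELECT ce.ID
        FROM dbo.ClientEnrollment AS ce
        WHERE ce.Void IS NULL
          AND ce.Enrollment_Date BETWEEN @FromDate AND @ThruDate;';

    -- The dates travel as typed parameters and are never concatenated
    -- into the string, so there is nothing to inject.
    EXEC sys.sp_executesql
        @Sql,
        N'@FromDate datetime, @ThruDate datetime',
        @FromDate = @FromDate,
        @ThruDate = @ThruDate;

    As a side benefit, parameters passed through sp_executesql are sniffed the same way stored procedure parameters are, which ties back to the cardinality-estimate question that started the thread.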

    Our proprietary reporting module calls stored procedures that have no parameters but make extensive use of local variables.

    A single query (and many of them are dynamic SQL) may utilize many local variables in the WHERE clause. And some of the stored procs are very complex, containing more than 2000 lines of code.

    How the heck do they use local variables that heavily without them being proc parameters, or am I misunderstanding this a little?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • mtassin (8/22/2012)


    The above could instead be constructed as a JOIN then, which generally performs better than using the IN operator.

    No it doesn't.

    http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

    General conclusion: IN is very slightly more optimal than a join for the purpose of just checking whether rows match (SQL can use a semi-join rather than a full join).

    IN is not an inefficient operator. I suspect, given the prevalence of that belief, that it used to be one at some point in SQL's history.
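
    To make that concrete with the temp table from earlier in the thread, these two forms are close but not identical; the IN form only checks for a match, while the join returns one row per match and so can multiply rows if #Provider contains duplicate ids:

    -- Semi-join form: just asks "is there a matching row?"
    SELECT e.ID
    FROM dbo.Enrollment AS e
    WHERE e.ProviderKey IN (SELECT p.id FROM #Provider AS p);

    -- Join form: returns a row per match, so duplicates in #Provider
    -- would duplicate rows from dbo.Enrollment.
    SELECT e.ID
    FROM dbo.Enrollment AS e
    INNER JOIN #Provider AS p ON p.id = e.ProviderKey;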

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Evil Kraig F (8/22/2012)


    where dbo.ClientEnrollment.Void is null and dbo.ClientEnrollment.Enrollment_Date between

    '''+convert(char(10),@FromDate,101)+''' and '''+convert(char(10),@ThruDate,101)+''''

    Nope, this is injectable.

    Depends. Very, very hard to inject anything with 10 characters to play with (the convert to char(10)), and if that variable's defined as a datetime it's not possible for it to contain rogue input. Also, can't see from here where @FromDate comes from. It could be a calculation, from a table, or user input.

    The general form is certainly injectable (variables concatenated into a string), the specific example, can't say.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/22/2012)


    mtassin (8/22/2012)


    The above could instead be constructed as a JOIN then, which generally performs better than using the IN operator.

    No it doesn't.

    http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

    General conclusion: IN is very slightly more optimal than a join for the purpose of just checking whether rows match (SQL can use a semi-join rather than a full join).

    IN is not an inefficient operator. I suspect, given the prevalence of that belief, that it used to be one at some point in SQL's history.

    Thanks for the Article Gail

    I do recall back with SQL 7/2000 that IN was very expensive. Figures that at some point MS would make it better. I recall watching IN get blown up into a chained set of OR's. I would see it bypass indexes and go for table scans and a bunch of other junk.

    I also enjoyed reading the comparison of NOT EXISTS vs JOIN. Did I miss one that compared EXISTS/NOT EXISTS vs IN/NOT IN?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link
    For tips on how to post your problems

  • mtassin (8/22/2012)


    I do recall back with SQL 7/2000 that IN was very expensive. Figures that at some point MS would make it better. I recall watching IN get blown up into a chained set of OR's.

    That's IN with a list of values. Not IN with a subquery. Very different behaviour. SQL 2000 had problems with OR statements, so that translated over to IN with values lists.

    I would see it bypass indexes and go for table scans and a bunch of other junk.

    It will go for scans if there are too many values in the list to make multiple seeks an option (just like with key lookups vs table scan)

    Consider this:

    SELECT <stuff> FROM SomeTable WHERE SomeColumn IN (<List of 1000 values>)

    Now let's say the table has 1500 pages and SomeColumn is indexed (and it's even a covering index). Which will be more efficient, a single table scan that reads 1500 pages, or 1000 nonclustered index seeks that each read 2 pages?

    Table scan != inefficient. That's another of those irritating performance myths.

    did I miss one that compared EXISTS/NOT EXISTS vs IN/NOT IN?

    How would I compare things that do the complete opposite to each other?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/22/2012)


    did I miss one that compared EXISTS/NOT EXISTS vs IN/NOT IN?

    How would I compare things that do the complete opposite to each other?

    EXISTS vs IN

    NOT EXISTS vs NOT IN

    Not exactly opposite, and I've seen them used interchangeably to do the same thing.

    i.e.

    back to this code

    if @ProviderCount > 0
        set @SqlString3 = @SqlString3 + N' and dbo.Enrollment.ProviderKey in (select id from #Provider)'

    This could be written this way... (I realize one is dynamic and the other isn't; ignoring that fact, I'm mostly talking about the dbo.Enrollment.ProviderKey IN (select id from #Provider) part.)

    I'm pretending dbo.Enrollment is aliased as 'a' in this case and not making it a dynamic fragment... but I hope my question makes sense.

    WHERE EXISTS (select 1 from #Provider x where a.ProviderKey = x.id)

    It's effectively very similar to the IN, but I've seen differences in performance between them myself. Maybe I'm just imagining things?



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link
    For tips on how to post your problems
