Parallelism and date predicate

  • SQL Server 2005 standard

    table1 has 4 millions recs. The below query returns 480,000 recs. There is non-clustered index on table1.Date1.

    Below is a simple query. This query uses parallelism.

    SELECT count(*)

    FROM table1

    WHERE

    (

    table1.Date1 >= '10/01/2010'

    AND

    table2.Date1 <= '04/30/2011'

    )

    However, when the date range changes to system based dates, the query runs in serial.

    SELECT count(*)

    FROM table1

    WHERE

    (

    table1.Date1 >= CONVERT(DATETIME, '10/01/' + CAST(DATEPART(yyyy, dateadd(yyyy, -2, getdate())) AS CHAR(4)))

    AND

    table2.Date1 <= cast(CONVERT(varchar(10),DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,dateadd(yyyy, -1, getdate())),0)),101) as datetime))

    )

    I don't understand why the 2nd query is not using parallelism. How does the constant date vs dynamic system based dates in predicate affect the SQL optimizer to choose Serial vs parallel execution plan? In the execution plan of 2nd query, estimated row count from table1 is 1., but why? Is there a way to still get parallel plan for dynamic dates (without using stored procedures)?

    Any help is greatly appreciated.

    Thanks much.

  • Can you provide the criteria for computing the start and end dates for your query? Does it always start with 10/01 of a given year?

  • Not sure if this will help with parallelism, but this is how I would code the query. Also, in SQL Server 2005 the date/time data type can have a time component and that shoul be taken into account. As such you will notice the upper range is looking for t1.Date1 to be less than the beginning of the following month.

    SELECT DATEADD(mm, 9,DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) -2, 0)), DATEADD(yy, -1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)); -- Shows what dates will be calculated

    SELECT

    COUNT(*)

    FROM

    table1 t1

    WHERE

    t1.Date1 >= DATEADD(mm, 9,DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) -2, 0)) AND -- 2010-10-01 00:00:00.000

    t1.Date1 < DATEADD(yy, -1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0)); -- 2011-05-01 00:00:00.000

  • Why is it important for this to be parallel?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Lynn,

    Yes, the start date is always 10/1/yyyy and end date is end of previous month (year - current, previous - based on requirement).

    In my case, I am sure the date column, table1.Date1, always stores date without timestamp. so thought my logic would work. Anyway, thanks for noticing it.

    Matt,

    Good question 🙂

    long answer (sorry for the lengthy explanation ;)) - I have a rather complex query than the one I provided above. It runs for over 10 mins, which I think is longer. The query has only one predicate and it is date predicate as shown above. If I use the actual dates in the predicate (as in query 1 above) the complex query uses parallelism and returns in 3 mins. when I use dynamic dates then the execution plan (from SSMS) goes serial and runs for over 10 mins. Then I thought it could be the combination of other things in the query along with dynamic dates that could be causing the change in execution plan (parallel to serial).

    But I could replicate the change in execution plan (from serial to parallel) with the above simple query by removing all the complexity from the complex query. (note that the table1 in my example is part of complex query.) Hope this makes sense!

    So I want to understand in my example why parallelism is not used with dynamic dates. Am I missing something on the table / database side, could it be statistics not being up-to-date (i'm going to update stats), is it more than just the dynamic vs actual dates?

    I am newbie to performance tuning on SQL server and I have been doing lot of online reading (forums, blogs, by Paul White) past couple of days trying to understand the SQL query optimizer and execution plans.

    Thanks.

  • This is starting to sound like a performance issue. Read the second article I reference below in my signature block. Follow the instructions regarding what needs to be posted and how to do it. Be sure the execution plans are actual not estimated.

  • hmm. I would be cautious about equating parallel with "faster": in many cases the execution plan will be more efficient without the parallelism.

    That said - Have you tried parameterizing the dynamic dates? As in -

    declare @start datetime,

    @end datetime

    select @start = DATEADD(mm, 9,DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) -2, 0)),

    @end =

    DATEADD(yy, -1, DATEADD(mm, DATEDIFF(mm, 0, GETDATE()), 0));

    SELECT

    COUNT(*)

    FROM

    table1 t1

    WHERE

    t1.Date1 >= @start AND t1.Date1 < @end;

    The issue might be around getdate(). The function is considered to be non-deterministic, which would mean that the query is attempting to reevaluate it multiple times over during the execution.

    I would be curious if you saw a difference in the exec plan by using the above.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • your query is returning so much of data, if optimizer is going for parallelism , i don't see any harm in it. additionally exec plan comparison can give you better picture on your doubt.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Lynn,

    I read the article but thought my question was something basic, so didn't follow the instructions. I guess, there is no simple or obvious thing to performance related questions.

    Matt,

    The statistics were updated on Table1 (in my example). Now I see consistent execution plans i.e., Serial plan in both cases (constant and dynamic dates). Attached are the queries (includes Matt's query) and the execution plans.

    I think parameterizing the dates is a good idea.

    Thanks for your time.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply