Changed behavior or am I missing something?

  • Hi all,

    when porting and testing code from 2008/2012 to 2014 I noticed that some queries where significantly slower. Digging into it, I found that there is what seems to me a different behavior between the versions.

    Well it's been a long day and I might be missing something, (no need to be polite πŸ˜‰ ) but then again, I have written and used this kind of code hundreds of times. To demonstrate, the following code exhausts the NUMBERS CTE, far beyond the length of the character column specified in the filter;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE N'#MYTEMP_%' AND TABLE_SCHEMA = N'dbo')

    DROP TABLE #MYTEMP

    create table #MYTEMP

    (

    MYTEMP_ID INT IDENTITY(1,1) NOT NULL

    ,MYTEMP_STR varchar(50)

    );

    insert into #MYTEMP(MYTEMP_STR)

    values ('12345ABCD')

    ,('23456BCDE')

    ,('34567CDEF')

    ,('45678DEFG');

    ;WITH NX(N) AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS X(N))

    ,NUMBERS(N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N

    FROM NX N1,NX N2,NX N3 ,NX N4,NX N5,NX N6,NX N7,NX N8

    )

    SELECT

    *

    ,SUBSTRING(MT.MYTEMP_STR,NM.N,1) AS PART

    FROM #MYTEMP MT

    CROSS APPLY NUMBERS NM

    WHERE LEN(MT.MYTEMP_STR) <= NM.N

    and slightly different flavour

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE N'#MYTEMP_%' AND TABLE_SCHEMA = N'dbo')

    DROP TABLE #MYTEMP

    create table #MYTEMP

    (

    MYTEMP_ID INT IDENTITY(1,1) NOT NULL

    ,MYTEMP_STR varchar(50)

    );

    insert into #MYTEMP(MYTEMP_STR)

    values ('12345ABCD')

    ,('23456BCDE')

    ,('34567CDEF')

    ,('45678DEFG');

    ;WITH NX(N) AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS X(N))

    ,NUMBERS(N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N

    FROM NX N1,NX N2,NX N3 ,NX N4,NX N5,NX N6,NX N7,NX N8

    )

    SELECT

    *

    ,SUBSTRING(MT.MYTEMP_STR,NM.N,1) AS PART

    FROM #MYTEMP MT

    INNER JOIN NUMBERS NM

    ON LEN(MT.MYTEMP_STR) <= NM.N[/code]

    and then again, even worse

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME LIKE N'#MYTEMP_%' AND TABLE_SCHEMA = N'dbo')

    DROP TABLE #MYTEMP

    create table #MYTEMP

    (

    MYTEMP_ID INT IDENTITY(1,1) NOT NULL

    ,MYTEMP_STR varchar(50)

    );

    insert into #MYTEMP(MYTEMP_STR)

    values ('12345ABCD')

    ,('23456BCDE')

    ,('34567CDEF')

    ,('45678DEFG');

    ;WITH NX(N) AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS X(N))

    ,NUMBERS(N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N

    FROM NX N1,NX N2,NX N3 ,NX N4,NX N5,NX N6,NX N7,NX N8

    )

    SELECT

    *

    ,SUBSTRING(MT.MYTEMP_STR,NM.N,1) AS PART

    FROM #MYTEMP MT

    INNER JOIN NUMBERS NM

    ON NM.N <= LEN(MT.MYTEMP_STR)

    Before jumping to any conclusions, am I doing something wrong or has something changed?

    Thanks,

    Eirikur

  • Tons of stuff has changed. First, did you change the compatibility level on your database to 120? If not, you're using the old statistics with the new optimizer in compatibility mode, which might explain a difference in behavior, possibly. If you've changed the compatibility level to 120, then you've got an all new optimizer and a whole new cardinality estimator, which will seriously change the behavior of some plans. That, frankly, is the most likely explanation for many differences. Get the execution plan with the compatibility set to 110 and 120 and compare them.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Eirikur Eiriksson (4/30/2014)


    INNER JOIN NUMBERS NM

    ON LEN(MT.MYTEMP_STR) <= NM.N

    I'm thinking that bit of code cannot possibly be correct even for the old versions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/30/2014)


    Eirikur Eiriksson (4/30/2014)


    INNER JOIN NUMBERS NM

    ON LEN(MT.MYTEMP_STR) <= NM.N

    I'm thinking that bit of code cannot possibly be correct even for the old versions.

    Thanks Jeff, this I missed!

    πŸ˜€ that is a typo if I ever seen one πŸ˜›

  • Grant Fritchey (4/30/2014)


    Tons of stuff has changed. First, did you change the compatibility level on your database to 120? If not, you're using the old statistics with the new optimizer in compatibility mode, which might explain a difference in behavior, possibly. If you've changed the compatibility level to 120, then you've got an all new optimizer and a whole new cardinality estimator, which will seriously change the behavior of some plans. That, frankly, is the most likely explanation for many differences. Get the execution plan with the compatibility set to 110 and 120 and compare them.

    Thanks Grant, looking further into this and getting even more puzzled.

    The testing has so far been done on an upgraded system, will test on a clean 2014 system later.

    😎

  • I get the same crappy performance on 2012 as 2014 for these...I haven't got a 2008 to test on right now.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (4/30/2014)


    I get the same crappy performance on 2012 as 2014 for these...I haven't got a 2008 to test on right now.

    That means I might be missing something:-P

    Thanks!

  • Or I have a crappy 2012 install?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (4/30/2014)


    Or I have a crappy 2012 install?

    Few weeks back, before upgrading the 2012 system IΒ΄m working on to 2014, I ran into a similar behavior when forgetting the filter/limit on the numbers cte. Adding the filter corrected the problem then.

    Was the changes in the cardinality estimator or guesstimator? Seeing estimated 10 rows, actual rowcount 10000000 rows in the exec plan.;-)

    Changing to 110 compatibility halves the execution time.

  • Make sure you rebuild indexes and update stats too. Every time I move a db from older version of SQL Server to a new version I rebuild the indexes.

  • I'm not sure why Grant and others talk about update statistics, since everything is confined into a single script. There are no old statistics to update.

    There is a new cardinality estimator, but I'm not sure that it matters for this case.

    I ran the first script on SQL 2008 and after changing <= to >=, and it ran fΓΆr two minutes and forty-five seconds. On SQL 2014, the script completed in 2:18, somewhat better.

    I also tried to use Itzik Ben-Gans smart fn_nums:

    CREATE FUNCTION dbo.fn_nums(@n AS bigint) RETURNS TABLE AS

    RETURN

    WITH

    L0 AS(SELECT 1 AS c UNION ALL SELECT 1),

    L1 AS(SELECT 1 AS c FROM L0 AS A, L0 AS B),

    L2 AS(SELECT 1 AS c FROM L1 AS A, L1 AS B),

    L3 AS(SELECT 1 AS c FROM L2 AS A, L2 AS B),

    L4 AS(SELECT 1 AS c FROM L3 AS A, L3 AS B),

    L5 AS(SELECT 1 AS c FROM L4 AS A, L4 AS B),

    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5)

    SELECT n AS Number FROM Nums WHERE n <= @n;

    but I gave up when my query had not completed in four minutes.

    I think it is better to have a table of numbers. While sometimes the optimizer can use these exploding CTEs correctly, it has little clue of what is going on and it may come up with complete crazy plans.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Firstly, thank you all for looking into this. It was me after all missing something rather obvious, maybe too obvious. The CTE in the code generates a sequence of 10^8 entries which crosses the threshold of noticeable generation time (see table).

    +------------+--------+---------+

    | Row count | CPU ms | Elapsed |

    +------------+--------+---------+

    | 100 | 0 | 0 |

    | 1000 | 0 | 0 |

    | 10000 | 0 | 1 |

    | 100000 | 16 | 17 |

    | 1000000 | 217 | 155 |

    | 10000000 | 1981 | 1566 |

    | 100000000 | 20313 | 15955 |

    | 1000000000 | 194144 | 153251 |

    +------------+--------+---------+

    Testing on 2008 to 2014 produced similar performance and almost identical execution plans with only a slight difference in cost distribution. Erland was spot on, all of the plans exhausted the CTE number generation.

    I believe that Itzik Ben-Gans smart fn_nums is more geared towards populating number tables than for inline usage such as this. He has another version which includes a TOP/OOF clause. In fact, without a TOP/OOF clause, all tests had the same problem, regardless of the maximum number or the presence of a where / filtering clause outside the CTE.

    Fixing this in the code was simply a case of changing the second CTE.

    😎

    ;WITH NX(N) AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS X(N))

    SELECT

    MT.MYTEMP_ID

    ,NM.N

    ,SUBSTRING(MT.MYTEMP_STR,NM.N,1) AS STR_PART

    FROM #MYTEMP MT

    CROSS APPLY

    (

    /* Tally moved here for using the len() to limit the rowcount */

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N

    FROM NX N1,NX N2,NX N3 ,NX N4,NX N5,NX N6,NX N7,NX N8,NX N9 -- (10^9)

    ORDER BY 1 OFFSET 0 ROWS FETCH FIRST (LEN(MT.MYTEMP_STR)) ROWS ONLY

    ) AS NM;

  • It's interesting that you used fetch first N instead of TOP(N).

    Was there a performance improvement or just personal preference?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (5/4/2014)


    It's interesting that you used fetch first N instead of TOP(N).

    Was there a performance improvement or just personal preference?

    It is more of a preference, kind of a reminder when I'm working on 2012/2014:-D

    The server introduces the same TOP operator for both, haven't seen any difference in performance.

    😎

  • What method did you use to port the database?

    Remember to use a schema comparison tool to insure things like indexes, table options, statistics, constraints, plan guides, etc. are all the same before attempting a baseline comparison. Also the method used to copy table data to the new instance (if that's the case) could have resulted in fragmentation.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Viewing 15 posts - 1 through 15 (of 15 total)

    You must be logged in to reply to this topic. Login to reply