Get Set For Speed

  • Tom Staab

    Mr or Mrs. 500

    Points: 507

    Comments posted to this topic are about the item Get Set For Speed

  • Eirikur Eiriksson

    SSC Guru

    Points: 182425

    Thank you Tom for this piece!

    😎

    After reading it I do have few concerns about the findings:

    1. The execution times suggest that a covering index for the aggregation is in place, can you verify that?

    2. Different units of work are within each timing section of the code, both the cursor and the while loop section include the truncation of the temporary table whilst the set section doesn't. This is bound to seriously affect the outcome.

    3. Using a temporary table to "bucket" the outcome is fine when verifying the results but the method shouldn't be used when timing the execution as file growth events etc. could alter the outcome.

    4. As all the queries are reading and aggregating exactly the same data, statistics io are not adding anything to the results.

    5. When timing cursor or while loop executions the output of the statistics will affect the results as those will be outputted for each execution within the cursor or the while loop.

    6. Using statistics time is not the most accurate method of measuring execution time and can in fact affect the results, strongly suggest using other alternatives.

    Having corrected the discrepancies within each section making certain that only equal amount of work is being measured the findings are quite different:

    Style ElapsedMilliseconds

    cursor 127

    while loop 108

    set 148

    I rewrote the set query excluding the formatting of the month and it's use in the grouping which is a serious flaw in the code:

    Style ElapsedMilliseconds

    set EE 54

    Adjusting the test harness by implementing a different timing method and eliminating the temp table insert, the results are drastically different:

    T_TEXT DURATION

    ------------- -----------

    CURSOR 23002

    WHILE LOOP 24001

    DRY RUN 27001

    SET EE 50003

    SET 145008

    Code to generate the test data

    USE tempdb;

    GO

    /* View wrapper for using the NEWID() function

    within a table value function

    */

    IF OBJECT_ID(N'dbo.VNEWID') IS NULL

    BEGIN

    DECLARE @CREATE_VIEW NVARCHAR(MAX) = N'

    CREATE VIEW dbo.VNEWID

    AS

    SELECT NEWID() AS NID;

    '

    EXEC (@CREATE_VIEW);

    END

    /* Test set generator, inspired by Lynn Pettis's random

    string function

    */

    IF OBJECT_ID(N'dbo.ITVFN_DO_SHAKESPEARE') IS NULL

    BEGIN

    DECLARE @CREATE_FUNCTION NVARCHAR(MAX) = N'

    /*

    Sample text set generator, having an infinite number of code

    monkeys calling this function for infinite number of times

    explains the name 😉

    2015-01-18

    Eirikur Eiriksson

    */

    CREATE FUNCTION dbo.ITVFN_DO_SHAKESPEARE

    (

    @BASE_LENGTH INT

    ,@BASE_VARIANCE INT

    ,@WORD_LENGTH INT

    ,@WORD_VARIANCE INT

    ,@ROWCOUNT INT

    ,@DELIMITER CHAR(1)

    )

    RETURNS TABLE

    AS

    RETURN

    WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    ,NUMS(N) AS (SELECT TOP (@BASE_LENGTH + @BASE_VARIANCE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4)

    ,RNUM(R) AS (SELECT TOP (@ROWCOUNT) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

    SELECT

    RN.R

    ,((SELECT TOP(@BASE_LENGTH + ((SELECT CHECKSUM(NID) FROM dbo.VNEWID) % @BASE_VARIANCE))

    CASE

    WHEN (NM.N + RN.R + (CHECKSUM(X.NID) % @WORD_LENGTH)) % @WORD_VARIANCE = 0 THEN @DELIMITER

    ELSE CHAR(65 + (ABS(CHECKSUM(X.NID)) % 26))

    END

    FROM NUMS NM

    CROSS APPLY dbo.VNEWID X

    FOR XML PATH(''''), TYPE).value(''.[1]'',''VARCHAR(8000)'')) AS RND_TXT

    FROM RNUM RN;

    ';

    EXEC (@CREATE_FUNCTION);

    END

    /* Test set parameters

    Most parameters are proportional so only the

    @SAMPLE_SIZE (cardinality) has to be set.

    */

    SET NOCOUNT ON;

    DECLARE @SAMPLE_SIZE INT = 1000000; -- Number of "Transactions"

    DECLARE @OUTLET_COUNT INT = @SAMPLE_SIZE / 200; -- Number of "Outlets"

    DECLARE @BASE_DATE DATE = CONVERT(DATE,'2014-01-01',126); -- Base Date, all dates are based on this.

    DECLARE @ZERO_DATE DATE = CONVERT(DATE,'1900-01-01',126); -- Monday 1st. January 1900.

    DECLARE @DATE_RANGE INT = 1096; -- +/- 3 Years

    DECLARE @MAX_PAY_DAYS INT = 90; -- Pay by date offset

    DECLARE @MAX_ITEMS INT = 20; -- Maximum number of Items

    DECLARE @ACT_PAY_DAYS INT = 99; -- "Actual" Pay Date

    DECLARE @AVG_PER_GROUP INT = 500; -- Additional Group Identifier Parameter

    DECLARE @GROUP_COUNT INT = @SAMPLE_SIZE / @AVG_PER_GROUP; -- Number of Groups

    DECLARE @CUSTOMER_COUNT INT = @SAMPLE_SIZE / 4; -- Number of Customers

    /* Random text generation for "customer details" */

    DECLARE @BASE_LENGTH INT = 50 ;

    DECLARE @BASE_VARIANCE INT = 49 ;

    DECLARE @WORD_LENGTH INT = 7 ;

    DECLARE @WORD_VARIANCE INT = 6 ;

    /* Get few nulls in the detail column be having slightly fewer

    entries than possible customer_id

    */

    DECLARE @ROWCOUNT INT = @CUSTOMER_COUNT - 100 ;

    DECLARE @DELIMITER CHAR(1) = CHAR(32);

    /* "customer details" */

    IF OBJECT_ID(N'dbo.TBL_SAMPLE_STRING') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_STRING;

    CREATE TABLE dbo.TBL_SAMPLE_STRING

    (

    SST_ID INT NOT NULL CONSTRAINT PK_DBO_TBL_SAMPLE_STRING_SST_ID PRIMARY KEY CLUSTERED

    ,SST_VALUE VARCHAR(500) NOT NULL

    );

    /* Create "Customer Details" */

    INSERT INTO dbo.TBL_SAMPLE_STRING (SST_ID, SST_VALUE)

    SELECT

    X.R

    ,X.RND_TXT

    FROM dbo.ITVFN_DO_SHAKESPEARE(@BASE_LENGTH,@BASE_VARIANCE,@WORD_LENGTH,@WORD_VARIANCE,@ROWCOUNT,@DELIMITER) AS X;

    /* Drop the dbo.TBL_SAMPLE_TRANSACTION test set table rather

    than

    */

    IF OBJECT_ID(N'dbo.TBL_SAMPLE_TRANSACTION') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_TRANSACTION;

    /* Inline Tally Table

    20^7 = 1,280,000,000 Max

    */

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)

    ,SAMPLE_DATA AS

    (

    SELECT

    NM.N AS TRAN_ID

    ,DATEADD(DAY,CHECKSUM(NEWID()) % @DATE_RANGE,@BASE_DATE) AS TRAN_DATE

    ,(ABS(CHECKSUM(NEWID())) % @OUTLET_COUNT) + 1 AS OUTLET_ID

    ,(ABS(CHECKSUM(NEWID())) % @GROUP_COUNT) + 1 AS GROUP_ID

    ,(ABS(CHECKSUM(NEWID())) % @CUSTOMER_COUNT) + 1 AS CUSTOMER_ID

    ,(ABS(CHECKSUM(NEWID())) % @AVG_PER_GROUP) + 1 AS DETAIL_ID

    ,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

    --+ CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) -- uncomment this line to make the entry "almost" unique

    + CHAR(45) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 10,0)

    + CHAR(58) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 100,0) AS PROD_NO

    ,CONVERT(NUMERIC(12,2),SQRT(ABS(CHECKSUM(NEWID())) + 2),0) AS TOTAL_AMOUNT

    ,(ABS(CHECKSUM(NEWID())) % @MAX_PAY_DAYS) + 1 AS PAY_BY_DAYS

    ,(ABS(CHECKSUM(NEWID())) % @ACT_PAY_DAYS) + 1 AS ACT_PAY_DAYS

    ,(ABS(CHECKSUM(NEWID())) % @MAX_ITEMS) + 1 AS ITEM_COUNT

    --,ASCII(':')

    FROM NUMS NM

    )

    SELECT

    ISNULL(SD.TRAN_ID,1) AS TRAN_ID

    ,SD.TRAN_DATE AS TRAN_DATE

    ,ISNULL((DATEDIFF(DAY,@ZERO_DATE,SD.TRAN_DATE) % 7) + 1 ,0) AS WEEK_DAY

    ,DATEADD(DAY,SD.PAY_BY_DAYS,SD.TRAN_DATE) AS PAY_BY_DATE

    ,DATEADD(DAY,SD.ACT_PAY_DAYS,SD.TRAN_DATE) AS ACT_PAY_DATE

    ,DATEADD(DAY

    ,FLOOR((SD.PAY_BY_DAYS + SD.ACT_PAY_DAYS) / 2)

    ,SD.TRAN_DATE) AS DELIVERY_DATE

    ,CHAR(65 + ( SD.OUTLET_ID % 26 ))

    + CHAR(65 + ( SD.OUTLET_ID % 20 )) AS LOCATION_CODE

    ,ISNULL(CHAR(65 + ( SD.ACT_PAY_DAYS % 26 ))

    + CHAR(65 + ( SD.ITEM_COUNT % 20 ))

    + RIGHT(CONVERT(VARCHAR(8),1000000

    + (SD.ACT_PAY_DAYS * SD.ITEM_COUNT),0),6),'ZZ999999') AS EMP_ID

    ,ISNULL(SD.OUTLET_ID ,1) AS OUTLET_ID

    ,ISNULL(CONVERT(

    TINYINT,1 - SIGN(3 - (SD.OUTLET_ID & 0x03)),0) ,1) AS IS_ONLINE

    ,ISNULL(CONVERT(

    TINYINT,1 - SIGN(7 - (SD.OUTLET_ID & 0x07)),0) ,1) AS IS_PICKUP

    ,ISNULL(SD.CUSTOMER_ID ,1) AS CUSTOMER_ID

    ,ISNULL(SD.GROUP_ID ,1) AS GROUP_ID

    ,ISNULL(SD.DETAIL_ID ,1) AS DETAIL_ID

    ,ISNULL(SD.PROD_NO,'ZZ-9:99') AS PROD_NO

    ,ISNULL(SD.TOTAL_AMOUNT,99.99) AS TOTAL_AMOUNT

    ,ISNULL(SD.ITEM_COUNT,1) AS ITEM_COUNT

    ,ISNULL(CONVERT(NUMERIC(9,2),(0.1 * SD.TOTAL_AMOUNT),0),0) AS TAX_AMOUNT

    ,ISNULL(CONVERT(NUMERIC(9,2)

    ,(0.9 * SD.TOTAL_AMOUNT / ITEM_COUNT),0),0) AS UNIT_PRICE

    ,CD.SST_VALUE AS CUSTOMER_DETAIL

    INTO dbo.TBL_SAMPLE_TRANSACTION

    FROM SAMPLE_DATA SD

    LEFT OUTER JOIN dbo.TBL_SAMPLE_STRING CD

    ON SD.CUSTOMER_ID = CD.SST_ID;

    /* Clustered INdex */

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ADD CONSTRAINT PK_DBO_SAMPLE_TRANSACTION_TRAN_ID PRIMARY KEY CLUSTERED (TRAN_ID ASC);

    /* Jeff Moden's ISNULL trick for not null columns doesn't work on all

    data types so let's put some constraints in place

    */

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN TRAN_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN PAY_BY_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN ACT_PAY_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN DELIVERY_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN LOCATION_CODE CHAR(2) NOT NULL;

    /* Few applicable and inapplicable indices */

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (WEEK_DAY ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (WEEK_DAY ASC, IS_ONLINE ASC, IS_PICKUP ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (IS_ONLINE ASC, IS_PICKUP ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (TRAN_DATE ASC, IS_ONLINE ASC, IS_PICKUP ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_INCL_ITEM_COUNT ON [dbo].[TBL_SAMPLE_TRANSACTION] ([TRAN_DATE] ASC)

    INCLUDE ([ITEM_COUNT]);

    Test harness

    USE tempdb;

    GO

    SET NOCOUNT ON;

    BEGIN; -- setup

    IF OBJECT_ID('tempdb..#Months') is not null

    DROP TABLE #Months;

    CREATE TABLE #Months

    (

    TheMonthNumber char(2)

    , TheMonthName varchar(20)

    );

    INSERT #Months

    (TheMonthNumber, TheMonthName)

    VALUES

    ('01', 'January')

    , ('02', 'February')

    , ('03', 'March')

    , ('04', 'April')

    , ('05', 'May')

    , ('06', 'June')

    , ('07', 'July')

    , ('08', 'August')

    , ('09', 'September')

    , ('10', 'October')

    , ('11', 'November')

    , ('12', 'December')

    ;

    IF OBJECT_ID('tempdb..#MonthlyItemsOrdered') is not null

    DROP TABLE #MonthlyItemsOrdered;

    CREATE TABLE #MonthlyItemsOrdered

    (

    TheMonth varchar(20)

    , TotalOrdered int

    );

    END;

    GO

    DECLARE @TIMER TABLE (T_TEXT VARCHAR(100) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

    DECLARE @MonthNumber char(2) = CONVERT(DATE,GETDATE(),0);

    DECLARE @MonthName varchar(20) = '';

    DECLARE @RangeStart date = CONVERT(DATE,GETDATE(),0);

    DECLARE @RangeEnd date = CONVERT(DATE,GETDATE(),0);

    DECLARE @INT_BUCKET INT = 0;

    INSERT INTO @TIMER(T_TEXT) VALUES ('DRY RUN');

    SELECT

    @INT_BUCKET = ST.ITEM_COUNT

    FROM dbo.TBL_SAMPLE_TRANSACTION ST

    WHERE ST.TRAN_DATE BETWEEN CONVERT(DATE,'20150101',112) AND CONVERT(DATE,'20151231',112);

    INSERT INTO @TIMER(T_TEXT) VALUES ('DRY RUN');

    INSERT INTO @TIMER(T_TEXT) VALUES ('CURSOR');

    DECLARE MyCursor CURSOR FAST_FORWARD FOR

    SELECT m.TheMonthNumber, m.TheMonthName

    FROM #Months m

    ;

    OPEN MyCursor

    FETCH NEXT FROM MyCursor

    INTO @MonthNumber, @MonthName

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @RangeStart = CONVERT(date, '2015' + @MonthNumber + '01');

    SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));

    SELECT

    @INT_BUCKET = SUM(t.[ITEM_COUNT])

    FROM [dbo].[TBL_SAMPLE_TRANSACTION] t

    WHERE t.TRAN_DATE BETWEEN @RangeStart AND @RangeEnd

    ;

    FETCH NEXT FROM MyCursor

    INTO @MonthNumber, @MonthName

    END

    CLOSE MyCursor

    DEALLOCATE MyCursor

    INSERT INTO @TIMER(T_TEXT) VALUES ('CURSOR');

    INSERT INTO @TIMER(T_TEXT) VALUES ('WHILE LOOP');

    SET @MonthNumber = '00';

    WHILE 1=1 -- break handled within

    BEGIN;

    SET @MonthName = '';

    SELECT TOP(1)

    @MonthNumber = m.TheMonthNumber

    , @MonthName = m.TheMonthName

    FROM #Months m

    WHERE m.TheMonthNumber > @MonthNumber

    ORDER BY TheMonthNumber

    ;

    IF @MonthName = ''

    BREAK;

    SET @RangeStart = CONVERT(date, '2015' + @MonthNumber + '01');

    SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));

    SELECT

    @INT_BUCKET = SUM(t.[ITEM_COUNT])

    FROM [dbo].[TBL_SAMPLE_TRANSACTION] t

    WHERE t.TRAN_DATE BETWEEN @RangeStart AND @RangeEnd

    ;

    END;

    INSERT INTO @TIMER(T_TEXT) VALUES ('WHILE LOOP');

    INSERT INTO @TIMER(T_TEXT) VALUES ('SET EE');

    SELECT

    @INT_BUCKET = SUM(ST.ITEM_COUNT)

    FROM dbo.TBL_SAMPLE_TRANSACTION ST

    WHERE ST.TRAN_DATE BETWEEN CONVERT(DATE,'20150101',112) AND CONVERT(DATE,'20151231',112)

    GROUP BY MONTH(ST.TRAN_DATE)

    ;

    INSERT INTO @TIMER(T_TEXT) VALUES ('SET EE');

    INSERT INTO @TIMER(T_TEXT) VALUES ('SET');

    SELECT

    @INT_BUCKET = SUM(t.[ITEM_COUNT])

    FROM [dbo].[TBL_SAMPLE_TRANSACTION] t

    WHERE t.TRAN_DATE BETWEEN CONVERT(date, '20150101') AND CONVERT(date, '20151231')

    GROUP BY CONCAT(RIGHT(CONCAT('00', MONTH(t.TRAN_DATE)), 2), ' - ', DATENAME(month, t.TRAN_DATE))

    ;

    INSERT INTO @TIMER(T_TEXT) VALUES ('SET');

    SELECT

    T.T_TEXT

    ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION

    FROM @TIMER T

    GROUP BY T.T_TEXT

    ORDER BY DURATION ASC;

    IF OBJECT_ID(N'tempdb..#MonthlyItemsOrdered') IS NOT NULL DROP TABLE #MonthlyItemsOrdered;

  • Tom Staab

    Mr or Mrs. 500

    Points: 507

    Thanks for the response, Eirikur. You are correct that I missed the TRUNCATE with the set solution. I'll look into correcting that later today. Regarding your 3 points about statistics, however, I just double-checked my code, and I don't have any SET STATISTICS statements. Do you have them enabled by default in your SSMS configuration?

    I find it very surprising that you foudn the set-based code to be the slowest option. That just doesn't add up logically given the way the engine operates and the extra work involved with cursors and loops. I don't have time to review your code or test it now, but I'll do that later today.

  • Gail Shaw

    SSC Guru

    Points: 1004474

    Another comment:

    It is important to note that set-based design will not automatically result in faster performance, especially if that particular query includes subqueries in the SELECT clause. For example, a query like this:

    SELECT a

    , (SELECT x FROM table2) AS field1

    , (SELECT y FROM table3) AS field2

    , (SELECT z FROM table4) AS field3

    FROM table1

    looks like a set-based approach but really isn't because each of those subqueries will have to run for each row returned by the main query.

    Unfortunately that's just a common myth. It's not true, and a look at the execution plan would show that. The only time subqueries have to run once per row of the outer statement is when they're correlated subqueries of the form (SELECT TOP(1) ... ORDER BY ...)

    I can't run the code you posted (and besides, it would throw an error if any of table2, table3, table4 had more than one row), but it is on the list of things I need to blog about.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail Shaw

    SSC Guru

    Points: 1004474

    Tom Staab (12/29/2015)


    Regarding your 3 points about statistics, however, I just double-checked my code, and I don't have any SET STATISTICS statements.

    I just downloaded the attachment to your article (SQL set-based vs row-based.sql), and line 35:

    PRINT 'Begin cursor';

    SET STATISTICS TIME, IO ON;

    GO

    Line 83:

    PRINT 'Begin while loop';

    SET STATISTICS TIME, IO ON;

    GO

    and line 131

    PRINT 'Begin set';

    SET STATISTICS TIME, IO ON;

    GO

    Is the attachment an old version of your code?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Eirikur Eiriksson

    SSC Guru

    Points: 182425

    Tom Staab (12/29/2015)


    Thanks for the response, Eirikur. You are correct that I missed the TRUNCATE with the set solution. I'll look into correcting that later today. Regarding your 3 points about statistics, however, I just double-checked my code, and I don't have any SET STATISTICS statements. Do you have them enabled by default in your SSMS configuration?

    I find it very surprising that you foudn the set-based code to be the slowest option. That just doesn't add up logically given the way the engine operates and the extra work involved with cursors and loops. I don't have time to review your code or test it now, but I'll do that later today.

    No worries Tom

    In the download, http://www.sqlservercentral.com/Files/SQL%20set-based%20vs%20row-based.sql/27733.sql, lines 35, 83 and 131 have SET STATISTICS TIME, IO ON;

    In your set code, using the formatted month number and name for the grouping is a killer for the query, changing that to the month only reduces the execution time by 2/3.

    😎

    Edit: Unsurprisingly Gail beat me to it pointing out the SET STATISTICS in the download.

  • Tom Staab

    Mr or Mrs. 500

    Points: 507

    Oh. Thanks, Gail. I need to look into that code discrepancy. I'm not sure what happened there. I was looking at the inline code in the article and my own notes. I do recall seeing the statistics spam in the looping solutions. I probably removed them and updated the article but not the attachment.

    Regarding the subqueries, of course you are correct. I really didn't word that very well. I should have made it clear that it's not necessarily a set-based solution and might cause the problem I indicated. I was thinking about a specific case from work that did exactly that, but it's not always that bad.

    This is my first article published by a third-party. Reading the "you were wrong here" comments is never easy, but I agree with a coworker who is fond of saying every mistake is an "opportunity for improvement." I hope this is just the first of many articles for me, and I'll use the feedback to make future ones better. Thanks for reading the article, and thanks again for the input.

  • Tom Staab

    Mr or Mrs. 500

    Points: 507

    Eirikur, I completely agree about the month formatting. That kind of thing is definitely not preferable. It was only done to try to make the output the same in all of the solutions while not over-complicating the code. Perhaps I need to rethink that. This is all part of a longer presentation I've done before and will do again in the future, so this is all good feedback to improve that as well. 🙂

  • Indianrock

    SSC-Insane

    Points: 20333

    Thanks for the article. I do try to avoid cursors and while loops. In fact, our development team just wrote some stored procedures to improve upon a data extract that has been performed with a C# solution ( took days to run ).

    Unfortunately, the new stored procedure approach used a while loop which processed far more records than necessary. My edit using a temp table near the beginning reduced the stored procedure approach from 24 hours to less than 2 for all five extracts.

    Where I have resorted to while loops is in very large purges of old data in production ( e.g. while .... delete top 20000 etc ) it was an attempt to minimize blocking.

  • Eirikur Eiriksson

    SSC Guru

    Points: 182425

    Tom Staab (12/29/2015)


    Eirikur, I completely agree about the month formatting. That kind of thing is definitely not preferable. It was only done to try to make the output the same in all of the solutions while not over-complicating the code. Perhaps I need to rethink that. This is all part of a longer presentation I've done before and will do again in the future, so this is all good feedback to improve that as well. 🙂

    Keep up the good work Tom, hope you are not deterred by the comments;-) I learned really quickly that having my writing reviewed by peers can literally safe ones face.

    😎

    As to why the loops are performing better than the set based code in your example, there is another factor too. The set based code implies a hash aggregation while the loops can utilize stream aggregation as there is effectively no partitioning in each loop's execution. Makes quite a difference.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Good basic instructions, thanks.

  • Gail Shaw

    SSC Guru

    Points: 1004474

    Eirikur Eiriksson (12/29/2015)


    The set based code implies a hash aggregation while the loops can utilize stream aggregation as there is effectively no partitioning in each loop's execution. Makes quite a difference.

    If that's the case, then adding a supporting index will make a huge difference to the set-based version.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • AZJim

    Default port

    Points: 1432

    Tom ... I really appreciate what you have demonstrated. I have been around awhile, first working on mainframe and Unix databases, and now for the last few years with SQL Server. I have heard from many quarters to never use cursors. This perplexed me because cursors were generally the standard on mainframe database processing. While the performance data you shared would be the same for any database, I think the difference is the degree of sharing on a processor. On the database we have many, many multi-terabyte tables that were being processed by literally hundreds of online transactions each second. SET processing could easily undermine performance for others if a query got out of control (such as with the subquery example you mentioned). If someone didn't specify WITH (NOLOCK), you could imagine the heartburn for those responsible for performance. If it is an update, you could be degrading the log unless you can break the change up into smaller chunks. But for the bulk of the uses, I can see the value of simply using SET processing. It's clean and simple. Thanks.

  • Eirikur Eiriksson

    SSC Guru

    Points: 182425

    GilaMonster (12/29/2015)


    Eirikur Eiriksson (12/29/2015)


    The set based code implies a hash aggregation while the loops can utilize stream aggregation as there is effectively no partitioning in each loop's execution. Makes quite a difference.

    If that's the case, then adding a supporting index will make a huge difference to the set-based version.

    Certainly did improve both the set based and the looping methods, the latter more than the former as it eliminated the hash aggregate in the loops but not in the set.

    😎

  • Xedni

    SSCertifiable

    Points: 6946

    Thanks for the post. I thought fast_forward cursors were the exception to the rule of cursors being slow because it's only supposed to query the data once, and it only has to read forward. That doesn't seem to be the case in this test. Is my understanding of such cursors wrong?

    Executive Junior Cowboy Developer, Esq.[/url]

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply