Get Set For Speed

Question

Post reply

Get Set For Speed

Tom Staab

Mr or Mrs. 500

Points: 507
More actions
December 29, 2015 at 12:13 am

#323231

Comments posted to this topic are about the item Get Set For Speed

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182899 More actions · Answer 1

Thank you Tom for this piece!

😎

After reading it I do have few concerns about the findings:

1. The execution times suggest that a covering index for the aggregation is in place, can you verify that?

2. Different units of work are within each timing section of the code, both the cursor and the while loop section include the truncation of the temporary table whilst the set section doesn't. This is bound to seriously affect the outcome.

3. Using a temporary table to "bucket" the outcome is fine when verifying the results but the method shouldn't be used when timing the execution as file growth events etc. could alter the outcome.

4. As all the queries are reading and aggregating exactly the same data, statistics io are not adding anything to the results.

5. When timing cursor or while loop executions the output of the statistics will affect the results as those will be outputted for each execution within the cursor or the while loop.

6. Using statistics time is not the most accurate method of measuring execution time and can in fact affect the results, strongly suggest using other alternatives.

Having corrected the discrepancies within each section making certain that only equal amount of work is being measured the findings are quite different:

Style ElapsedMilliseconds

cursor 127

while loop 108

set 148

I rewrote the set query excluding the formatting of the month and it's use in the grouping which is a serious flaw in the code:

StyleElapsedMilliseconds

set EE 54

Adjusting the test harness by implementing a different timing method and eliminating the temp table insert, the results are drastically different:

T_TEXT DURATION

------------- -----------

CURSOR 23002

WHILE LOOP 24001

DRY RUN 27001

SET EE 50003

SET 145008

Code to generate the test data

USE tempdb;

GO

/* View wrapper for using the NEWID() function

within a table value function

*/

IF OBJECT_ID(N'dbo.VNEWID') IS NULL

BEGIN

DECLARE @CREATE_VIEW NVARCHAR(MAX) = N'

CREATE VIEW dbo.VNEWID

AS

SELECT NEWID() AS NID;

'

EXEC (@CREATE_VIEW);

END

/* Test set generator, inspired by Lynn Pettis's random

string function

*/

IF OBJECT_ID(N'dbo.ITVFN_DO_SHAKESPEARE') IS NULL

BEGIN

DECLARE @CREATE_FUNCTION NVARCHAR(MAX) = N'

/*

Sample text set generator, having an infinite number of code

monkeys calling this function for infinite number of times

explains the name ;-)

2015-01-18

Eirikur Eiriksson

*/

CREATE FUNCTION dbo.ITVFN_DO_SHAKESPEARE

(

@BASE_LENGTH INT

,@BASE_VARIANCE INT

,@WORD_LENGTH INT

,@WORD_VARIANCE INT

,@ROWCOUNT INT

,@DELIMITER CHAR(1)

)

RETURNS TABLE

AS

RETURN

WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

,NUMS(N) AS (SELECT TOP (@BASE_LENGTH + @BASE_VARIANCE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4)

,RNUM(R) AS (SELECT TOP (@ROWCOUNT) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

SELECT

RN.R

,((SELECT TOP(@BASE_LENGTH + ((SELECT CHECKSUM(NID) FROM dbo.VNEWID) % @BASE_VARIANCE))

CASE

WHEN (NM.N + RN.R + (CHECKSUM(X.NID) % @WORD_LENGTH)) % @WORD_VARIANCE = 0 THEN @DELIMITER

ELSE CHAR(65 + (ABS(CHECKSUM(X.NID)) % 26))

END

FROM NUMS NM

CROSS APPLY dbo.VNEWID X

FOR XML PATH(''''), TYPE).value(''.[1]'',''VARCHAR(8000)'')) AS RND_TXT

FROM RNUM RN;

';

EXEC (@CREATE_FUNCTION);

END

/* Test set parameters

Most parameters are proportional so only the

@SAMPLE_SIZE (cardinality) has to be set.

*/

SET NOCOUNT ON;

DECLARE @SAMPLE_SIZE INT = 1000000; -- Number of "Transactions"

DECLARE @OUTLET_COUNT INT = @SAMPLE_SIZE / 200; -- Number of "Outlets"

DECLARE @BASE_DATE DATE = CONVERT(DATE,'2014-01-01',126); -- Base Date, all dates are based on this.

DECLARE @ZERO_DATE DATE = CONVERT(DATE,'1900-01-01',126); -- Monday 1st. January 1900.

DECLARE @DATE_RANGE INT = 1096; -- +/- 3 Years

DECLARE @MAX_PAY_DAYS INT = 90; -- Pay by date offset

DECLARE @MAX_ITEMS INT = 20; -- Maximum number of Items

DECLARE @ACT_PAY_DAYS INT = 99; -- "Actual" Pay Date

DECLARE @AVG_PER_GROUP INT = 500; -- Additional Group Identifier Parameter

DECLARE @GROUP_COUNT INT = @SAMPLE_SIZE / @AVG_PER_GROUP; -- Number of Groups

DECLARE @CUSTOMER_COUNT INT = @SAMPLE_SIZE / 4; -- Number of Customers

/* Random text generation for "customer details" */

DECLARE @BASE_LENGTH INT = 50 ;

DECLARE @BASE_VARIANCE INT = 49 ;

DECLARE @WORD_LENGTH INT = 7 ;

DECLARE @WORD_VARIANCE INT = 6 ;

/* Get few nulls in the detail column be having slightly fewer

entries than possible customer_id

*/

DECLARE @ROWCOUNT INT = @CUSTOMER_COUNT - 100 ;

DECLARE @DELIMITER CHAR(1) = CHAR(32);

/* "customer details" */

IF OBJECT_ID(N'dbo.TBL_SAMPLE_STRING') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_STRING;

CREATE TABLE dbo.TBL_SAMPLE_STRING

(

SST_ID INT NOT NULL CONSTRAINT PK_DBO_TBL_SAMPLE_STRING_SST_ID PRIMARY KEY CLUSTERED

,SST_VALUE VARCHAR(500) NOT NULL

);

/* Create "Customer Details" */

INSERT INTO dbo.TBL_SAMPLE_STRING (SST_ID, SST_VALUE)

SELECT

X.R

,X.RND_TXT

FROM dbo.ITVFN_DO_SHAKESPEARE(@BASE_LENGTH,@BASE_VARIANCE,@WORD_LENGTH,@WORD_VARIANCE,@ROWCOUNT,@DELIMITER) AS X;

/* Drop the dbo.TBL_SAMPLE_TRANSACTION test set table rather

than

*/

IF OBJECT_ID(N'dbo.TBL_SAMPLE_TRANSACTION') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_TRANSACTION;

/* Inline Tally Table

20^7 = 1,280,000,000 Max

*/

;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)

,SAMPLE_DATA AS

(

SELECT

NM.N AS TRAN_ID

,DATEADD(DAY,CHECKSUM(NEWID()) % @DATE_RANGE,@BASE_DATE) AS TRAN_DATE

,(ABS(CHECKSUM(NEWID())) % @OUTLET_COUNT) + 1 AS OUTLET_ID

,(ABS(CHECKSUM(NEWID())) % @GROUP_COUNT) + 1 AS GROUP_ID

,(ABS(CHECKSUM(NEWID())) % @CUSTOMER_COUNT) + 1 AS CUSTOMER_ID

,(ABS(CHECKSUM(NEWID())) % @AVG_PER_GROUP) + 1 AS DETAIL_ID

,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

--+ CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) -- uncomment this line to make the entry "almost" unique

+ CHAR(45) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 10,0)

+ CHAR(58) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 100,0) AS PROD_NO

,CONVERT(NUMERIC(12,2),SQRT(ABS(CHECKSUM(NEWID())) + 2),0) AS TOTAL_AMOUNT

,(ABS(CHECKSUM(NEWID())) % @MAX_PAY_DAYS) + 1 AS PAY_BY_DAYS

,(ABS(CHECKSUM(NEWID())) % @ACT_PAY_DAYS) + 1 AS ACT_PAY_DAYS

,(ABS(CHECKSUM(NEWID())) % @MAX_ITEMS) + 1 AS ITEM_COUNT

--,ASCII(':')

FROM NUMS NM

)

SELECT

ISNULL(SD.TRAN_ID,1) AS TRAN_ID

,SD.TRAN_DATE AS TRAN_DATE

,ISNULL((DATEDIFF(DAY,@ZERO_DATE,SD.TRAN_DATE) % 7) + 1 ,0) AS WEEK_DAY

,DATEADD(DAY,SD.PAY_BY_DAYS,SD.TRAN_DATE) AS PAY_BY_DATE

,DATEADD(DAY,SD.ACT_PAY_DAYS,SD.TRAN_DATE) AS ACT_PAY_DATE

,DATEADD(DAY

,FLOOR((SD.PAY_BY_DAYS + SD.ACT_PAY_DAYS) / 2)

,SD.TRAN_DATE) AS DELIVERY_DATE

,CHAR(65 + ( SD.OUTLET_ID % 26 ))

+ CHAR(65 + ( SD.OUTLET_ID % 20 )) AS LOCATION_CODE

,ISNULL(CHAR(65 + ( SD.ACT_PAY_DAYS % 26 ))

+ CHAR(65 + ( SD.ITEM_COUNT % 20 ))

+ RIGHT(CONVERT(VARCHAR(8),1000000

+ (SD.ACT_PAY_DAYS * SD.ITEM_COUNT),0),6),'ZZ999999') AS EMP_ID

,ISNULL(SD.OUTLET_ID ,1) AS OUTLET_ID

,ISNULL(CONVERT(

TINYINT,1 - SIGN(3 - (SD.OUTLET_ID & 0x03)),0) ,1) AS IS_ONLINE

,ISNULL(CONVERT(

TINYINT,1 - SIGN(7 - (SD.OUTLET_ID & 0x07)),0) ,1) AS IS_PICKUP

,ISNULL(SD.CUSTOMER_ID ,1) AS CUSTOMER_ID

,ISNULL(SD.GROUP_ID ,1) AS GROUP_ID

,ISNULL(SD.DETAIL_ID ,1) AS DETAIL_ID

,ISNULL(SD.PROD_NO,'ZZ-9:99') AS PROD_NO

,ISNULL(SD.TOTAL_AMOUNT,99.99) AS TOTAL_AMOUNT

,ISNULL(SD.ITEM_COUNT,1) AS ITEM_COUNT

,ISNULL(CONVERT(NUMERIC(9,2),(0.1 * SD.TOTAL_AMOUNT),0),0) AS TAX_AMOUNT

,ISNULL(CONVERT(NUMERIC(9,2)

,(0.9 * SD.TOTAL_AMOUNT / ITEM_COUNT),0),0) AS UNIT_PRICE

,CD.SST_VALUE AS CUSTOMER_DETAIL

INTO dbo.TBL_SAMPLE_TRANSACTION

FROM SAMPLE_DATA SD

LEFT OUTER JOIN dbo.TBL_SAMPLE_STRING CD

ON SD.CUSTOMER_ID = CD.SST_ID;

/* Clustered INdex */

ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ADD CONSTRAINT PK_DBO_SAMPLE_TRANSACTION_TRAN_ID PRIMARY KEY CLUSTERED (TRAN_ID ASC);

/* Jeff Moden's ISNULL trick for not null columns doesn't work on all

data types so let's put some constraints in place

*/

ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN TRAN_DATE DATE NOT NULL;

ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN PAY_BY_DATE DATE NOT NULL;

ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN ACT_PAY_DATE DATE NOT NULL;

ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN DELIVERY_DATE DATE NOT NULL;

ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN LOCATION_CODE CHAR(2) NOT NULL;

/* Few applicable and inapplicable indices */

CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

(WEEK_DAY ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

(WEEK_DAY ASC, IS_ONLINE ASC, IS_PICKUP ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

(IS_ONLINE ASC, IS_PICKUP ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

(TRAN_DATE ASC, IS_ONLINE ASC, IS_PICKUP ASC) INCLUDE (TRAN_ID,CUSTOMER_ID);

CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_INCL_ITEM_COUNT ON [dbo].[TBL_SAMPLE_TRANSACTION] ([TRAN_DATE] ASC)

INCLUDE ([ITEM_COUNT]);

Test harness

USE tempdb;

GO

SET NOCOUNT ON;

BEGIN; -- setup

IF OBJECT_ID('tempdb..#Months') is not null

DROP TABLE #Months;

CREATE TABLE #Months

(

TheMonthNumber char(2)

, TheMonthName varchar(20)

);

INSERT #Months

(TheMonthNumber, TheMonthName)

VALUES

('01', 'January')

, ('02', 'February')

, ('03', 'March')

, ('04', 'April')

, ('05', 'May')

, ('06', 'June')

, ('07', 'July')

, ('08', 'August')

, ('09', 'September')

, ('10', 'October')

, ('11', 'November')

, ('12', 'December')

;

IF OBJECT_ID('tempdb..#MonthlyItemsOrdered') is not null

DROP TABLE #MonthlyItemsOrdered;

CREATE TABLE #MonthlyItemsOrdered

(

TheMonth varchar(20)

, TotalOrdered int

);

END;

GO

DECLARE @timer TABLE (T_TEXT VARCHAR(100) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

DECLARE @MonthNumber char(2) = CONVERT(DATE,GETDATE(),0);

DECLARE @MonthName varchar(20) = '';

DECLARE @RangeStart date = CONVERT(DATE,GETDATE(),0);

DECLARE @RangeEnd date = CONVERT(DATE,GETDATE(),0);

DECLARE @INT_BUCKET INT = 0;

INSERT INTO @timer(T_TEXT) VALUES ('DRY RUN');

SELECT

@INT_BUCKET = ST.ITEM_COUNT

FROM dbo.TBL_SAMPLE_TRANSACTION ST

WHERE ST.TRAN_DATE BETWEEN CONVERT(DATE,'20150101',112) AND CONVERT(DATE,'20151231',112);

INSERT INTO @timer(T_TEXT) VALUES ('DRY RUN');

INSERT INTO @timer(T_TEXT) VALUES ('CURSOR');

DECLARE MyCursor CURSOR FAST_FORWARD FOR

SELECT m.TheMonthNumber, m.TheMonthName

FROM #Months m

;

OPEN MyCursor

FETCH NEXT FROM MyCursor

INTO @MonthNumber, @MonthName

WHILE @@FETCH_STATUS = 0

BEGIN

SET @RangeStart = CONVERT(date, '2015' + @MonthNumber + '01');

SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));

SELECT

@INT_BUCKET = SUM(t.[ITEM_COUNT])

FROM [dbo].[TBL_SAMPLE_TRANSACTION] t

WHERE t.TRAN_DATE BETWEEN @RangeStart AND @RangeEnd

;

FETCH NEXT FROM MyCursor

INTO @MonthNumber, @MonthName

END

CLOSE MyCursor

DEALLOCATE MyCursor

INSERT INTO @timer(T_TEXT) VALUES ('CURSOR');

INSERT INTO @timer(T_TEXT) VALUES ('WHILE LOOP');

SET @MonthNumber = '00';

WHILE 1=1 -- break handled within

BEGIN;

SET @MonthName = '';

SELECT TOP(1)

@MonthNumber = m.TheMonthNumber

, @MonthName = m.TheMonthName

FROM #Months m

WHERE m.TheMonthNumber > @MonthNumber

ORDER BY TheMonthNumber

;

IF @MonthName = ''

BREAK;

SET @RangeStart = CONVERT(date, '2015' + @MonthNumber + '01');

SET @RangeEnd = DATEADD(day, -1, DATEADD(month, 1, @RangeStart));

SELECT

@INT_BUCKET = SUM(t.[ITEM_COUNT])

FROM [dbo].[TBL_SAMPLE_TRANSACTION] t

WHERE t.TRAN_DATE BETWEEN @RangeStart AND @RangeEnd

;

END;

INSERT INTO @timer(T_TEXT) VALUES ('WHILE LOOP');

INSERT INTO @timer(T_TEXT) VALUES ('SET EE');

SELECT

@INT_BUCKET = SUM(ST.ITEM_COUNT)

FROM dbo.TBL_SAMPLE_TRANSACTION ST

WHERE ST.TRAN_DATE BETWEEN CONVERT(DATE,'20150101',112) AND CONVERT(DATE,'20151231',112)

GROUP BY MONTH(ST.TRAN_DATE)

;

INSERT INTO @timer(T_TEXT) VALUES ('SET EE');

INSERT INTO @timer(T_TEXT) VALUES ('SET');

SELECT

@INT_BUCKET = SUM(t.[ITEM_COUNT])

FROM [dbo].[TBL_SAMPLE_TRANSACTION] t

WHERE t.TRAN_DATE BETWEEN CONVERT(date, '20150101') AND CONVERT(date, '20151231')

GROUP BY CONCAT(RIGHT(CONCAT('00', MONTH(t.TRAN_DATE)), 2), ' - ', DATENAME(month, t.TRAN_DATE))

;

INSERT INTO @timer(T_TEXT) VALUES ('SET');

SELECT

T.T_TEXT

,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION

FROM @timer T

GROUP BY T.T_TEXT

ORDER BY DURATION ASC;

IF OBJECT_ID(N'tempdb..#MonthlyItemsOrdered') IS NOT NULL DROP TABLE #MonthlyItemsOrdered;

Tom Staab Mr or Mrs. 500 Points: 507 More actions · Answer 2

Thanks for the response, Eirikur. You are correct that I missed the TRUNCATE with the set solution. I'll look into correcting that later today. Regarding your 3 points about statistics, however, I just double-checked my code, and I don't have any SET STATISTICS statements. Do you have them enabled by default in your SSMS configuration?

I find it very surprising that you foudn the set-based code to be the slowest option. That just doesn't add up logically given the way the engine operates and the extra work involved with cursors and loops. I don't have time to review your code or test it now, but I'll do that later today.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

Another comment:

It is important to note that set-based design will not automatically result in faster performance, especially if that particular query includes subqueries in the SELECT clause. For example, a query like this:
SELECT a
, (SELECT x FROM table2) AS field1
, (SELECT y FROM table3) AS field2
, (SELECT z FROM table4) AS field3
FROM table1
looks like a set-based approach but really isn't because each of those subqueries will have to run for each row returned by the main query.

Unfortunately that's just a common myth. It's not true, and a look at the execution plan would show that. The only time subqueries have to run once per row of the outer statement is when they're correlated subqueries of the form (SELECT TOP(1) ... ORDER BY ...)

I can't run the code you posted (and besides, it would throw an error if any of table2, table3, table4 had more than one row), but it is on the list of things I need to blog about.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

Tom Staab (12/29/2015)
Regarding your 3 points about statistics, however, I just double-checked my code, and I don't have any SET STATISTICS statements.

I just downloaded the attachment to your article (SQL set-based vs row-based.sql), and line 35:

PRINT 'Begin cursor';

SET STATISTICS TIME, IO ON;

GO

Line 83:

PRINT 'Begin while loop';

SET STATISTICS TIME, IO ON;

GO

and line 131

PRINT 'Begin set';

SET STATISTICS TIME, IO ON;

GO

Is the attachment an old version of your code?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Eirikur Eiriksson SSC Guru Points: 182899 More actions · Answer 5

Tom Staab (12/29/2015)
Thanks for the response, Eirikur. You are correct that I missed the TRUNCATE with the set solution. I'll look into correcting that later today. Regarding your 3 points about statistics, however, I just double-checked my code, and I don't have any SET STATISTICS statements. Do you have them enabled by default in your SSMS configuration?
I find it very surprising that you foudn the set-based code to be the slowest option. That just doesn't add up logically given the way the engine operates and the extra work involved with cursors and loops. I don't have time to review your code or test it now, but I'll do that later today.

No worries Tom

In the download, http://www.sqlservercentral.com/Files/SQL%20set-based%20vs%20row-based.sql/27733.sql, lines 35, 83 and 131 have SET STATISTICS TIME, IO ON;

In your set code, using the formatted month number and name for the grouping is a killer for the query, changing that to the month only reduces the execution time by 2/3.

😎

Edit: Unsurprisingly Gail beat me to it pointing out the SET STATISTICS in the download.

Tom Staab Mr or Mrs. 500 Points: 507 More actions · Answer 6

Oh. Thanks, Gail. I need to look into that code discrepancy. I'm not sure what happened there. I was looking at the inline code in the article and my own notes. I do recall seeing the statistics spam in the looping solutions. I probably removed them and updated the article but not the attachment.

Regarding the subqueries, of course you are correct. I really didn't word that very well. I should have made it clear that it's not necessarily a set-based solution and might cause the problem I indicated. I was thinking about a specific case from work that did exactly that, but it's not always that bad.

This is my first article published by a third-party. Reading the "you were wrong here" comments is never easy, but I agree with a coworker who is fond of saying every mistake is an "opportunity for improvement." I hope this is just the first of many articles for me, and I'll use the feedback to make future ones better. Thanks for reading the article, and thanks again for the input.

Tom Staab Mr or Mrs. 500 Points: 507 More actions · Answer 7

Eirikur, I completely agree about the month formatting. That kind of thing is definitely not preferable. It was only done to try to make the output the same in all of the solutions while not over-complicating the code. Perhaps I need to rethink that. This is all part of a longer presentation I've done before and will do again in the future, so this is all good feedback to improve that as well. 🙂

Indianrock SSC-Insane Points: 20333 More actions · Answer 8

Thanks for the article. I do try to avoid cursors and while loops. In fact, our development team just wrote some stored procedures to improve upon a data extract that has been performed with a C# solution ( took days to run ).

Unfortunately, the new stored procedure approach used a while loop which processed far more records than necessary. My edit using a temp table near the beginning reduced the stored procedure approach from 24 hours to less than 2 for all five extracts.

Where I have resorted to while loops is in very large purges of old data in production ( e.g. while .... delete top 20000 etc ) it was an attempt to minimize blocking.

Eirikur Eiriksson SSC Guru Points: 182899 More actions · Answer 9

Tom Staab (12/29/2015)
Eirikur, I completely agree about the month formatting. That kind of thing is definitely not preferable. It was only done to try to make the output the same in all of the solutions while not over-complicating the code. Perhaps I need to rethink that. This is all part of a longer presentation I've done before and will do again in the future, so this is all good feedback to improve that as well. 🙂

Keep up the good work Tom, hope you are not deterred by the comments;-) I learned really quickly that having my writing reviewed by peers can literally safe ones face.

😎

As to why the loops are performing better than the set based code in your example, there is another factor too. The set based code implies a hash aggregation while the loops can utilize stream aggregation as there is effectively no partitioning in each loop's execution. Makes quite a difference.

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 10

akljfhnlaflkj

SSC Guru

Points: 76202

December 29, 2015 at 6:34 am

#1848460

Good basic instructions, thanks.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

Eirikur Eiriksson (12/29/2015)
The set based code implies a hash aggregation while the loops can utilize stream aggregation as there is effectively no partitioning in each loop's execution. Makes quite a difference.

If that's the case, then adding a supporting index will make a huge difference to the set-based version.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

AZJim Default port Points: 1432 More actions · Answer 12

Tom ... I really appreciate what you have demonstrated. I have been around awhile, first working on mainframe and Unix databases, and now for the last few years with SQL Server. I have heard from many quarters to never use cursors. This perplexed me because cursors were generally the standard on mainframe database processing. While the performance data you shared would be the same for any database, I think the difference is the degree of sharing on a processor. On the database we have many, many multi-terabyte tables that were being processed by literally hundreds of online transactions each second. SET processing could easily undermine performance for others if a query got out of control (such as with the subquery example you mentioned). If someone didn't specify WITH (NOLOCK), you could imagine the heartburn for those responsible for performance. If it is an update, you could be degrading the log unless you can break the change up into smaller chunks. But for the bulk of the uses, I can see the value of simply using SET processing. It's clean and simple. Thanks.

Eirikur Eiriksson SSC Guru Points: 182899 More actions · Answer 13

GilaMonster (12/29/2015)
Eirikur Eiriksson (12/29/2015)
The set based code implies a hash aggregation while the loops can utilize stream aggregation as there is effectively no partitioning in each loop's execution. Makes quite a difference.
If that's the case, then adding a supporting index will make a huge difference to the set-based version.

Certainly did improve both the set based and the looping methods, the latter more than the former as it eliminated the hash aggregate in the loops but not in the set.

😎

Xedni SSCertifiable Points: 6951 More actions · Answer 14

Thanks for the post. I thought fast_forward cursors were the exception to the rule of cursors being slow because it's only supposed to query the data once, and it only has to read forward. That doesn't seem to be the case in this test. Is my understanding of such cursors wrong?

Executive Junior Cowboy Developer, Esq.[/url]