Set-Based Solution Possible?

  • Here is how I both created the sample data and time tested the suggestions...

    SET NOCOUNT ON

    -- Create sample data
    CREATE TABLE TestData
    (
        StaffName VARCHAR(40),
        ReviewDate SMALLDATETIME
    )
    GO

    -- Populate the table with 512000 records (100 * 256 * 20)
    INSERT TestData
    (
        StaffName,
        ReviewDate
    )
    SELECT sn.Number,
           36524 + ABS(CHECKSUM(NEWID())) % 7304
    FROM master..spt_values AS sn
    INNER JOIN master..spt_values AS rd ON rd.Type = 'p'
    INNER JOIN master..spt_values AS y ON y.Type = 'p'
    WHERE sn.Type = 'p'
        AND sn.Number < 100
        AND rd.Number < 256
        AND y.Number < 20
    GO

    CREATE CLUSTERED INDEX IX_TestData_StaffName_ReviewDate ON TestData (StaffName, ReviewDate)
    GO
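    -- Side note (not part of the timed tests): a quick sanity check of the random-date
    -- expression above. An integer converts to SMALLDATETIME as days since 1900-01-01,
    -- so day 36524 is 2000-01-01 and the modulo 7304 spreads the dates over roughly
    -- the following 20 years.
    SELECT CAST(36524 AS SMALLDATETIME) AS EarliestReviewDate,
           CAST(36524 + 7303 AS SMALLDATETIME) AS LatestReviewDate
    GO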

    -- Create Sergiy's tally table with 65536 records
    CREATE TABLE Tally
    (
        N INT PRIMARY KEY CLUSTERED
    )

    INSERT Tally
    (
        N
    )
    SELECT DISTINCT 256 * v1.Number + v2.Number
    FROM master..spt_values AS v1
    INNER JOIN master..spt_values AS v2 ON v2.Type = 'p'
    WHERE v1.Type = 'p'
        AND v1.Number < 256
        AND v2.Number < 256
    GO
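    -- Side note (not part of the timed tests): verify the tally table. 256 * 256
    -- combinations should give exactly 65536 rows numbered 0 through 65535.
    SELECT COUNT(*) AS Cnt, MIN(N) AS MinN, MAX(N) AS MaxN
    FROM Tally
    GO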

    -- Create Sergiy's function
    CREATE FUNCTION dbo.tReviewsOfYear
    (
        @ReviewYear SMALLDATETIME
    )
    RETURNS @reviews TABLE (
        ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        StaffName VARCHAR(40),
        ReviewNo INT NULL,
        ReviewDate SMALLDATETIME NOT NULL
    )
    AS
    BEGIN
        -- To make sure we deal with a "year only" value
        SET @ReviewYear = DATEADD(YY, DATEDIFF(YY, 0, @ReviewYear), 0)

        DECLARE @StaffName VARCHAR(200)
        DECLARE @Count INT

        INSERT INTO @reviews (StaffName, ReviewDate)
        SELECT StaffName, ReviewDate
        FROM TestData
        WHERE ReviewDate >= @ReviewYear
            AND ReviewDate < DATEADD(YY, 1, @ReviewYear)
        ORDER BY StaffName, ReviewDate

        -- Sequential update driven by PRIMARY KEY
        UPDATE R
        SET @Count = ReviewNo = CASE WHEN @StaffName = StaffName THEN @Count + 1 ELSE 1 END,
            @StaffName = StaffName
        FROM @reviews R

        RETURN
    END
    GO
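    -- Side note (not part of the timed tests): this is how the function is called on
    -- its own. For each staff member it returns that year's reviews numbered 1, 2, 3...
    -- in date order, produced by the sequential UPDATE above.
    SELECT StaffName, ReviewNo, ReviewDate
    FROM dbo.tReviewsOfYear('2007-01-01')
    ORDER BY StaffName, ReviewNo
    GO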

    -- Prepare time testing
    DBCC FREEPROCCACHE
    DBCC DROPCLEANBUFFERS

    DECLARE @Time DATETIME
    SET @Time = GETDATE()

    -- Now we time test Sergiy's suggestion
    DECLARE @N INT

    SELECT @N = MAX(CNT)
    FROM (
        SELECT COUNT(*) AS CNT
        FROM TestData
        GROUP BY StaffName
    ) DT

    SELECT CASE WHEN T.N = 1 THEN S.StaffName ELSE '' END,
           T.N, Y2006.ReviewDate AS [2006_Dates], Y2007.ReviewDate AS [2007_Dates], Y2008.ReviewDate AS [2008_Dates]
    FROM TestData S
    INNER JOIN Tally T ON T.N > 0 AND T.N < @N
    -- If you expect more than 5000 reviews per year per customer, increase this number
    LEFT HASH JOIN dbo.tReviewsOfYear('2006-11-01') Y2006 ON S.StaffName = Y2006.StaffName AND T.N = Y2006.ReviewNo
    LEFT HASH JOIN dbo.tReviewsOfYear('2007-01-01') Y2007 ON S.StaffName = Y2007.StaffName AND T.N = Y2007.ReviewNo
    LEFT HASH JOIN dbo.tReviewsOfYear('2008-01-01') Y2008 ON S.StaffName = Y2008.StaffName AND T.N = Y2008.ReviewNo
    WHERE Y2006.ReviewNo IS NOT NULL OR Y2007.ReviewNo IS NOT NULL OR Y2008.ReviewNo IS NOT NULL
    GROUP BY S.StaffName, T.N, Y2006.ReviewDate, Y2007.ReviewDate, Y2008.ReviewDate
    ORDER BY S.StaffName, T.N

    -- Now we display the time for Sergiy's suggestion
    PRINT 'Sergiy time taken ' + CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @Time, GETDATE())) + ' milliseconds.'

    -- Prepare time testing
    DBCC FREEPROCCACHE
    DBCC DROPCLEANBUFFERS

    -- Now we time test Peso's suggestion
    SET @Time = GETDATE()

    DECLARE @BaseYear SMALLINT, @IncludeAllStaff BIT
    DECLARE @MinDate SMALLDATETIME,
            @MaxDate SMALLDATETIME

    IF @BaseYear IS NULL
        SET @BaseYear = DATEPART(YEAR, GETDATE())

    SELECT @MaxDate = DATEADD(YEAR, @BaseYear - 1899, '19000101'),
           @MinDate = DATEADD(YEAR, -3, @MaxDate)

    CREATE TABLE #Stage
    (
        RowID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
        StaffName VARCHAR(40) NOT NULL,
        ReviewDate VARCHAR(10) NOT NULL
    )

    INSERT #Stage
    (
        StaffName,
        ReviewDate
    )
    SELECT DISTINCT StaffName,
           CONVERT(CHAR(10), ReviewDate, 120)
    FROM TestData
    WHERE ReviewDate >= @MinDate
        AND ReviewDate < @MaxDate
    ORDER BY StaffName,
             CONVERT(CHAR(10), ReviewDate, 120) DESC

    IF @IncludeAllStaff = 1
        INSERT #Stage
        (
            StaffName,
            ReviewDate
        )
        SELECT DISTINCT t.StaffName,
               ''
        FROM TestData AS t
        LEFT JOIN #Stage AS s ON s.StaffName = t.StaffName
        WHERE s.StaffName IS NULL

    SELECT CASE WHEN u.theIndex = 0 THEN u.StaffName ELSE '' END AS StaffName,
           MAX(CASE WHEN u.theYear = @BaseYear - 2 THEN u.ReviewDate ELSE '' END) AS y2,
           MAX(CASE WHEN u.theYear = @BaseYear - 1 THEN u.ReviewDate ELSE '' END) AS y1,
           MAX(CASE WHEN u.theYear = @BaseYear THEN u.ReviewDate ELSE '' END) AS y0
    FROM (
        SELECT s.StaffName,
               m.theYear,
               s.ReviewDate,
               s.RowID - m.minRowID AS theIndex
        FROM #Stage AS s
        INNER JOIN (
            SELECT StaffName,
                   LEFT(ReviewDate, 4) AS theYear,
                   MIN(RowID) AS minRowID
            FROM #Stage
            GROUP BY StaffName,
                     LEFT(ReviewDate, 4)
        ) AS m ON m.StaffName = s.StaffName
        WHERE s.ReviewDate LIKE m.theYear + '%'
    ) AS u
    GROUP BY u.StaffName,
             u.theIndex
    ORDER BY u.StaffName,
             u.theIndex

    DROP TABLE #Stage

    -- Now we display the time for Peso's suggestion
    PRINT 'Peso time taken ' + CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @Time, GETDATE())) + ' milliseconds.'
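    As an aside: on SQL Server 2005 or later, the per-staff, per-year numbering that both suggestions build by hand can be written directly with ROW_NUMBER(). A minimal sketch against the same TestData table, shown for comparison only and not part of the timed tests:

    SELECT StaffName,
           YEAR(ReviewDate) AS theYear,
           ROW_NUMBER() OVER (PARTITION BY StaffName, YEAR(ReviewDate)
                              ORDER BY ReviewDate) AS ReviewNo,
           ReviewDate
    FROM TestData
    ORDER BY StaffName, theYear, ReviewNo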

    Well... how can I put this nicely?

    I ran my code 60 times, and the average time on my computer was 2,200 milliseconds (yes, only 2.2 seconds!).

    The average number of reads was 115,000.

    Meanwhile, Sergiy's suggestion is still on its first run. I promise I will get back and post the time taken and average reads.



  • I have to break my promise.

    After more than 30 minutes on the first run of Sergiy's suggestion, I had to cancel the query.

    My Profiler trace then showed this result for Sergiy:

    Duration - 1,874,907
    Reads - 10,773,782



  • I took the liberty of optimizing Sergiy's suggestion a little. Two things:

    SELECT @N = MAX(CNT)
    FROM (
        SELECT COUNT(*) AS CNT
        FROM TestData
        GROUP BY StaffName, YEAR(ReviewDate)
    ) DT

    and

    INNER JOIN Tally T ON T.N > 0 AND T.N <= @N

    Grouping by year as well keeps @N down to the largest number of reviews per staff member and year, and the <= makes sure the last review number is not dropped. Not much improvement, though. Only 10%.



  • I did some tests with SET STATISTICS IO ON and got these results for the original 15 records.

    Sergiy

    Table 'TestData'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 2, logical reads 48, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Tally'. Scan count 2, logical reads 6, physical reads 1, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#336AA144'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#318258D2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#2F9A1060'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Sergiy time taken 63 milliseconds.

    Peso

    Table '#Stage'. Scan count 0, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#Stage'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Peso time taken 46 milliseconds.

    My STATISTICS IO results for 512,000 records of sample data are:

    Table 'TestData'. Scan count 3, logical reads 1543, physical reads 0, read-ahead reads 16, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#Stage'. Scan count 0, logical reads 111250, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#Stage'. Scan count 2, logical reads 424, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Peso time taken 2250 milliseconds.



  • And just for the fun of it (to prove Sergiy wrong about my original suggestion with the hidden triangular join, which he didn't like and wasn't afraid to say so), I time tested that too.

    Over 40 runs, the time taken averaged 12.1 seconds and the reads averaged 820,000.

    That is still a hell of a lot better than Sergiy's suggestion with its tally table, function, and join hints!

    The STATISTICS IO results for my original suggestion with 512,000 sample records were:

    Table 'TestData'. Scan count 55103, logical reads 295419, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 131558, logical reads 520202, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.



  • I tried to rerun Sergiy's suggestion, and this time my server died after 2 hours and 7 minutes.

    There was no longer any space left to expand my tempdb!

    sp_spaceused testdata

    name      rows     reserved   data       index_size   unused
    --------  -------  ---------  ---------  -----------  ------
    TestData  512000   11272 KB   11160 KB   80 KB        32 KB

    When I checked the size of tempdb, I realized it was now 124,000 MB!

    For a table 11 MB in size.
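    For anyone who wants to watch tempdb balloon like this, one way to check its size (a sketch assuming SQL Server 2005 or later, where tempdb.sys.database_files is available) is:

    SELECT name,
           size * 8 / 1024 AS size_mb  -- size is stored as a count of 8 KB pages
    FROM tempdb.sys.database_files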



  • I have now come to the conclusion that I must create less sample data, so I started over with 4,000 records (20 × 20 × 10):

    INSERT TestData
    (
        StaffName,
        ReviewDate
    )
    SELECT sn.Number,
           36524 + ABS(CHECKSUM(NEWID())) % 7304
    FROM master..spt_values AS sn
    INNER JOIN master..spt_values AS rd ON rd.Type = 'p'
    INNER JOIN master..spt_values AS y ON y.Type = 'p'
    WHERE sn.Type = 'p'
        AND sn.Number < 20
        AND rd.Number < 20
        AND y.Number < 10

    Now Sergiy's suggestion finished with an average of 5.1 s and 15,000 reads.

    My suggestion averaged 0.2 s and 1,600 reads.

    Then I went on to create 18,000 sample records (30 × 30 × 20) like this:

    INSERT TestData
    (
        StaffName,
        ReviewDate
    )
    SELECT sn.Number,
           36524 + ABS(CHECKSUM(NEWID())) % 7304
    FROM master..spt_values AS sn
    INNER JOIN master..spt_values AS rd ON rd.Type = 'p'
    INNER JOIN master..spt_values AS y ON y.Type = 'p'
    WHERE sn.Type = 'p'
        AND sn.Number < 30
        AND rd.Number < 30
        AND y.Number < 20

    Now Sergiy's suggestion averaged 120 s and 143,000 reads.

    My suggestion averaged 0.3 seconds and 5,600 reads.

    As you can see, Sergiy's suggestion doesn't scale well. Far from it.

    4.5 times the sample data (from 4,000 to 18,000 records) makes his suggestion need 9.5 times the reads and 24 times the time taken.

    With 512,000 sample records (128 times more than 4,000 records), my suggestion took 115,000 reads and 2.2 seconds. That is only 72 times more reads and 11 times the time taken.

    Compared to 18,000 records (512,000 is about 28 times more), that is only 20 times more reads and 7 times the time taken.

    So my suggestion scales very well! In practice it scales nearly linearly: double the sample data, double the time!

    What if we could estimate the time Sergiy's suggestion would take with 512,000 sample records?

    To do this, we need a predictable set of sample data. I created it with:

    INSERT TestData
    (
        StaffName,
        ReviewDate
    )
    SELECT sn.Number,
           36524 + ABS(CHECKSUM(NEWID())) % 7304
    FROM master..spt_values AS sn
    INNER JOIN master..spt_values AS rd ON rd.Type = 'p'
    INNER JOIN master..spt_values AS y ON y.Type = 'p'
    WHERE sn.Type = 'p'
        AND sn.Number < 50
        AND rd.Number < 40
        AND y.Number < 20

    That is a little more than double the previous number of sample records (50 × 40 × 20 = 40,000 records, and that made tempdb expand to 2,300 MB!). And as I suspected, Sergiy's suggestion now took 308 s and used 400,000 reads.

    So 40,000 sample records (just over 2 times the 18,000 records) means 2.5 times the time taken and about 3 times the reads.

    Extrapolating, for a sample set of 512,000 records Sergiy's suggestion would take about 23 hours...

    Now I am just waiting for Sergiy to tell me I am using "a ridiculously large amount of sample data".

    Peso algorithm

    Sample records   Time taken   Used reads
    ------------------------------------------------
    4,000            0.2 s        1,600 reads
    18,000           0.3 s        5,600 reads
    512,000          2.2 s        115,000 reads
    1,600,000        5.6 s        235,000 reads     (same time as Sergiy had, but for 4,000 records only)
    40,000,000       126.0 s      3,266,000 reads   (same time as Sergiy had, but for 18,000 records only)



  • Sergiy, I think you are reasonably skilled at many things in Microsoft SQL Server.

    But on this specific topic, I think I have proved that it is not one of your strengths.



  • I think Peter has proven his case here.

    Thank you to everyone for the suggestions; in the long run, every one of them is valuable for improving my SQL skills.

    Cheers, Mike


    In the beginning the Universe was created. This has made a lot of people very angry and has been widely regarded as a bad move.   Douglas Adams (1952-2001)
