Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1112131415»»

Linking to the Previous Row Expand / Collapse
Author
Message
Posted Saturday, November 13, 2010 6:36 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:14 AM
Points: 36,735, Visits: 31,186
David McKinney (11/13/2010)
I meant that specifically the different approaches to the "previous / next row" problem haven't been compared side by side.


Exactly. Here's code to build a million row table with 5000 ItemID's and around 200 price changes for each item over a ten year period. Let the races begin. In the mean time, no claims of performance should be made because it hasn't been proven on THIS thread.

--===== Conditionally drop and rebuild a test table to make reruns easier.
-- This whole thing takes about 35 seconds to run.
IF OBJECT_ID('tempdb..#PriceHistory','U') IS NOT NULL
DROP TABLE #PriceHistory
;
GO

WITH
cteCreateData AS
(
SELECT TOP 1050000
ItemID = CAST(ABS(CHECKSUM(NEWID()))%5000+1 AS INT),
PriceStartDate = DATEADD(dd,ABS(CHECKSUM(NEWID()))%(DATEDIFF(dd,'2010','2020')),'2010'),
Price = CAST((ABS(CHECKSUM(NEWID()))%1000)/100.0 +1 AS DECIMAL(9,2))
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
)
,
cteDeleteDupes AS
(
SELECT Occurance = ROW_NUMBER() OVER (PARTITION BY ItemID, PriceStartDate ORDER BY ItemID, PriceStartDate),
ItemID, PriceStartDate, Price
FROM cteCreateData
)
SELECT TOP 1000000
ItemID, PriceStartDate, Price
INTO #PriceHistory
FROM cteDeleteDupes
WHERE Occurance = 1
;



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1020387
Posted Tuesday, November 16, 2010 6:39 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: 2 days ago @ 9:09 AM
Points: 645, Visits: 1,844
Excellent...thanks, Jeff, for getting the ball rolling.

I adapted your code to make it as per the example i.e. so that the article code would run without modif.
Specifically, I created the PriceHistory table upfront with PK as per the article, and also created an Items table with a foreign key constraint between the two. (Also I created these as non-temporary tables - for no good reason.) At the end of the code block below is the CREATE VIEW exactly as in the article.

There are subsequent code blocks to show other methods.

I'm not finished yet...I just wanted to post some code before someone beat me to it
The next steps are the following -
1) post an example of the cross apply method.
2) benchmark each method for different use cases.

I propose the following use cases
a) Entire table (as per select * FROM PriceCompare in the article.)
b) History for a particular item e.g.WHERE Item='Item 512')
c) Identify all increases in price. e.g. WHERE RangePrice>OldPrice.

While I haven't done any benchmarking in earnest, I reckon for now that the CTE is holding its own on the million row dataset. But maybe with some appropriate indexing etc. the others may prove better.

Regards,

David McKinney.


IF  EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_PriceHistory_Items]') AND parent_object_id = OBJECT_ID(N'[dbo].[PriceHistory]'))
ALTER TABLE [dbo].[PriceHistory] DROP CONSTRAINT [FK_PriceHistory_Items]
GO

IF OBJECT_ID('PriceHistory','U') IS NOT NULL
DROP TABLE PriceHistory
;
GO
IF OBJECT_ID('Items','U') IS NOT NULL
DROP TABLE Items
;
GO

CREATE TABLE [dbo].[Items](
[ItemId] [int] NOT NULL,
[Item] [varchar](100) NOT NULL,
CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED
(
[ItemId] ASC
))
GO

CREATE TABLE [dbo].[PriceHistory](
[ItemId] [int] NOT NULL,
[PriceStartDate] [datetime] NOT NULL,
[Price] [decimal](10, 2) NOT NULL,
CONSTRAINT [PK_PriceHistory] PRIMARY KEY CLUSTERED
(
[ItemId] ASC,
[PriceStartDate] ASC
))
GO

;WITH
cteCreateData AS
(
SELECT TOP 1050000
ItemID = CAST(ABS(CHECKSUM(NEWID()))%5000+1 AS INT),
PriceStartDate = DATEADD(dd,ABS(CHECKSUM(NEWID()))%(DATEDIFF(dd,'2010','2020')),'2010'),
Price = CAST((ABS(CHECKSUM(NEWID()))%1000)/100.0 +1 AS DECIMAL(9,2))
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
)
,
cteDeleteDupes AS
(
SELECT Occurance = ROW_NUMBER() OVER (PARTITION BY ItemID, PriceStartDate ORDER BY ItemID, PriceStartDate),
ItemID, PriceStartDate, Price
FROM cteCreateData
)
INSERT INTO PriceHistory ( ItemID, PriceStartDate, Price)


SELECT TOP 1000000
ItemID, PriceStartDate, Price
FROM cteDeleteDupes
WHERE Occurance = 1
;
GO

INSERT INTO dbo.Items(ItemId,Item)

select ItemID, 'Item ' + CAST(ItemID as varchar) FROM PriceHistory
GROUP BY ItemID
GO

ALTER TABLE [dbo].[PriceHistory] WITH CHECK ADD CONSTRAINT [FK_PriceHistory_Items] FOREIGN KEY([ItemId])
REFERENCES [dbo].[Items] ([ItemId])
GO

/****** Object: View [dbo].[PriceCompare] Script Date: 11/16/2010 14:21:14 ******/
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[PriceCompare]'))
DROP VIEW [dbo].[PriceCompare]
GO


CREATE VIEW [dbo].[PriceCompare] AS
WITH PriceCompare AS
(
SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price,
ROW_NUMBER() OVER (Partition BY ph.ItemId ORDER BY PriceStartDate) AS rownum
FROM
Items i
INNER JOIN
PriceHistory ph
ON i.ItemId = ph.ItemId
)

SELECT
currow.Item,
prevrow.Price AS OldPrice,
currow.Price AS RangePrice,
currow.PriceStartDate AS StartDate,
nextrow.PriceStartDate AS EndDate
FROM
PriceCompare currow
LEFT JOIN PriceCompare nextrow
ON currow.rownum = nextrow.rownum - 1 AND currow.ItemId = nextrow.ItemId
LEFT JOIN PriceCompare prevrow
ON currow.rownum = prevrow.rownum + 1 AND currow.ItemId = prevrow.ItemId

The next code block shows the creation of a temporary table exactly equivalent to PriceCompare CTE in the View. The temporary table is populated once, and then joined with itself.

-- Replacing the CTE with an entirely equivalent temporary table.


IF OBJECT_ID('tempdb.dbo.#NumberedPriceHistory') is not null
drop table #NumberedPriceHistory
GO

CREATE TABLE #NumberedPriceHistory
(
PRIMARY KEY (ItemId,rownum) ,
Item varchar(100),
ItemId int,
PriceStartDate datetime,
Price decimal(9,2),
rownum int
)
GO

--CREATE NONCLUSTERED INDEX idxNumberedPriceHistory_Item
--ON [dbo].[#NumberedPriceHistory] ([Item])
--INCLUDE ([ItemId],[PriceStartDate],[Price],[rownum])
--GO

INSERT INTO #NumberedPriceHistory
SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price,
ROW_NUMBER() OVER (Partition BY ph.ItemId ORDER BY PriceStartDate) AS rownum
FROM Items i INNER JOIN PriceHistory ph
ON i.ItemId = ph.ItemId
GO

SELECT currow.Item, prevrow.Price AS OldPrice, currow.Price AS RangePrice, currow.PriceStartDate AS StartDate, nextrow.PriceStartDate AS EndDate
FROM #NumberedPriceHistory currow
LEFT JOIN #NumberedPriceHistory nextrow
ON currow.rownum = nextrow.rownum - 1
AND currow.ItemId = nextrow.ItemId
LEFT JOIN #NumberedPriceHistory prevrow
ON currow.rownum = prevrow.rownum + 1
AND currow.ItemId = prevrow.ItemId
where currow.Price>prevrow.Price
--where currow.Item='Item 512'

The third code block shows a temp table but this time with an identity column to replace the row number. The insert into this table is sorted - is that OK?

-- Replacing the CTE with a table with an identity column

IF OBJECT_ID('tempdb.dbo.#PriceHistoryWithIdentity') is not null
drop table #PriceHistoryWithIdentity
GO

CREATE TABLE #PriceHistoryWithIdentity
(
PRIMARY KEY (rownum),
rownum int identity(1,1),
Item varchar(100),
ItemId int,
PriceStartDate datetime,
Price decimal(9,2)
)
GO

INSERT INTO #PriceHistoryWithIdentity
(Item,
ItemId,
PriceStartDate,
Price
)
SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price
FROM Items i INNER JOIN PriceHistory ph
ON i.ItemId = ph.ItemId
order by i.Item, ph.PriceStartDate
GO

SELECT currow.Item, prevrow.Price AS OldPrice, currow.Price AS RangePrice, currow.PriceStartDate AS StartDate, nextrow.PriceStartDate AS EndDate
FROM #PriceHistoryWithIdentity currow
LEFT JOIN #PriceHistoryWithIdentity nextrow
ON currow.rownum = nextrow.rownum - 1
AND currow.ItemId = nextrow.ItemId
LEFT JOIN #PriceHistoryWithIdentity prevrow
ON currow.rownum = prevrow.rownum + 1
AND currow.ItemId = prevrow.ItemId
where currow.Price>prevrow.Price

Post #1021453
Posted Thursday, November 18, 2010 4:16 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: 2 days ago @ 9:09 AM
Points: 645, Visits: 1,844
ok...so here are my results in csv format (sorry!) Conclusions follow.

Full Resultset,Method,Execution Time,Client Processing Time,Total execution Time,Wait Time,Notes
,CTE View,13 secs,3261,5759,2498,93% in Hash Match
,Temp Table with rownumber,33 secs,21524,25422,3898,Index Scans
,Table Variable with rownumber,47 secs,36798,39196,2398,
,Temp Table with identity,16 secs,3269,8753,5484,Insert 81% Select 19%
,Table Variable with identity,Cancelled after 1 hour,,,,
,Cross Apply,12 secs,3795,4066,271,
,,,,,,
One Item,,,,,,
,CTE View,2 secs,376,2179,1803,
,Temp Table with rownumber,5 secs,170,5039,4869,Insert 95%
,Table Variable with rownumber,4 secs,68,3380,3312,
,Temp Table with identity,6 secs,62,6585,6523,I 86% / S 14%
,Table Variable with identity,83 secs,78230,82767,4537,Insert = 100%!
,Cross Apply,0 sesc,14,139,125,Wow!
,,,,,,
Price Rises,,,,,,
,CTE View,8 secs,1586,4357,2771,
,Temp Table with rownumber,19 secs,11213,15927,4714,
,Table Variable with rownumber,17 secs,10231,12995,2764,Insert 100%
,Temp Table with identity,12 secs,2138,8066,5928,Insert 83% Select 17%
,Table Variable with identity,Not Run,,,,
,Cross Apply,8 secs,3857,4111,254,

(The cross apply code was exactly as previously posted by Charles Gildawie.)

I ran the code on

Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (Intel X86) Apr 2 2010 15:53:02 Copyright (c) Microsoft Corporation Developer Edition on Windows NT 6.1 <X86> (Build 7600: )

Before running each code block I executed the following, aiming to ensure that no caching was taking place.

checkpoint
dbcc freeproccache
dbcc dropcleanbuffers

My conclusions are
1) that the Cross Apply solution comes out a clear winner when searching for an individual Item (although the CTE has very acceptable performance relative to the other solutions.)
2) that any solutions using a table variable (as opposed to a temp table) do badly when using a identity column. Are they very slow to create identity values? In any case the equivalent code with row_number does much better.

Final Conclusion (for now!)

The Cross Apply and the CTE solutions stand clearly out from the bunch with the cup going to cross apply for it's performance on an individual item, and a special mention for the CTE 'cos it's much prettier

Next Steps?
Perf testing isn't my speciality; perhaps you can find fault with my approach. Or perhaps the code used to illustrate each method could be improved e.g. by indexing.

I haven't talked about the execution plans. I'll leave that perhaps to one better versed.

I hope this enlightens more than one, and goes someway towards ending speculation on which approach would scale better etc..

Regards,

David McKinney.
Post #1022742
Posted Thursday, November 18, 2010 6:27 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, April 27, 2011 8:16 AM
Points: 13, Visits: 64
That's very interesting David - thank you for investigating. Now to try and understand... :)
Post #1022836
Posted Monday, January 3, 2011 1:51 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, April 27, 2011 8:16 AM
Points: 13, Visits: 64
Just wanted to return to say I tried using Outer Apply as described and it was indeed faster for me too. Thanks all for your help.
Post #1042059
Posted Tuesday, April 12, 2011 10:04 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Monday, December 24, 2012 11:32 AM
Points: 56, Visits: 880
Can you demonstrate your technique for matching current rows with previous and next values is faster and more efficient as my best solution?

See:
CREATE TABLE [Sample]
(
seq_nbr INTEGER NOT NULL PRIMARY KEY
);

INSERT [Sample]
VALUES (1), (4), (5), (8);

/* Wanted Result
seq_nbr seq_nbr seq_nbr
----------- ----------- -----------
NULL 1 4
1 4 5
4 5 8
5 8 NULL
*/

--Mohammad Salimabadi Solution
SET STATISTICS IO ON
SET STATISTICS TIME ON
;WITH C AS
(
SELECT seq_nbr, k,
DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct
FROM [Sample]
CROSS JOIN
(VALUES (-1), (0), (1)
) AS D(k)
)
SELECT MIN(seq_nbr) AS pre_value,
MAX(CASE WHEN k = 0 THEN seq_nbr END) AS current_value,
MAX(seq_nbr) AS next_value
FROM C
GROUP BY grp_fct
HAVING min(seq_nbr) < max(seq_nbr);

--Common Solution #1
WITH C AS
(SELECT seq_nbr,
ROW_NUMBER() OVER(ORDER BY seq_nbr) AS rnk
FROM [Sample]
)
SELECT B.seq_nbr,
A.seq_nbr,
C.seq_nbr
FROM C AS A
LEFT JOIN C AS B
ON A.rnk - 1 = B.rnk
LEFT JOIN C
ON A.rnk = C.rnk - 1;

--Common Solution #2
SELECT D1.seq_nbr,
S.seq_nbr,
D2.seq_nbr
FROM [Sample] AS S
OUTER APPLY (
SELECT TOP 1 seq_nbr
FROM [Sample]
WHERE seq_nbr < S.seq_nbr
ORDER BY seq_nbr DESC
) AS D1
OUTER APPLY (
SELECT TOP 1 seq_nbr
FROM [Sample]
WHERE seq_nbr > S.seq_nbr
ORDER BY seq_nbr
) AS D2;

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

/*

Mohammad Salimabadi Solution:
Table 'Sample'. Scan count 1, logical reads 2

Common Solution #1
Table 'Sample'. Scan count 3, logical reads 20

Common Solution #2
Table 'Sample'. Scan count 9, logical reads 18

*/

Post #1092206
Posted Tuesday, April 12, 2011 12:20 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, April 27, 2011 8:16 AM
Points: 13, Visits: 64
Hi Mohammad. I can run this on my data - right now it looks like it's picking out the current, previous and next 'date' or seq-nbr? What I'd need is the current, next and previous 'value', from those dates. Any chance of a tweak?

A quick run of the code as is suggests it is much faster than the CTE and faster than the Outer Apply.
Post #1092327
Posted Tuesday, April 12, 2011 12:40 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, December 6, 2013 8:07 AM
Points: 109, Visits: 489

;WITH C AS
(
SELECT seq_nbr, k,
DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct
FROM [Sample]
CROSS JOIN
(VALUES (-1), (0), (1)
) AS D(k)
)



I'm curious about your choice to use DENSE_RANK (which is identical to ROW_NUMBER and RANK when all rows are unique, but which I believe won't work when there are duplicates).

Is DENSE_RANK any faster? And if so, why? I believe your technique should still work fine with ROW_NUMBER.

I supposed that the real speed up of this version comes from the up front CROSS JOIN which mitigates the tendency of the engine to treat multiple uses of CTE which is self-joined independently resulting in multiple reads.
Post #1092353
Posted Tuesday, April 12, 2011 1:22 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Monday, December 24, 2012 11:32 AM
Points: 56, Visits: 880
>>
Hi Mohammad. I can run this on my data - right now it looks like it's picking out the current, previous and next 'date' or seq-nbr? What I'd need is the current, next and previous 'value', from those dates. Any chance of a tweak?

A quick run of the code as is suggests it is much faster than the CTE and faster than the Outer Apply.
<<

No problem,
Try this on:
DECLARE @Sample TABLE
(
seq_nbr INTEGER NOT NULL PRIMARY KEY,
value INTEGER
);

INSERT @Sample
VALUES (1, 5), (4, 9), (5, 8), (8, 20);

;WITH C AS
(
SELECT seq_nbr, value, k,
DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct
FROM @Sample
CROSS JOIN
(VALUES (-1), (0), (1)
) AS D(k)
)
SELECT MAX(CASE WHEN k = 1 THEN value END) AS pre_value,
MIN(CASE WHEN k = 1 THEN seq_nbr END) AS pre_seq,
MAX(CASE WHEN k = 0 THEN value END) AS current_value,
MAX(CASE WHEN k= 0 THEN seq_nbr END) AS current_seq,
MAX(CASE WHEN k = -1 THEN value END) AS next_value,
MAX(CASE WHEN k = -1 THEN seq_nbr END) AS next_seq
FROM C
GROUP BY grp_fct
HAVING min(seq_nbr) < max(seq_nbr);

/*
pre_value pre_seq current_value current_seq next_value next_seq
----------- ----------- ------------- ----------- ----------- -----------
NULL NULL 5 1 9 4
5 1 9 4 8 5
9 4 8 5 20 8
8 5 20 8 NULL NULL
*/

Post #1092391
Posted Tuesday, April 12, 2011 1:28 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Monday, December 24, 2012 11:32 AM
Points: 56, Visits: 880
Cade Roux (4/12/2011)

;WITH C AS
(
SELECT seq_nbr, k,
DENSE_RANK() OVER(ORDER BY seq_nbr ASC) + k AS grp_fct
FROM [Sample]
CROSS JOIN
(VALUES (-1), (0), (1)
) AS D(k)
)



I'm curious about your choice to use DENSE_RANK (which is identical to ROW_NUMBER and RANK when all rows are unique, but which I believe won't work when there are duplicates).

Is DENSE_RANK any faster? And if so, why? I believe your technique should still work fine with ROW_NUMBER.

I supposed that the real speed up of this version comes from the up front CROSS JOIN which mitigates the tendency of the engine to treat multiple uses of CTE which is self-joined independently resulting in multiple reads.


No, if we change the DENSE_RANK with ROW_NUMBER the result will not be correct. You can rum my first script at my first post with row_number. So you will see an empty result set.
Post #1092395
« Prev Topic | Next Topic »

Add to briefcase «««1112131415»»

Permissions Expand / Collapse