Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Finding the most recent value for many record in a detail table (optimizing)


Finding the most recent value for many record in a detail table (optimizing)

Author
Message
ShawnTherrien
ShawnTherrien
SSC Journeyman
SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)

Group: General Forum Members
Points: 75 Visits: 765
My system has a number of history tables that would give the date that a value has changed.

I found a very close analogue to this in Adventure works looking up products and seeing what the most recent value was as of a certain date.

I can think of two ways to do this:


declare @ReportingDate DATE = '9/7/2004'

-- Method 1
SELECT
Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = (SELECT TOP 1 SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= @ReportingDate ORDER BY ModifiedDate DESC)
FROM Production.Product Product

-- Method 2
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.MostRecentOrderDetailId
FROM Production.Product Product
OUTER APPLY (SELECT TOP 1 MostRecentOrderDetailId = SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= @ReportingDate ORDER BY ModifiedDate DESC) MostRecent



Both produce a near identical query plan. Method 2 would be a clear winner of the two if I wanted to retrieve more than a single value from the detail table (say price and quantity).

Is there a better way to do this overall without refactoring how the data is stored?

Best performance I can think of would be to have a daily reporting table that precalculates the requested values cross applied with a dates table to give the value for every date. Then it would be a simple join rather than a subquery/cross apply.

Even using a reporting table (DailySalesOrderDetail), these queries need to be optimized because they'll be needed to recalculate the the reporting whenever the SalesOrderDetail table changes.


IF OBJECT_ID('tempdb..#DailyOrderDetail') IS NOT NULL DROP TABLE #DailyOrderDetail

SELECT DateFull, Name, MostRecentOrderDetailId
INTO #DailyOrderDetail
FROM edimain.dbo.DateLookup
CROSS APPLY (SELECT
Name
,MostRecentOrderDetailId = (SELECT TOP 1 SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= DateFull ORDER BY ModifiedDate DESC)
FROM Production.Product Product) MostRecent
WHERE DateFull BETWEEN '1/1/2002' and '1/1/2005'

CREATE NONCLUSTERED INDEX IX_DailyOrderDetail_DateFull
ON #DailyOrderDetail ([DateFull])
INCLUDE ([Name],[MostRecentOrderDetailId])

declare @ReportingDate DATE = '9/7/2004'

-- Method 3
SELECT
d.Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = d.MostRecentOrderDetailId
FROM Production.Product Product
INNER JOIN #DailyOrderDetail D on D.DateFull = @ReportingDate AND D.Name = Product.Name



Given the reporting table already existing, Method 3 (just a join to a denormalized reporting table) doesn't even register as a single percentage point in comparison to Method 1 and Method 2.

So if I could get Method 1 or Method 2 optimized further, that would be an interesting exercise. My application uses it quite a bit an it would be useful for optimizing a reporting table to use Method 3 as well.

Is there a term for what sort of query this is?
Attachments
QP.jpg (5 views, 242.00 KB)
TheSQLGuru
TheSQLGuru
SSCertifiable
SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)SSCertifiable (6K reputation)

Group: General Forum Members
Points: 5970 Visits: 8313
tl;dr

1) I think pre-aggregated data will always smoke real-time queries :-)

2) since you are on SQL 2012, have a look at LAST_VALUE. It might be a better option than your TOP 1 subquery. NOTE: the default window clause (RANGE BETEEN...) is HORRIBLY inefficient currently. Be sure to use ROWS BETWEEN...

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
ShawnTherrien
ShawnTherrien
SSC Journeyman
SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)

Group: General Forum Members
Points: 75 Visits: 765
The First_Value/Last_Value sounds really promising, but I'm not quite getting it yet.


-- Method 1.5a
SELECT DISTINCT
Product.Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = (SELECT LAST_VALUE(SalesOrderDetailId) OVER (PARTITION BY Product.ProductId order by SalesOrderDetail.ModifiedDate)
FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = product.ProductID
AND SalesOrderDetail.ModifiedDate <= @ReportingDate)
FROM Production.Product Product


-- Method 1.5b
SELECT DISTINCT
Product.Name
,ReportDate = @ReportingDate
,LAST_VALUE(SalesOrderDetailId) OVER (PARTITION BY Product.ProductId order by SalesOrderDetail.ModifiedDate)
FROM Production.Product Product
LEFT JOIN sales.SalesOrderDetail SalesOrderDetail ON SalesOrderDetail.ProductID = product.ProductID



1.5a returns the fun error:
[color=RRGGBB]Msg 512, Level 16, State 1, Line 11
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.[/color]

When I try to add grouping to either, the window function I get this error:
[color=#RRGGBB]Column 'sales.SalesOrderDetail.ModifiedDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.[/color]

For every product I'm trying to get a single value (the Last_Value before a date), but I seem to be doing rather poorly at using Last_Value and getting it to group or return just a single value.

Any advice on what I'm overlooking here?
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10338 Visits: 11350
This is a variation of the 'Top N per Group' problem, and there are many ways to express it in T-SQL. Whichever you choose, be sure to make the ORDER BY clause deterministic and provide a useful index. A deterministic ORDER BY simply means there should be no ties. In the AdventureWorks example, TOP (1) ... ORDER BY ModifiedDate DESC is not deterministic per product because multiple order detail rows can have the same ModifiedDate. A useful index for that query is:

CREATE INDEX nc1 
ON Sales.SalesOrderDetail
(
ProductID,
ModifiedDate DESC,
SalesOrderDetailID DESC
);



Writing the query to be deterministic:


SELECT
p.Name,
ReportDate = @ReportingDate,
MostRecent.SalesOrderDetailID
FROM Production.Product AS p
OUTER APPLY
(
SELECT TOP (1)
sod.SalesOrderDetailID
FROM Sales.SalesOrderDetail AS sod
WHERE
sod.ProductID = p.ProductID
AND sod.ModifiedDate <= @ReportingDate
ORDER BY
sod.ModifiedDate DESC,
sod.SalesOrderDetailID DESC
) AS MostRecent;



The query plan no longer features an index spool or sort, seeking straight to the required row per product:





Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Attachments
ssc.png (129 views, 22.00 KB)
Bhaskar.Shetty
Bhaskar.Shetty
Right there with Babe
Right there with Babe (778 reputation)Right there with Babe (778 reputation)Right there with Babe (778 reputation)Right there with Babe (778 reputation)Right there with Babe (778 reputation)Right there with Babe (778 reputation)Right there with Babe (778 reputation)Right there with Babe (778 reputation)

Group: General Forum Members
Points: 778 Visits: 509
How about this solutions...


--- Create a InLine Function with ReportingDate Parameter --
CREATE FUNCTION fnMostRecentOrderDetail(@ReportingDate datetime) RETURNS TABLE
AS
RETURN
(
SELECT ProductId, SalesOrderDetailId, ModifiedDate,
ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY ModifiedDate DESC) RowNo
FROM sales.SalesOrderDetail
WHERE ModifiedDate < @ReportingDate
)
GO



and then...


-- Join the function with ReportingDate Parameter --
declare @ReportingDate DATE = '9/7/2007'
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.ModifiedDate
FROM Production.Product Product
LEFT OUTER JOIN fnMostRecentOrderDetail(@ReportingDate) MostRecent ON Product.ProductID = MostRecent.ProductId and RowNo = 1


T.Ashish
T.Ashish
Right there with Babe
Right there with Babe (767 reputation)Right there with Babe (767 reputation)Right there with Babe (767 reputation)Right there with Babe (767 reputation)Right there with Babe (767 reputation)Right there with Babe (767 reputation)Right there with Babe (767 reputation)Right there with Babe (767 reputation)

Group: General Forum Members
Points: 767 Visits: 582
Add index suggested in Paul's reply, to improve performance of your query.
ShawnTherrien
ShawnTherrien
SSC Journeyman
SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)

Group: General Forum Members
Points: 75 Visits: 765
Paul White (5/24/2013)
This is a variation of the 'Top N per Group' problem, and there are many ways to express it in T-SQL. Whichever you choose, be sure to make the ORDER BY clause deterministic and provide a useful index...


Thank you for the correction! yes, on the query in my system that looks into history is deterministic, my AdventureWorks example wasn't as well thought out as it could be Smile

More importantly giving a good example of what the right index should get you in a query plan
ShawnTherrien
ShawnTherrien
SSC Journeyman
SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)

Group: General Forum Members
Points: 75 Visits: 765
Bhaskar.Shetty (5/24/2013)
How about this solutions...

>>SQL Removed<<


Hmm, thank you. I hadn't quite thought of that as a third option to the Sub query and cross apply. In fact if I'd made a ITF, I think my mind would have drifted towards a cross apply

I like how it's clean and I'm interested to see how it the RowNumber compares to the Top 1 performance wise. I'll post a followup comparison Smile
ShawnTherrien
ShawnTherrien
SSC Journeyman
SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)SSC Journeyman (75 reputation)

Group: General Forum Members
Points: 75 Visits: 765
Bhaskar.Shetty (5/24/2013)
How about this solutions...


--- Create a InLine Function with ReportingDate Parameter --
CREATE FUNCTION fnMostRecentOrderDetail(@ReportingDate datetime) RETURNS TABLE
AS
RETURN
(
SELECT ProductId, SalesOrderDetailId, ModifiedDate,
ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY ModifiedDate DESC) RowNo
FROM sales.SalesOrderDetail
WHERE ModifiedDate < @ReportingDate
)
GO



and then...


-- Join the function with ReportingDate Parameter --
declare @ReportingDate DATE = '9/7/2007'
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.ModifiedDate
FROM Production.Product Product
LEFT OUTER JOIN fnMostRecentOrderDetail(@ReportingDate) MostRecent ON Product.ProductID = MostRecent.ProductId and RowNo = 1




I have a solution very similar now.

Currently it's a inline table function that takes a product id and date, then returns the one reference back to the record.

It uses the Row_Number() ORDER BY... where RowNumber = 1, but inside of the function, so it always just returns a single row.

Currently it's not in the AdventureWorks db, it's in my actual db. But I'm interested in this sort of thing, so I'll probobly post up here the three different methods I'm seeing and the differences. AdventureWorks might be too small to really differentiate the performance, but we'll see.

We have the Top 1 Desc (starting), Return Row_Number 1 for a very specific item (my new) and Return Row_Numbers and join on 1 to limit the results (actually returns multiple) that you gave me.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search