Finding the most recent value for many record in a detail table (optimizing)

Question

Post reply

Finding the most recent value for many record in a detail table (optimizing)

ShawnTherrien

Default port

Points: 1481
More actions
May 23, 2013 at 10:00 am

#278107

My system has a number of history tables that would give the date that a value has changed.
I found a very close analogue to this in Adventure works looking up products and seeing what the most recent value was as of a certain date.
I can think of two ways to do this:
declare @ReportingDate DATE = '9/7/2004'
-- Method 1
SELECT
Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = (SELECT TOP 1 SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= @ReportingDate ORDER BY ModifiedDate DESC)
FROM Production.Product Product
-- Method 2
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.MostRecentOrderDetailId
FROM Production.Product Product
OUTER APPLY (SELECT TOP 1 MostRecentOrderDetailId = SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= @ReportingDate ORDER BY ModifiedDate DESC) MostRecent
Both produce a near identical query plan. Method 2 would be a clear winner of the two if I wanted to retrieve more than a single value from the detail table (say price and quantity).
Is there a better way to do this overall without refactoring how the data is stored?
Best performance I can think of would be to have a daily reporting table that precalculates the requested values cross applied with a dates table to give the value for every date. Then it would be a simple join rather than a subquery/cross apply.
Even using a reporting table (DailySalesOrderDetail), these queries need to be optimized because they'll be needed to recalculate the the reporting whenever the SalesOrderDetail table changes.
IF OBJECT_ID('tempdb..#DailyOrderDetail') IS NOT NULL DROP TABLE #DailyOrderDetail
SELECT DateFull, Name, MostRecentOrderDetailId
INTO #DailyOrderDetail
FROM edimain.dbo.DateLookup
CROSS APPLY (SELECT
Name
,MostRecentOrderDetailId = (SELECT TOP 1 SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= DateFull ORDER BY ModifiedDate DESC)
FROM Production.Product Product) MostRecent
WHERE DateFull BETWEEN '1/1/2002' and '1/1/2005'
CREATE NONCLUSTERED INDEX IX_DailyOrderDetail_DateFull
ON #DailyOrderDetail ([DateFull])
INCLUDE ([Name],[MostRecentOrderDetailId])
declare @ReportingDate DATE = '9/7/2004'
-- Method 3
SELECT
d.Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = d.MostRecentOrderDetailId
FROM Production.Product Product
INNER JOIN #DailyOrderDetail D on D.DateFull = @ReportingDate AND D.Name = Product.Name
Given the reporting table already existing, Method 3 (just a join to a denormalized reporting table) doesn't even register as a single percentage point in comparison to Method 1 and Method 2.
So if I could get Method 1 or Method 2 optimized further, that would be an interesting exercise. My application uses it quite a bit an it would be useful for optimizing a reporting table to use Method 3 as well.
Is there a term for what sort of query this is?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

tl;dr

1) I think pre-aggregated data will always smoke real-time queries 🙂

2) since you are on SQL 2012, have a look at LAST_VALUE. It might be a better option than your TOP 1 subquery. NOTE: the default window clause (RANGE BETEEN...) is HORRIBLY inefficient currently. Be sure to use ROWS BETWEEN...

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

ShawnTherrien Default port Points: 1481 More actions · Answer 2

The First_Value/Last_Value sounds really promising, but I'm not quite getting it yet.

-- Method 1.5a

SELECT DISTINCT

Product.Name

,ReportDate = @ReportingDate

,MostRecentOrderDetailId = (SELECT LAST_VALUE(SalesOrderDetailId) OVER (PARTITION BY Product.ProductId order by SalesOrderDetail.ModifiedDate)

FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = product.ProductID

AND SalesOrderDetail.ModifiedDate <= @ReportingDate)

FROM Production.Product Product

-- Method 1.5b

SELECT DISTINCT

Product.Name

,ReportDate = @ReportingDate

,LAST_VALUE(SalesOrderDetailId) OVER (PARTITION BY Product.ProductId order by SalesOrderDetail.ModifiedDate)

FROM Production.Product Product

LEFT JOIN sales.SalesOrderDetail SalesOrderDetail ON SalesOrderDetail.ProductID = product.ProductID

1.5a returns the fun error:

Msg 512, Level 16, State 1, Line 11

Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

When I try to add grouping to either, the window function I get this error:

Column 'sales.SalesOrderDetail.ModifiedDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

For every product I'm trying to get a single value (the Last_Value before a date), but I seem to be doing rather poorly at using Last_Value and getting it to group or return just a single value.

Any advice on what I'm overlooking here?

Paul White SSC Guru Points: 150467 More actions · Answer 3

This is a variation of the 'Top N per Group' problem, and there are many ways to express it in T-SQL. Whichever you choose, be sure to make the ORDER BY clause deterministic and provide a useful index. A deterministic ORDER BY simply means there should be no ties. In the AdventureWorks example, TOP (1) ... ORDER BY ModifiedDate DESC is not deterministic per product because multiple order detail rows can have the same ModifiedDate. A useful index for that query is:

CREATE INDEX nc1

ON Sales.SalesOrderDetail

(

ProductID,

ModifiedDate DESC,

SalesOrderDetailID DESC

);

Writing the query to be deterministic:

SELECT

p.Name,

ReportDate = @ReportingDate,

MostRecent.SalesOrderDetailID

FROM Production.Product AS p

OUTER APPLY

(

SELECT TOP (1)

sod.SalesOrderDetailID

FROM Sales.SalesOrderDetail AS sod

WHERE

sod.ProductID = p.ProductID

AND sod.ModifiedDate <= @ReportingDate

ORDER BY

sod.ModifiedDate DESC,

sod.SalesOrderDetailID DESC

) AS MostRecent;

The query plan no longer features an index spool or sort, seeking straight to the required row per product:

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Bhaskar.Shetty Hall of Fame Points: 3054 More actions · Answer 4

How about this solutions...

--- Create a InLine Function with ReportingDate Parameter --

CREATE FUNCTION fnMostRecentOrderDetail(@ReportingDate datetime) RETURNS TABLE

AS

RETURN

(

SELECT ProductId, SalesOrderDetailId, ModifiedDate,

ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY ModifiedDate DESC) RowNo

FROM sales.SalesOrderDetail

WHERE ModifiedDate < @ReportingDate

)

GO

and then...

-- Join the function with ReportingDate Parameter --

declare @ReportingDate DATE = '9/7/2007'

SELECT

Name

,ReportDate = @ReportingDate

,MostRecent.ModifiedDate

FROM Production.Product Product

LEFT OUTER JOIN fnMostRecentOrderDetail(@ReportingDate) MostRecent ON Product.ProductID = MostRecent.ProductId and RowNo = 1

T.Ashish SSCarpal Tunnel Points: 4248 More actions · Answer 5

Add index suggested in Paul's reply, to improve performance of your query.

ShawnTherrien Default port Points: 1481 More actions · Answer 6

Paul White (5/24/2013)
This is a variation of the 'Top N per Group' problem, and there are many ways to express it in T-SQL. Whichever you choose, be sure to make the ORDER BY clause deterministic and provide a useful index...

Thank you for the correction! yes, on the query in my system that looks into history is deterministic, my AdventureWorks example wasn't as well thought out as it could be 🙂

More importantly giving a good example of what the right index should get you in a query plan

ShawnTherrien Default port Points: 1481 More actions · Answer 7

Bhaskar.Shetty (5/24/2013)
How about this solutions...
>>SQL Removed<<

Hmm, thank you. I hadn't quite thought of that as a third option to the Sub query and cross apply. In fact if I'd made a ITF, I think my mind would have drifted towards a cross apply

I like how it's clean and I'm interested to see how it the RowNumber compares to the Top 1 performance wise. I'll post a followup comparison 🙂

ShawnTherrien Default port Points: 1481 More actions · Answer 8

Bhaskar.Shetty (5/24/2013)
How about this solutions...
--- Create a InLine Function with ReportingDate Parameter --
CREATE FUNCTION fnMostRecentOrderDetail(@ReportingDate datetime) RETURNS TABLE
AS
RETURN
(
SELECT ProductId, SalesOrderDetailId, ModifiedDate,
ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY ModifiedDate DESC) RowNo
FROM sales.SalesOrderDetail
WHERE ModifiedDate < @ReportingDate
)
GO
and then...
-- Join the function with ReportingDate Parameter --
declare @ReportingDate DATE = '9/7/2007'
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.ModifiedDate
FROM Production.Product Product
LEFT OUTER JOIN fnMostRecentOrderDetail(@ReportingDate) MostRecent ON Product.ProductID = MostRecent.ProductId and RowNo = 1

I have a solution very similar now.

Currently it's a inline table function that takes a product id and date, then returns the one reference back to the record.

It uses the Row_Number() ORDER BY... where RowNumber = 1, but inside of the function, so it always just returns a single row.

Currently it's not in the AdventureWorks db, it's in my actual db. But I'm interested in this sort of thing, so I'll probobly post up here the three different methods I'm seeing and the differences. AdventureWorks might be too small to really differentiate the performance, but we'll see.

We have the Top 1 Desc (starting), Return Row_Number 1 for a very specific item (my new) and Return Row_Numbers and join on 1 to limit the results (actually returns multiple) that you gave me.