Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Finding the most recent value for many record in a detail table (optimizing) Expand / Collapse
Author
Message
Posted Thursday, May 23, 2013 10:00 AM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, June 6, 2014 8:23 PM
Points: 73, Visits: 761
My system has a number of history tables that would give the date that a value has changed.

I found a very close analogue to this in Adventure works looking up products and seeing what the most recent value was as of a certain date.

I can think of two ways to do this:

declare @ReportingDate DATE = '9/7/2004'

-- Method 1
SELECT
Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = (SELECT TOP 1 SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= @ReportingDate ORDER BY ModifiedDate DESC)
FROM Production.Product Product

-- Method 2
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.MostRecentOrderDetailId
FROM Production.Product Product
OUTER APPLY (SELECT TOP 1 MostRecentOrderDetailId = SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= @ReportingDate ORDER BY ModifiedDate DESC) MostRecent

Both produce a near identical query plan. Method 2 would be a clear winner of the two if I wanted to retrieve more than a single value from the detail table (say price and quantity).

Is there a better way to do this overall without refactoring how the data is stored?

Best performance I can think of would be to have a daily reporting table that precalculates the requested values cross applied with a dates table to give the value for every date. Then it would be a simple join rather than a subquery/cross apply.

Even using a reporting table (DailySalesOrderDetail), these queries need to be optimized because they'll be needed to recalculate the the reporting whenever the SalesOrderDetail table changes.

IF OBJECT_ID('tempdb..#DailyOrderDetail') IS NOT NULL DROP TABLE #DailyOrderDetail

SELECT DateFull, Name, MostRecentOrderDetailId
INTO #DailyOrderDetail
FROM edimain.dbo.DateLookup
CROSS APPLY (SELECT
Name
,MostRecentOrderDetailId = (SELECT TOP 1 SalesOrderDetailId FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = Product.ProductID AND SalesOrderDetail.ModifiedDate <= DateFull ORDER BY ModifiedDate DESC)
FROM Production.Product Product) MostRecent
WHERE DateFull BETWEEN '1/1/2002' and '1/1/2005'

CREATE NONCLUSTERED INDEX IX_DailyOrderDetail_DateFull
ON #DailyOrderDetail ([DateFull])
INCLUDE ([Name],[MostRecentOrderDetailId])

declare @ReportingDate DATE = '9/7/2004'

-- Method 3
SELECT
d.Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = d.MostRecentOrderDetailId
FROM Production.Product Product
INNER JOIN #DailyOrderDetail D on D.DateFull = @ReportingDate AND D.Name = Product.Name

Given the reporting table already existing, Method 3 (just a join to a denormalized reporting table) doesn't even register as a single percentage point in comparison to Method 1 and Method 2.

So if I could get Method 1 or Method 2 optimized further, that would be an interesting exercise. My application uses it quite a bit an it would be useful for optimizing a reporting table to use Method 3 as well.

Is there a term for what sort of query this is?


  Post Attachments 
QP.jpg (5 views, 242.08 KB)
Post #1456073
Posted Thursday, May 23, 2013 10:12 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 8:36 AM
Points: 4,319, Visits: 6,112
tl;dr

1) I think pre-aggregated data will always smoke real-time queries

2) since you are on SQL 2012, have a look at LAST_VALUE. It might be a better option than your TOP 1 subquery. NOTE: the default window clause (RANGE BETEEN...) is HORRIBLY inefficient currently. Be sure to use ROWS BETWEEN...



Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1456078
Posted Thursday, May 23, 2013 1:28 PM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, June 6, 2014 8:23 PM
Points: 73, Visits: 761
The First_Value/Last_Value sounds really promising, but I'm not quite getting it yet.

-- Method 1.5a
SELECT DISTINCT
Product.Name
,ReportDate = @ReportingDate
,MostRecentOrderDetailId = (SELECT LAST_VALUE(SalesOrderDetailId) OVER (PARTITION BY Product.ProductId order by SalesOrderDetail.ModifiedDate)
FROM sales.SalesOrderDetail SalesOrderDetail WHERE SalesOrderDetail.ProductID = product.ProductID
AND SalesOrderDetail.ModifiedDate <= @ReportingDate)
FROM Production.Product Product


-- Method 1.5b
SELECT DISTINCT
Product.Name
,ReportDate = @ReportingDate
,LAST_VALUE(SalesOrderDetailId) OVER (PARTITION BY Product.ProductId order by SalesOrderDetail.ModifiedDate)
FROM Production.Product Product
LEFT JOIN sales.SalesOrderDetail SalesOrderDetail ON SalesOrderDetail.ProductID = product.ProductID

1.5a returns the fun error:
[color=RRGGBB]Msg 512, Level 16, State 1, Line 11
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.[/color]

When I try to add grouping to either, the window function I get this error:
[color=#RRGGBB]Column 'sales.SalesOrderDetail.ModifiedDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.[/color]

For every product I'm trying to get a single value (the Last_Value before a date), but I seem to be doing rather poorly at using Last_Value and getting it to group or return just a single value.

Any advice on what I'm overlooking here?
Post #1456198
Posted Friday, May 24, 2013 1:37 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 10:42 AM
Points: 11,192, Visits: 11,089
This is a variation of the 'Top N per Group' problem, and there are many ways to express it in T-SQL. Whichever you choose, be sure to make the ORDER BY clause deterministic and provide a useful index. A deterministic ORDER BY simply means there should be no ties. In the AdventureWorks example, TOP (1) ... ORDER BY ModifiedDate DESC is not deterministic per product because multiple order detail rows can have the same ModifiedDate. A useful index for that query is:

CREATE INDEX nc1 
ON Sales.SalesOrderDetail
(
ProductID,
ModifiedDate DESC,
SalesOrderDetailID DESC
);

Writing the query to be deterministic:

SELECT 
p.Name,
ReportDate = @ReportingDate,
MostRecent.SalesOrderDetailID
FROM Production.Product AS p
OUTER APPLY
(
SELECT TOP (1)
sod.SalesOrderDetailID
FROM Sales.SalesOrderDetail AS sod
WHERE
sod.ProductID = p.ProductID
AND sod.ModifiedDate <= @ReportingDate
ORDER BY
sod.ModifiedDate DESC,
sod.SalesOrderDetailID DESC
) AS MostRecent;

The query plan no longer features an index spool or sort, seeking straight to the required row per product:





Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi


  Post Attachments 
ssc.png (121 views, 22.51 KB)
Post #1456316
Posted Friday, May 24, 2013 2:34 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Saturday, July 19, 2014 12:54 AM
Points: 460, Visits: 416
How about this solutions...

--- Create a InLine Function with ReportingDate Parameter --
CREATE FUNCTION fnMostRecentOrderDetail(@ReportingDate datetime) RETURNS TABLE
AS
RETURN
(
SELECT ProductId, SalesOrderDetailId, ModifiedDate,
ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY ModifiedDate DESC) RowNo
FROM sales.SalesOrderDetail
WHERE ModifiedDate < @ReportingDate
)
GO

and then...

-- Join the function with ReportingDate Parameter --
declare @ReportingDate DATE = '9/7/2007'
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.ModifiedDate
FROM Production.Product Product
LEFT OUTER JOIN fnMostRecentOrderDetail(@ReportingDate) MostRecent ON Product.ProductID = MostRecent.ProductId and RowNo = 1

Post #1456326
Posted Friday, May 24, 2013 5:14 AM


Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Monday, July 14, 2014 6:55 AM
Points: 717, Visits: 539
Add index suggested in Paul's reply, to improve performance of your query.
Post #1456395
Posted Friday, May 24, 2013 8:57 AM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, June 6, 2014 8:23 PM
Points: 73, Visits: 761
Paul White (5/24/2013)
This is a variation of the 'Top N per Group' problem, and there are many ways to express it in T-SQL. Whichever you choose, be sure to make the ORDER BY clause deterministic and provide a useful index...


Thank you for the correction! yes, on the query in my system that looks into history is deterministic, my AdventureWorks example wasn't as well thought out as it could be :)

More importantly giving a good example of what the right index should get you in a query plan
Post #1456555
Posted Friday, May 24, 2013 9:12 AM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, June 6, 2014 8:23 PM
Points: 73, Visits: 761
Bhaskar.Shetty (5/24/2013)
How about this solutions...

>>SQL Removed<<


Hmm, thank you. I hadn't quite thought of that as a third option to the Sub query and cross apply. In fact if I'd made a ITF, I think my mind would have drifted towards a cross apply

I like how it's clean and I'm interested to see how it the RowNumber compares to the Top 1 performance wise. I'll post a followup comparison :)
Post #1456571
Posted Thursday, May 30, 2013 1:26 PM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, June 6, 2014 8:23 PM
Points: 73, Visits: 761
Bhaskar.Shetty (5/24/2013)
How about this solutions...

--- Create a InLine Function with ReportingDate Parameter --
CREATE FUNCTION fnMostRecentOrderDetail(@ReportingDate datetime) RETURNS TABLE
AS
RETURN
(
SELECT ProductId, SalesOrderDetailId, ModifiedDate,
ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY ModifiedDate DESC) RowNo
FROM sales.SalesOrderDetail
WHERE ModifiedDate < @ReportingDate
)
GO

and then...

-- Join the function with ReportingDate Parameter --
declare @ReportingDate DATE = '9/7/2007'
SELECT
Name
,ReportDate = @ReportingDate
,MostRecent.ModifiedDate
FROM Production.Product Product
LEFT OUTER JOIN fnMostRecentOrderDetail(@ReportingDate) MostRecent ON Product.ProductID = MostRecent.ProductId and RowNo = 1



I have a solution very similar now.

Currently it's a inline table function that takes a product id and date, then returns the one reference back to the record.

It uses the Row_Number() ORDER BY... where RowNumber = 1, but inside of the function, so it always just returns a single row.

Currently it's not in the AdventureWorks db, it's in my actual db. But I'm interested in this sort of thing, so I'll probobly post up here the three different methods I'm seeing and the differences. AdventureWorks might be too small to really differentiate the performance, but we'll see.

We have the Top 1 Desc (starting), Return Row_Number 1 for a very specific item (my new) and Return Row_Numbers and join on 1 to limit the results (actually returns multiple) that you gave me.
Post #1458399
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse