November 30, 2011 at 1:01 pm
I often perform reports which require joining the most recent of a many-to-one table joined back to the main. I've been using the below pattern for this type of query, but I was wondering if there is a better approach.
Any feedback would be greatly appreciated.
SELECT a.CustomerID, a.CustomerName, ..., b.CallDate, b.CallerID, b.CallDuration, ...
FROM Customers a
LEFT OUTER JOIN
CustomerCalls b
ON b.ID =
(
SELECT TOP 1 ID
FROM CustomerCalls
WHERE CustomerID = a.CustomerID
ORDER BY CallDate DESC
)
November 30, 2011 at 1:42 pm
I would probably use a cte, something like this:
;
with MostRecentCalls as (
select CustomerID, max(ID) as callID
from CustomerCalls
group by CustomerID
)
select a.CustomerID, a.CustomerName, ..., C.CallDate, C.CallerID, C.CallDuration, ...
FROM Customers a
LEFT OUTER JOIN
MostRecentCalls b on a.CustomerID = b.CustomerID
JOIN
CustomerCalls C on b.callID = C.ID
The probability of survival is inversely proportional to the angle of arrival.
November 30, 2011 at 2:25 pm
Craig has already posted some test scripts with different solutions and comparison in another topic.
Check this out: http://www.sqlservercentral.com/Forums/FindPost1210550.aspx
November 30, 2011 at 3:21 pm
The CROSS APPLY solution is pretty interesting, I haven't messed with that type of query before.
I did some timed tests using Jeff Modon's million row test table, which I modified to have a million row main table and a 50 million row sub table with pricing information.
-- Build the test data
IF OBJECT_ID('tempdb..#TempMain') IS NULL BEGIN
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO #TempMain
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
ALTER TABLE #TempMain
ADD CONSTRAINT PK_TempMain_RowNum PRIMARY KEY CLUSTERED (RowNum)
END
IF OBJECT_ID('tempdb..#TempSub') IS NULL BEGIN
SELECT TOP 50000000
RowNum = IDENTITY(INT,1,1),
Parent = ABS(CHECKSUM(NEWID()))%1000000+1,
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)
INTO #TempSub
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
ALTER TABLE #TempSub
ADD CONSTRAINT PK_TempSub_RowNum PRIMARY KEY CLUSTERED (RowNum)
CREATE INDEX IX_TempSub_Parent ON #TempSub ( Parent, SomeDate ) INCLUDE ( RowNum )
END
I tried three different approaches:
Method 1 is the pattern I've been using, doing a left join from the main table and using a subquery to get the id of the sub table.
Method 2 uses a CTE to calculate the most recent date and then uses that join to both the main and sub table.
Method 3 uses the OUTER APPLY approach (as opposed to a CROSS APPLY, to ensure we have a main record when nothing exists in the sub table.
-- Throwaway variables.
DECLARE @RowNum1 int, @SomeLetters2 char(2), @SomeCSV varchar(80), @RowNum2 int, @SomeMoney money, @SomeDate datetime
-- Method 1
SET STATISTICS TIME ON
SELECT @RowNum1 = a.RowNum, @SomeLetters2 = a.SomeLetters2, @SomeCSV = a.SomeCSV, @RowNum2 = b.RowNum, @SomeMoney = b.SomeMoney, @SomeDate = b.SomeDate
FROM #TempMain a
LEFT OUTER JOIN
#TempSub b
ON b.RowNum =
(
SELECT TOP 1 RowNum
FROM #TempSub
WHERE Parent = a.RowNum
ORDER BY SomeDate DESC
)
SET STATISTICS TIME OFF
-- Method 2
SET STATISTICS TIME ON
;WITH CTE AS (
SELECT Parent, MAX(SomeDate) as MaxDate
FROM #TempSub
GROUP BY Parent
)
SELECT @RowNum1 = a.RowNum, @SomeLetters2 = a.SomeLetters2, @SomeCSV = a.SomeCSV, @RowNum2 = c.RowNum, @SomeMoney = c.SomeMoney, @SomeDate = c.SomeDate
FROM #TempMain a
LEFT OUTER JOIN
CTE b ON a.RowNum = b.Parent
LEFT OUTER JOIN
#TempSub c ON a.RowNum = c.RowNum AND b.MaxDate = c.SomeDate
SET STATISTICS TIME OFF
-- Method 3
SET STATISTICS TIME ON
SELECT @RowNum1 = a.RowNum, @SomeLetters2 = a.SomeLetters2, @SomeCSV = a.SomeCSV, @RowNum2 = b.RowNum, @SomeMoney = b.SomeMoney, @SomeDate = b.SomeDate
FROM #TempMain a
CROSS APPLY
(
SELECT TOP 1 RowNum, SomeMoney, SomeDate
FROM #TempSub
WHERE Parent = a.RowNum
ORDER BY SomeDate DESC
) b
SET STATISTICS TIME OFF
Results:
SQL Server Execution Times:
CPU time = 16005 ms, elapsed time = 1230 ms.
SQL Server Execution Times:
CPU time = 37159 ms, elapsed time = 3370 ms.
SQL Server Execution Times:
CPU time = 16285 ms, elapsed time = 1238 ms.
Method 1 is pretty solid, though the OUTER APPLY in Method 3 is almost the same.
Are there any other approaches worth giving a try? Or modification of the three queries I've tested above?
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy