drew.allen - Tuesday, July 24, 2018 8:48 AM
I think my method will win when there are a lot of rows with the same Id, it doesn't matter how many distinct Ids there are. When there are less than about 4 rows per Id then the row_count method might win. My query only does one table scan to get the distinct Ids. The query it does to get the TOP(1) uses a seek (so is fast). The only way to tell is to test it on different data.
Here's a script where you can change the total number of rows and the number of rows per Id. (you'll need a tally table with at least 1 million rows on http://www.sqlservercentral.com/articles/T-SQL/62867/)set statistics io,time off
DROP TABLE latestrecords ;
GO
CREATE TABLE latestrecords
(
Id int,
Email varchar(200),
Customer_FirstName varchar(200),
Status varchar(200),
RecordTimestamp DATETIME
)
GO
DECLARE @RowsToInsert as int = 1000000
DECLARE @RowsPerId as int = 100
DECLARE @DistinctIds as int =@RowsToInsert /@RowsPerId
INSERT INTO latestrecords
SELECT t.N%@DistinctIds ,CONCAT('test', t.N%@DistinctIds,'@test.com'),CONCAT('Name',t.N%@DistinctIds),'New',DATEADD(minute,t.n,'July 15, 2018 10:08 PM')
from tally t
where t.N <= @RowsToInsert
CREATE INDEX IX_latestrecords_1 ON latestrecords(Id,RecordTimestamp) INCLUDE (Email,Customer_FirstName,Status) ;
SELECT COUNT(*) Count FROM latestrecords
SELECT COUNT(distinct id) [distinct id] FROM latestrecords
IF OBJECT_ID('tempdb..#t1') IS NOT NULL
DROP TABLE #t1
IF OBJECT_ID('tempdb..#t2') IS NOT NULL
DROP TABLE #t2
set statistics io,time on
PRINT '************* row_number *************'
select Id,Email,Customer_FirstName,Status,RecordTimestamp
into #t1
from
(
select *,
row_number()over(partition by Id,Email,Customer_FirstName order by RecordTimestamp desc) as rnk
from latestrecords
)ranked_records
where rnk=1
PRINT '************* CTE *************'
;WITH CTE AS (select distinct Id from latestrecords)
SELECT T.*
into #t2
FROM CTE
CROSS APPLY(SELECT TOP(1) *
FROM latestrecords X
WHERE X.Id =CTE.Id
ORDER BY X.RecordTimestamp desc) as T
set statistics io,time off
With the values I've put in 1 million rows total with 10,000 different Ids I get the following results on my machine:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 511 ms, elapsed time = 511 ms.
Table 'latestrecords'. Scan count 5, logical reads 7017, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1607 ms, elapsed time = 750 ms.
(10000 rows affected)
************* CTE *************
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'latestrecords'. Scan count 10005, logical reads 39809, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 187 ms, elapsed time = 35 ms.
(10000 rows affected)