• drew.allen - Tuesday, July 24, 2018 8:48 AM

    You only tested on four rows (hence the scan count 4).  As the number of rows increases, the cost of the CROSS APPLY will increase roughly at O(n log n), whereas the cost of the ROW_NUMBER() method will increase at roughly O(n).  So your method may well "win" at small row counts, it will lose big time on row counts typical of a normal database, in the same way that a human can win a race against a horse if the distance is short enough, but will lose when the distance is longer.

    Drew

    I think my method will win when there are a lot of rows with the same Id, it doesn't matter how many distinct Ids there are. When there are less than about 4 rows per Id then the row_count method might win. My query only does one table scan to get the distinct Ids. The query it does to get the TOP(1) uses a seek (so is fast).  The only way to tell is to test it on different data.

    Here's a script where you can change the total number of rows and the number of rows per Id. (you'll need a tally table with at least 1 million rows on http://www.sqlservercentral.com/articles/T-SQL/62867/)
    set statistics io,time off

    DROP TABLE latestrecords ;
    GO
    CREATE TABLE latestrecords
    (
    Id int,
    Email varchar(200),
    Customer_FirstName varchar(200),
    Status varchar(200),
    RecordTimestamp DATETIME
    )
    GO
    DECLARE @RowsToInsert as int = 1000000
    DECLARE @RowsPerId as int = 100
    DECLARE @DistinctIds as int =@RowsToInsert /@RowsPerId
    INSERT INTO latestrecords
    SELECT t.N%@DistinctIds ,CONCAT('test', t.N%@DistinctIds,'@test.com'),CONCAT('Name',t.N%@DistinctIds),'New',DATEADD(minute,t.n,'July 15, 2018 10:08 PM')
    from tally t
    where t.N <= @RowsToInsert
    CREATE INDEX IX_latestrecords_1 ON latestrecords(Id,RecordTimestamp) INCLUDE (Email,Customer_FirstName,Status) ;

    SELECT COUNT(*) Count FROM latestrecords
    SELECT COUNT(distinct id) [distinct id] FROM latestrecords

    IF OBJECT_ID('tempdb..#t1') IS NOT NULL
      DROP TABLE #t1
    IF OBJECT_ID('tempdb..#t2') IS NOT NULL
      DROP TABLE #t2

    set statistics io,time on
    PRINT '************* row_number *************'
    select Id,Email,Customer_FirstName,Status,RecordTimestamp
    into #t1
    from
    (
    select *,
    row_number()over(partition by Id,Email,Customer_FirstName order by RecordTimestamp desc) as rnk
    from latestrecords
    )ranked_records
    where rnk=1
    PRINT '************* CTE *************'
    ;WITH CTE AS (select distinct Id from latestrecords)
    SELECT T.*
    into #t2
    FROM CTE
    CROSS APPLY(SELECT TOP(1) *
    FROM latestrecords X
    WHERE X.Id =CTE.Id
    ORDER BY X.RecordTimestamp desc) as T

    set statistics io,time off

    With the values I've put in 1 million rows total with 10,000 different Ids I get the following results on my machine:

  • ************* row_number *************

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
     CPU time = 511 ms, elapsed time = 511 ms.
    Table 'latestrecords'. Scan count 5, logical reads 7017, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 1607 ms, elapsed time = 750 ms.

    (10000 rows affected)
    ************* CTE *************

    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
     CPU time = 0 ms, elapsed time = 1 ms.
    Table 'latestrecords'. Scan count 10005, logical reads 39809, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
     CPU time = 187 ms, elapsed time = 35 ms.

    (10000 rows affected)