Select most recent record in JOIN

  • I'm sure this is a classic newbie question, but I haven't been able to find the answer myself.... I have a Order Status table with the following schema:

    StatusIDOrderIDStatusDate OrderStatus

    214500532011-03-10 Pending

    794500532011-05-02Shipped

    894500762011-05-09 Pending

    240500882011-03-14 Pending

    241500882011-03-14 Cancelled

    267500882011-03-15 Reopen

    392500882011-03-25 Shipped

    My requirement is that I need a report that shows all orders and the most recent OrderStatus and StatusDate

    The most efficient way I have been able to accomplish this is via a correlated subquery in the WHERE clause:

    SELECT O.OrderID, O.OrderDate, S.StatusDate, S.OrderStatus

    FROM Orders AS O JOIN OrderStatus AS S ON O.OrderID=S.OrderID

    WHERE S.StatusID = (SELECT TOP 1 StatusID FROM OrderStatus WHERE OrderID = O.OrderID ORDER BY StatusDate DESC)

    Which returns

    OrderIDOrderDateStatusDate OrderStatus

    500532011-05-012011-05-02Shipped

    500762011-05-072011-05-09 Pending

    500882011-03-132011-03-25 Shipped

    But something tells me there is a better, less messy way. If there is, can anyone point this newbie in the right direction?

  • Chi Chi Cabron (5/20/2011)


    But something tells me there is a better, less messy way. If there is, can anyone point this newbie in the right direction?

    Yes you can ask them to read the following article:

    http://www.sqlservercentral.com/articles/Best+Practices/61537

    🙂

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Obviously I am beyond help.

  • is it possible that an order can have more than one record with the same StatusDate? Based on the limited data and the fact that we don't know what the other table even looks like, I took a stab at it.

    Create table #orders (StatusID smallint primary key, OrderID int, StatusDate smalldatetime, OrderStatus varchar(20))

    CREATE NONCLUSTERED INDEX [Orders_OX] ON #orders

    (

    [OrderID] ASC,

    [StatusDate] ASC

    )

    insert into #orders values

    (214,'50053','2011-03-10','Pending'),

    (794,'50053','2011-05-02','Shipped'),

    (894,'50076','2011-05-09','Pending'),

    (240,'50088','2011-03-14','Pending'),

    (241,'50088','2011-03-14','Cancelled'),

    (267,'50088','2011-03-15','Reopen'),

    (392,'50088','2011-03-25','Shipped')

    select o.*

    from #orders o

    inner join (

    select OrderID,

    MAX(StatusDate) StatusDate

    from #orders

    group by OrderID ) v

    on o.orderId = v.OrderID

    and o.StatusDate = v.StatusDate

    When I ran your code, I got this for statistics

    (3 row(s) affected)

    Table 'Worktable'. Scan count 11, logical reads 31, physical reads 0,

    read-ahead reads 0, lob logical reads 0, lob physical reads 0,

    lob read-ahead reads 0.

    Table '#orders__________________________________________________

    ___________________________________________________________

    00000000000F'. Scan count 4, logical reads 8, physical reads 0, read-ahead reads 0,

    lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Here's the stats when I ran mine

    3 row(s) affected)

    Table '#orders____________________________________________________

    _________________________________________________________

    00000000000F'. Scan count 1, logical reads 16, physical reads 0, read-ahead reads 0,

    lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Lastly, on a million rows

    Yours -

    (327682 row(s) affected)

    Table 'Orders'. Scan count 917513, logical reads 2928793, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 6 Seconds to Retrieve

    Mine -

    (327682 row(s) affected)

    Table 'Orders'. Scan count 18, logical reads 6209, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 4 Seconds

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Chi Chi Cabron (5/20/2011)


    Obviously, I am beyond help.

    Nah.

    The way you did it there is pretty much what you had to do in SQL 2000. In 2005 and beyond, you can often get this more easily/efficiently with the use of CTEs and ranking functions.

    USE ProofOfConcept ;

    GO

    IF OBJECT_ID(N'tempdb..#T') IS NOT NULL

    DROP TABLE #T ;

    CREATE TABLE #T

    (StatusID INT IDENTITY

    PRIMARY KEY,

    OrderID INT,

    StatusDate DATE,

    OrderStatus VARCHAR(25)) ;

    INSERT INTO #T

    (OrderID, StatusDate, OrderStatus)

    VALUES (1, '20110101', 'Start'),

    (1, '20110102', 'Done'),

    (2, '20110101', 'Start') ;

    ;

    WITH CTE

    AS (SELECT *,

    ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY StatusDate DESC) AS Row

    FROM #T)

    SELECT *

    FROM CTE

    WHERE Row = 1 ;

    What I've done here is assign a row number based on the OrderID and the StatusDate. It resets for each OrderID (partition by), and starts at the last date value (desc).

    Then, once I've got that, in the Common Table Expression (CTE), I just query where the row = 1, which will be the last date for each order.

    Make sense?

    For future posts, putting something together like I just did, with a temp table and some sample data, can help you get faster help, because then nobody has to figure that part out for you while trying to help out. That's all WC was asking for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Nice, I wasn't even thinking about CTEs. I was conerned with showing stats without thinking of trying to keep it simple

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • I have no idea why I have to wrap the following in quotes to make it behave... but this is my message.

    The other option is to use CROSS APPLY (SELECT TOP 1 DESC) instead of the MAX method.

    MAX is my old friend, but depending on indexing and data heuristics CROSS APPLY can outperform MAX.

    I'm still nailing down the details on that, but it's worth checking both when you're dealing with an audit table like this. In general both will outperform the reversed row_number if your indexes are sound, otherwise row_number can outperform.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Fancy! I will look into CTEs. Thanks so much, that is going to be incredibly useful!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply