Finding "Max" Value from subquery

  • Hi all,

    Kinda stuck on finding a way to optimise this solution.

    Let's say I have a table which contains a bunch of dates, as well as another table which has a bunch of dates and some values:

    CREATE TABLE #Records

    (

    [Date] DATETIME,

    [JoinValue] INT

    )

    INSERT INTO #Records ([Date], [JoinValue])

    VALUES('2011-10-01', 1)

    INSERT INTO #Records ([Date], [JoinValue])

    VALUES('2011-10-02', 1)

    INSERT INTO #Records ([Date], [JoinValue])

    VALUES('2011-10-03', 1)

    INSERT INTO #Records ([Date], [JoinValue])

    VALUES('2011-10-04', 1)

    INSERT INTO #Records ([Date], [JoinValue])

    VALUES('2011-10-05', 1)

    CREATE TABLE #MoreRecords

    (

    [Date] DATETIME,

    [JoinValue] INT,

    [Value] INT

    )

    INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])

    VALUES ('2011-10-01', 1, NULL)

    INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])

    VALUES ('2011-10-02', 1, NULL)

    INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])

    VALUES ('2011-10-03', 1, 1)

    INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])

    VALUES ('2011-10-04', 1, 2)

    INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])

    VALUES ('2011-10-05', 1, 3)

    What I want to do is, for each date in #Records, find the most recent value from #MoreRecords which is not null, but is not more recent than the date in #Records.

    Bit of a convoluted question I know, so I'll post what I have so far:

    SELECT *,

    (

    SELECT TOP 1 [Value]

    FROM #MoreRecords

    WHERE#MoreRecords.JoinValue = #Records.JoinValue

    AND #MoreRecords.Date <= #Records.Date

    AND #MoreRecords.[Value] IS NOT NULL

    ORDER BY #MoreRecords.Date DESC

    ) AS [Value]

    FROM #Records

    DateJoinValueValue

    2011-10-01 00:00:00.0001NULL

    2011-10-02 00:00:00.0001NULL

    2011-10-03 00:00:00.00011

    2011-10-04 00:00:00.00012

    2011-10-05 00:00:00.00013

    So as you can see from the data, until the 3rd of october, there are no non-null values in #MoreRecords. After the 3rd, they begin to show up.

    The problem with this query is just that it is *really* slow. I have a table of about 500000 records. Now, if I have to, I'll just brute force it - leave it running overnight and hopefully itt'l be done when I get in the next day. But I'd like a cleaner solution - at worst case itt'l give me a solution if I run in to the problem again.

    Anyone have any suggestions?

  • Another possible solution:

    WITH cte AS

    (

    SELECT

    #Records.Date,

    #MoreRecords.[Value],

    ROW_NUMBER() OVER (PARTITION BY #Records.JoinValue, #Records.Date ORDER BY #MoreRecords.[Value] DESC) AS rowNum

    FROM #Records

    LEFT JOIN #MoreRecords ON#MoreRecords.JoinValue = #Records.JoinValue

    AND #MoreRecords.Date <= #Records.Date

    AND #MoreRecords.[Value] IS NOT NULL

    )

    SELECT [Date], [Value] FROM cte WHERE rowNum = 1

    Haven't been able to test it though, on real data, since there's other queries running ... gonna try tomorrow and see what I can get.

  • For performance issues, it really helps to have the actual and/or estimated execution plans.

    One thing you can try on your own is turning on XML statistics before the rest of your query

    SET STATISTICS XML ON

    and reviewing that for "missingindexes".

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I'd already checked that part, but I can post the execution logs for you to see if you can find an improvement. The first version took 45 minutes to run. Going to run the second one now and see how long it takes.

  • In the end I ended up solving my problem by just investigating it a bit more and discovering it wasn't quite as extreme as the question that I'm asking in this thread. However, I'm still curious if there is a better solution than either of the two I've proposed here, 'cause neither of those is very elegant.

  • I did look at using an APPLY rather than a subquery, but the execution plans for both were essentially the same. The only difference I found was that the subquery had a "Compute Scalar" that was missing from the APPLY.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • How about using RANK() OVER?

    See how this performs:

    WITH FullResults (RecordsDate, RecordsJoinValue, MoreRecordsDate, MoreRecordsJoinValue,MoreRecordsValue, MoreRecordsRank)

    AS

    (

    SELECT #Records.[Date] AS RecordsDate

    , #Records.JoinValue AS RecordsJoinValue

    , #MoreRecords.[Date] AS MoreRecordsDate

    , #MoreRecords.JoinValue AS MoreRecordsJoinValue

    , #MoreRecords.Value AS MoreRecordsValue

    , RANK() OVER (PARTITION BY #Records.[Date], #Records.JoinValue ORDER BY #MoreRecords.[Date] DESC) AS MoreRecordsRank

    FROM #Records

    INNER JOIN #MoreRecords ON #MoreRecords.JoinValue = #Records.JoinValue

    AND #MoreRecords.[Date] <= #Records.[Date]

    AND #MoreRecords.[Value] IS NOT NULL

    )

    SELECT * FROM FullResults WHERE MoreRecordsRank = 1

  • drew.allen (10/5/2011)


    I did look at using an APPLY rather than a subquery, but the execution plans for both were essentially the same. The only difference I found was that the subquery had a "Compute Scalar" that was missing from the APPLY.

    Drew

    Post your code? You may find this solution works better. Always worth investigating.

  • Lynn Pettis (10/13/2011)


    Post your code? You may find this solution works better. Always worth investigating.

    I didn't save the code.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • select *

    from #Records A

    left outer join

    (select join_value, max(value)

    from #Records M

    left outer join #MoreRecords N on M.join_value = N.join_value

    where M.date >= N.date

    group by join_value

    ) R on A.join_value = R.join_value

    Hope this helps

  • I had posted the incomplete query by mistake.

    sorry for the confusion, here is the query that I wanted to post.

    select A.*, B.Value

    from #Records A

    left outer join (select M.joinvalue, M.date, Max(N.date) as Recent_date

    from #Records M

    left outer join #MoreRecords N on M.joinvalue = N.joinvalue

    where M.date >= N.date

    and N.value is not null

    group by M.joinvalue

    ,M.date

    ) R on A.joinvalue = R.joinvalue and A.date = R.date

    left outer join #MoreRecords B on R.joinvalue = B.joinvalue and R.Recent_date = B.date

    Hope this helps

  • Next time a mistake like that happens, it's OK to just edit the post. 🙂

    Also, it's much easier to read a query when it's posted inside a sql code block. Like this, without the space between the first bracket and the word "code"

    [ code="sql"]SELECT 'Hello World' [/code].

  • I am new to this forum. I didn't know the post can be edited. thanks for the tip.

    There was indentation in the query before I posted. after it was posted, all the indentation is gone. the block, would that preserve the indentation of the original query after posting.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply