Cross Apply Timing Out on Large (1bn rows) Table

  • I have a small table Members (1000 rows) that contains member information.
    I have a large view History (1bn rows) that contain member transaction history.

    I need to find the latest transaction in History for each Member. I tried using Cross Apply, but am getting a time out error when running. I only have limited read rights on the server, so I can not see the view definition, nor can I run an execution plan.

    My thought process below was to
    1.) Choose a DATEOFSERVICE within the last 30 days to limit rows to work with
    2.) Group from most common attribute to least

    However, this still did not work and could not return results within the 6 minute timeout. Any ideas for a better way to approach this issue?


    SELECT * FROM MEMBERS M
    CROSS APPLY (SELECT TOP 1 DATEOFSERVICE,BATCHID, MEMBERID FROM vwHISTORY
    GROUP BY MEMBERID,BATCHID, DATEOFSERVICE HAVING MEMBERID = M.MEMBERID AND DATEOFSERVICE >= '20170601'
    ORDER BY DATEOFSERVICE DESC) A

  • SirWittles - Monday, July 3, 2017 10:18 PM

    I have a small table Members (1000 rows) that contains member information.
    I have a large view History (1bn rows) that contain member transaction history.

    I need to find the latest transaction in History for each Member. I tried using Cross Apply, but am getting a time out error when running. I only have limited read rights on the server, so I can not see the view definition, nor can I run an execution plan.

    My thought process below was to
    1.) Choose a DATEOFSERVICE within the last 30 days to limit rows to work with
    2.) Group from most common attribute to least

    However, this still did not work and could not return results within the 6 minute timeout. Any ideas for a better way to approach this issue?


    SELECT * FROM MEMBERS M
    CROSS APPLY (SELECT TOP 1 DATEOFSERVICE,BATCHID, MEMBERID FROM vwHISTORY
    GROUP BY MEMBERID,BATCHID, DATEOFSERVICE HAVING MEMBERID = M.MEMBERID AND DATEOFSERVICE >= '20170601'
    ORDER BY DATEOFSERVICE DESC) A

    I think you're overcomplicating it. If all you want is the most recent row (by DateOfService) from the history view, then something like this should do it:
    SELECT m.*, a.* 
    FROM MEMBERS m
    CROSS APPLY (
     SELECT TOP(1)
      h.DATEOFSERVICE, h.BATCHID, h.MEMBERID
     FROM vwHISTORY h
     WHERE h.MEMBERID = M.MEMBERID
     ORDER BY h.DATEOFSERVICE DESC
    ) a

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Without having access to the views and underlying tables/index its impossible to give a good answer.
    not knowing the data also doesn't help.

    You will need to get your DBA's involved in this in order to determine the best way to do it.

    But you could try the following.


    -- Assumption - batchid is incremented sequentially

    if object_id('tempdb..#t1') is not null
     drop table #t1;

    select vh.MEMBERID
      , MAX(vh.DATEOFSERVICE) as DATEOFSERVICE
      , MAX(vh.BATCHID) as BATCHID
    into #t1
    from vwHISTORY vh
    where vh.DATEOFSERVICE >= '20170601' -- try without this to see if performance is different
    group by vh.MEMBERID

    select m.*
      , t.DATEOFSERVICE
      , t.BATCHID
    from MEMBERS m
    inner join #t1 t
    on t.MEMBERID = m.MEMBERID

    -- Assumption - batchid is NOT incremented sequentially
    -- requires 2 steps
    -- first one to get the highest date of service
    -- second to get the highest batchid for the found date of service

    if object_id('tempdb..#t1') is not null
     drop table #t1;

    select vh.MEMBERID
      , MAX(vh.DATEOFSERVICE) as DATEOFSERVICE
      , MAX(vh.BATCHID) as BATCHID
    into #t1
    from vwHISTORY vh
    where vh.DATEOFSERVICE >= '20170601' -- try without this to see if performance is different
    group by vh.MEMBERID

    if object_id('tempdb..#t2') is not null
     drop table #t2;

    select vh.MEMBERID
      , vh.DATEOFSERVICE
      , max(vh.BATCHID) as BATCHID
    into #t2
    from vwHISTORY vh
    inner join #t1 t
    on t.MEMBERID = vh.MEMBERID
    and t.DATEOFSERVICE = vh.DATEOFSERVICE
    group by vh.MEMBERID
       , vh.DATEOFSERVICE

    select m.*
      , t.DATEOFSERVICE
      , t.BATCHID
    from MEMBERS m
    inner join #t2 t
    on t.MEMBERID = m.MEMBERID

  • SirWittles - Monday, July 3, 2017 10:18 PM

    I have a small table Members (1000 rows) that contains member information.
    I have a large view History (1bn rows) that contain member transaction history.

    I need to find the latest transaction in History for each Member. I tried using Cross Apply, but am getting a time out error when running. I only have limited read rights on the server, so I can not see the view definition, nor can I run an execution plan.

    My thought process below was to
    1.) Choose a DATEOFSERVICE within the last 30 days to limit rows to work with
    2.) Group from most common attribute to least

    However, this still did not work and could not return results within the 6 minute timeout. Any ideas for a better way to approach this issue?


    SELECT * FROM MEMBERS M
    CROSS APPLY (SELECT TOP 1 DATEOFSERVICE,BATCHID, MEMBERID FROM vwHISTORY
    GROUP BY MEMBERID,BATCHID, DATEOFSERVICE HAVING MEMBERID = M.MEMBERID AND DATEOFSERVICE >= '20170601'
    ORDER BY DATEOFSERVICE DESC) A

    1) Based on naming convention, you almost certainly have a view in play. That could be G-d knows what.

    2) It is ridiculous to think that you can tune a query without being able to see the query plan, even the estimated one.

    3) You don't need a GROUP BY to find the most recent something. TOP 1 ... ORDER BY date DESC is all.

    4) If you can see the definition of the objects, then we need the INDEXING on the history table to be able to help. If you can't see that, then this also falls under the "ridiculous" category mentioned in item 2.

    5) Can you rewrite the query to just do item 3 for ONE of your members? Do this with SET STATISTICS IO ON. and if at all possible just directly access the history table - take the VIEW out of the equation. It is almost certainly not required for your stated need.

    6) Barring other information, I am going to say that you are doing a TABLE SCAN on the history table for every member due to suboptimal indexing. Hopefully doing just ONE member will allow the query to complete and you can see the bajillion reads on the history table caused by the table scan. 

    7) WITH proper indexing the simplified version of this query should take no more than a few hundred ms for all 1000 members.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Tuesday, July 4, 2017 8:56 AM

    SirWittles - Monday, July 3, 2017 10:18 PM

    I have a small table Members (1000 rows) that contains member information.
    I have a large view History (1bn rows) that contain member transaction history.

    I need to find the latest transaction in History for each Member. I tried using Cross Apply, but am getting a time out error when running. I only have limited read rights on the server, so I can not see the view definition, nor can I run an execution plan.

    My thought process below was to
    1.) Choose a DATEOFSERVICE within the last 30 days to limit rows to work with
    2.) Group from most common attribute to least

    However, this still did not work and could not return results within the 6 minute timeout. Any ideas for a better way to approach this issue?


    SELECT * FROM MEMBERS M
    CROSS APPLY (SELECT TOP 1 DATEOFSERVICE,BATCHID, MEMBERID FROM vwHISTORY
    GROUP BY MEMBERID,BATCHID, DATEOFSERVICE HAVING MEMBERID = M.MEMBERID AND DATEOFSERVICE >= '20170601'
    ORDER BY DATEOFSERVICE DESC) A

    1) Based on naming convention, you almost certainly have a view in play. That could be G-d knows what.

    2) It is ridiculous to think that you can tune a query without being able to see the query plan, even the estimated one.

    3) You don't need a GROUP BY to find the most recent something. TOP 1 ... ORDER BY date DESC is all.

    4) If you can see the definition of the objects, then we need the INDEXING on the history table to be able to help. If you can't see that, then this also falls under the "ridiculous" category mentioned in item 2.

    5) Can you rewrite the query to just do item 3 for ONE of your members? Do this with SET STATISTICS IO ON. and if at all possible just directly access the history table - take the VIEW out of the equation. It is almost certainly not required for your stated need.

    6) Barring other information, I am going to say that you are doing a TABLE SCAN on the history table for every member due to suboptimal indexing. Hopefully doing just ONE member will allow the query to complete and you can see the bajillion reads on the history table caused by the table scan. 

    7) WITH proper indexing the simplified version of this query should take no more than a few hundred ms for all 1000 members.

    Almost certainly a view in play indeed πŸ™‚

    Cheers!

  • Almost certainly a view in play indeed Smile

    Cheers!

    I travel a quite a lot for client visits and especially to present at conferences. So I tell you what Jacob:  if those three fields can't be easily acquired from a single table or at worst a simple 2-table join, I will by you a dinner should we ever be in the same town at the same time. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Tuesday, July 4, 2017 10:02 AM

    Almost certainly a view in play indeed Smile

    Cheers!

    I travel a quite a lot for client visits and especially to present at conferences. So I tell you what Jacob:  if those three fields can't be easily acquired from a single table or at worst a simple 2-table join, I will by you a dinner should we ever be in the same town at the same time. 😎

    I don't find that particularly unlikely, so I'm hesitant to agree to this arrangement (unless I don't have to put up any stakes). 

    I was just pointing out that the OP had already explicitly said History was a view, so no fancy inferences were necessary πŸ™‚

    On second thought, though, I have yet to meet any fellow Threadizens, so it might be worth putting up some stake just for that πŸ™‚

    Cheers!

  • Jacob Wilkins - Tuesday, July 4, 2017 10:14 AM

    TheSQLGuru - Tuesday, July 4, 2017 10:02 AM

    Almost certainly a view in play indeed Smile

    Cheers!

    I travel a quite a lot for client visits and especially to present at conferences. So I tell you what Jacob:  if those three fields can't be easily acquired from a single table or at worst a simple 2-table join, I will by you a dinner should we ever be in the same town at the same time. 😎

    I don't find that particularly unlikely, so I'm hesitant to agree to this arrangement (unless I don't have to put up any stakes). 

    I was just pointing out that the OP had already explicitly said History was a view, so no fancy inferences were necessary πŸ™‚

    On second thought, though, I have yet to meet any fellow Threadizens, so it might be worth putting up some stake just for that πŸ™‚

    Cheers!

    It is kind of neat actually to meet someone you have come across on forums. It is actually one of my favorite things about conferences - seeing who I meet FtF while there.

    You don't need to put up any stakes for this one. But if you would like to, how about you agree to ask your company to let me spend one hour doing any form of SQL Server performance review they feel they need ... at no cost. :w00t: No risk for you there (well, unless I drop a database or something - then they might be a bit peeved at you). πŸ˜€

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thank you for all the input guys. I tried a little of everything to get this tuned, and agree that it is pointless to try to optimize if I can't see basic view and query plan information.
    I believe another part of the performance issue is that the member identifier that I was joining and filtering on can contain leading zeros, so it is a varchar field instead of an int.
    In any event, I am happy to report I located a much smaller table that was able to give me the same info needed, and the query now processes in seconds as expected.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply