Pulling Data from EAV table using recursion

  • Hi All,

    I was just wondering if this makes sense. I have an EAV table with 500mil+ rows, and I've been asked to optimize a reporting query that pulls data from it over a date range. The indexes don't cover all the ways I need to constrain the data, so the original query takes longer than 5 minutes if you pull over ten days' worth of data. That times out the reporting server, and they need to do roughly 3-month pulls. I can't change the indexes; it's a third-party application (of course).

    I used to be an Oracle PL/SQL dev, and I guess I'm sort of trying to replicate BULK COLLECT / FORALL. If I pull one day of data, it runs fast, because it only grabs a little bit and processes it. So what if I make a CTE and recursively go day by day through the date range, inserting all the data into a temp table? Force it to only pull and crunch a little data at a time so it doesn't grab a few million rows, spill into tempdb, and spend forever chewing on it. I imagine it like putting food in your mouth by the spoonful rather than all at once.
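    Roughly what I'm picturing, sketched here with a plain WHILE loop rather than the recursive CTE (every table and column name is invented for illustration, not the real third-party schema):

    -- Pull the date range one day at a time into a temp table.
    -- dbo.MyEAV, EntityId, AttributeId, Value, EventDate: placeholders.
    DECLARE @StartDate date = '20180701',
            @EndDate   date = '20181001',
            @Day       date;

    CREATE TABLE #Staging
    (
        EntityId    int          NOT NULL,
        AttributeId int          NOT NULL,
        Value       varchar(255) NULL,
        EventDate   date         NOT NULL
    );

    SET @Day = @StartDate;

    WHILE @Day < @EndDate
    BEGIN
        -- One small, index-friendly pull per day instead of one giant scan.
        INSERT INTO #Staging (EntityId, AttributeId, Value, EventDate)
        SELECT EntityId, AttributeId, Value, EventDate
        FROM dbo.MyEAV
        WHERE EventDate >= @Day
          AND EventDate <  DATEADD(DAY, 1, @Day);

        SET @Day = DATEADD(DAY, 1, @Day);
    END;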

    Any thoughts on this? I know there are probably better ways to fix this issue - like never using EAV when you have reporting needs, or DBA activities that could help - but I am but a lowly reporting developer >:). Thanks all!

    Well, if you think that spilling 3 months of data into tempdb to begin with is better, try it... I'm not sure that's the case. A lack of indexes is going to slow you down no matter what you do. If you involve a loop that grabs a small number of rows and repeats that action 90 times (say, once per day's worth of data), it takes roughly 90 times as long as a run of just one day's worth. It wouldn't be hard to set up. However, the inserts into tempdb might get longer and longer as you go... No way to know until you try...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

    Assuming that you can identify the single day - and that if any row is changed, something on it is updated so you can identify that it has changed - then run a one-time process to extract a set amount of data (180 days, 3 years, whatever will be needed) into a reporting table.

    Once that has been done, you then set up a daily process that pulls a single day and updates the reporting table, updating existing rows and adding new rows. This can be done using either MERGE or an UPDATE/INSERT process (see the sketch below). Build in a purge process to keep a set amount of data available in the reporting table (again: 180 days, 3 years, ...).
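    A minimal sketch of the daily MERGE step, assuming the EAV table carries a DateModified you can key the single day off of (every object and column name here is a placeholder):

    -- Upsert one day's worth of changes into the reporting table.
    DECLARE @Day date = CAST(DATEADD(DAY, -1, GETDATE()) AS date);

    MERGE dbo.ReportingEAV AS tgt
    USING
    (
        SELECT EntityId, AttributeId, Value, DateModified
        FROM dbo.MyEAV
        WHERE DateModified >= @Day
          AND DateModified <  DATEADD(DAY, 1, @Day)
    ) AS src
        ON  tgt.EntityId    = src.EntityId
        AND tgt.AttributeId = src.AttributeId
    WHEN MATCHED THEN
        UPDATE SET tgt.Value        = src.Value,
                   tgt.DateModified = src.DateModified
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (EntityId, AttributeId, Value, DateModified)
        VALUES (src.EntityId, src.AttributeId, src.Value, src.DateModified);

    -- Purge to keep the retention window (180 days shown here).
    DELETE FROM dbo.ReportingEAV
    WHERE DateModified < DATEADD(DAY, -180, GETDATE());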

    Now - build your report(s) to use the reporting table that has appropriate indexes to support the reports.

    Note: for this reporting table - you only include internal values where you can get the external values from a simple join to the relevant table.  In other words - don't store the lastname, firstname, dateofbirth, etc... - just store the person/customer/user/member ID.  Limit this reporting table to just the columns from the EAV table that are needed for your report(s) - you can always add more later and rebuild if needed.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

    -sgmunson : "However, the inserts into tempdb might get longer and longer as you go..." Thanks for the advice; this statement took me down a rabbit hole of tempdb articles. I've been noticing queries where inserting into temp tables is the main "cost" of the execution plan, so there's a lot to learn in that realm. I bet you're right that the inserts would keep slowing down and down, causing an issue. I'll post my results after I get some time to rewrite the query; it's sort of an under-the-table thing right now.

    -Jeffrey Williams : If I had the time or the backing from management to fix this and do it right, I would take that path of making a reporting table. It seems like the cleanest way, and you can then avoid having to write queries against an EAV table, except for one pass to pivot it - something like the sketch below. In terms of restricting by date: yeah, my company has data from 2011 sitting in the system, with no archive or deletion policy, so that would be necessary.
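    Something like this classic cross-tab is what I mean by pivoting it once (the attribute names here are made up):

    -- One pass over the EAV rows, folded into one row per entity.
    SELECT  EntityId,
            MAX(CASE WHEN AttributeName = 'LastName'  THEN Value END) AS LastName,
            MAX(CASE WHEN AttributeName = 'FirstName' THEN Value END) AS FirstName,
            MAX(CASE WHEN AttributeName = 'DOB'       THEN Value END) AS DateOfBirth
    FROM    dbo.MyEAV
    GROUP BY EntityId;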

    Thanks for the advice. I'll post once I get a chance to rework it; I may have more specific questions about how the optimizer handles the recursion.

    We have no idea what your table looks like, what indexes are available, nor any idea what the report is supposed to do or look like. A bit more of a description in the form of a CREATE TABLE statement with constraints and indexes would be a good start - something in the shape of the sketch below. Adding an "Alice's Restaurant" style picture of what the output should look like would also be a big help.
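    For example, something in this general form (the names here are purely illustrative, since we haven't seen your schema):

    -- Purely illustrative DDL; post your actual table in this form.
    CREATE TABLE dbo.MyEAV
    (
        EntityId    int          NOT NULL,
        AttributeId int          NOT NULL,
        Value       varchar(255) NULL,
        EventDate   datetime     NOT NULL,
        CONSTRAINT PK_MyEAV
            PRIMARY KEY CLUSTERED (EntityId, AttributeId, EventDate)
    );

    CREATE NONCLUSTERED INDEX IX_MyEAV_EventDate
        ON dbo.MyEAV (EventDate)
        INCLUDE (EntityId, AttributeId, Value);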

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dksmith - Thursday, October 4, 2018 10:43 AM


    Just another thought, but recursion is probably not necessary. In fact, it could easily lead to problems of its own.
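    For one thing, a recursive CTE that walks a date range one day at a time runs into SQL Server's default MAXRECURSION limit of 100 steps - a little over three months of days - and fails with error 530 unless you override it:

    -- Day-by-day recursive CTE; anything past 100 recursion steps
    -- errors out unless MAXRECURSION is raised or disabled.
    WITH Days AS
    (
        SELECT CAST('20180101' AS date) AS DayDate
        UNION ALL
        SELECT DATEADD(DAY, 1, DayDate)
        FROM Days
        WHERE DayDate < '20181231'   -- 364 steps: well past the default
    )
    SELECT DayDate
    FROM Days
    OPTION (MAXRECURSION 0);         -- 0 = no limit; omit this and it fails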

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
