• imani_technology (4/27/2016)


    Actually, new data will probably be constantly flowing into Hadoop. The 10-minute latency is just something the business set up as a requirement.

    Okay, so assuming your facts don't change over time (they shouldn't; if they do, they're not technically "facts", but that's a discussion for another time...), I would probably set something up that does the following:

    1. Create a partition scheme in SSAS that gives the best query performance (for example, if the users mostly focus on one month in the latest year, partition the current year monthly and use larger partitions the further back you go).

    2. Create a partition for the current day - make sure none of the other partitions overlap this.

    3. With each new data load, run a Process Incremental (ProcessAdd) followed by a Process Index on the current-day partition (see the XMLA sketch after this list).

    4. At the end of the day, programmatically merge this partition into the latest larger partition and create a new one for the next day's data (the merge is also sketched below).
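
    To make step 3 concrete, here's a minimal XMLA sketch. All of the object IDs (SalesDB, Sales, FactSales, FactSales_CurrentDay) are hypothetical stand-ins for your own database, cube, measure group, and partition:

        <!-- Incrementally process the current-day partition.
             ProcessAdd is what SSMS calls "Process Incremental";
             follow it with a second command using Type ProcessIndexes. -->
        <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
          <Object>
            <DatabaseID>SalesDB</DatabaseID>
            <CubeID>Sales</CubeID>
            <MeasureGroupID>FactSales</MeasureGroupID>
            <PartitionID>FactSales_CurrentDay</PartitionID>
          </Object>
          <Type>ProcessAdd</Type>
        </Process>

    And the end-of-day merge in step 4 might look something like this, with FactSales_201604 standing in for whatever your latest larger partition happens to be:

        <!-- Fold the current-day partition into the latest monthly partition.
             The source partition is deleted once the merge completes. -->
        <MergePartitions xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
          <Sources>
            <Source>
              <DatabaseID>SalesDB</DatabaseID>
              <CubeID>Sales</CubeID>
              <MeasureGroupID>FactSales</MeasureGroupID>
              <PartitionID>FactSales_CurrentDay</PartitionID>
            </Source>
          </Sources>
          <Target>
            <DatabaseID>SalesDB</DatabaseID>
            <CubeID>Sales</CubeID>
            <MeasureGroupID>FactSales</MeasureGroupID>
            <PartitionID>FactSales_201604</PartitionID>
          </Target>
        </MergePartitions>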

    You'll have to write some funky dynamic XMLA to achieve this (although I have seen a similar thing done in a script task in SSIS before, so if you're a .NET person then perhaps that'll be easier for you).
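
    The "dynamic" part is mostly regenerating the partition ID, name, and source query each day. As a rough sketch of the Create command your script would emit to spin up the next day's partition (again, all names, the data source ID, and the WHERE clause are made up for illustration):

        <!-- Create tomorrow's partition; your script substitutes the
             date into the ID, Name and QueryDefinition each day. -->
        <Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
          <ParentObject>
            <DatabaseID>SalesDB</DatabaseID>
            <CubeID>Sales</CubeID>
            <MeasureGroupID>FactSales</MeasureGroupID>
          </ParentObject>
          <ObjectDefinition>
            <Partition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
              <ID>FactSales_20160428</ID>
              <Name>FactSales_20160428</Name>
              <Source xsi:type="QueryBinding">
                <DataSourceID>Warehouse</DataSourceID>
                <QueryDefinition>
                  SELECT * FROM dbo.FactSales WHERE DateKey = 20160428
                </QueryDefinition>
              </Source>
              <StorageMode>Molap</StorageMode>
            </Partition>
          </ObjectDefinition>
        </Create>

    Keeping each partition's query bound to a single day (or month) is also how you satisfy the "no overlap" rule in step 2: if two partitions' queries ever return the same rows, the cube will happily double-count them.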

    This is all assuming that the business you work for needs to see new data in the presentation layer every 10 minutes, 24 hours a day, 7 days a week.

