Using OLTP as a data source instead of creating an OLAP database

  • Hi all

    I'm moving to a new company to be their BI specialist. I'm doing some research on best practices and looking for some advice and feedback on the decisions to take.

    The company has no BI just now, and I believe they have a few OLTP databases that their web sales system updates. I haven't seen the DBs yet, but I have been told they would be hard to report against. I'm sure that, for the most part, day-old data would be sufficient for reporting purposes.

    My question is: do I HAVE to create an OLAP database with dimension and fact tables, i.e. a star schema design, even if the delivery method will only be Reporting Services? I'm not sure they will require the forecasting ability that SSAS can provide at the outset.

    I'm sure I will have to ETL the data from the production database nightly, but what are my options for this output if I'm only running SSRS reports off it?

    thanks

  • If the only requirement is to run SSRS reports, the operational databases shouldn't be queried directly, and the data can be one day old, then I would report off a restored backup of the OLTP servers. This seems to me the fastest way to get you going. I personally find SSRS difficult to use against MDX sources, but this isn't a problem with OLTP and standard T-SQL.

    If a true BI solution were ever desired, however, this method hasn't taken any steps in that direction.
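
    For illustration, a nightly restore onto a separate reporting copy might look roughly like the sketch below. The database name, logical file names, and paths are placeholders for whatever the real environment uses, and the logical names in the MOVE clauses would have to match those in the actual backup.

    ```sql
    -- Nightly refresh of a reporting copy from last night's full backup.
    -- All names and paths below are placeholders, not a real environment.
    USE master;
    GO
    -- Kick out any lingering report sessions before overwriting the copy.
    ALTER DATABASE WebSalesReporting SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    GO
    RESTORE DATABASE WebSalesReporting
    FROM DISK = N'\\backupshare\WebSalesOLTP_Full.bak'
    WITH MOVE N'WebSalesOLTP'     TO N'D:\Data\WebSalesReporting.mdf',   -- logical names must match the backup
         MOVE N'WebSalesOLTP_log' TO N'L:\Log\WebSalesReporting.ldf',
         REPLACE, RECOVERY;
    GO
    ALTER DATABASE WebSalesReporting SET MULTI_USER;
    GO
    ```

    The restored copy then becomes the SSRS data source, so the reporting load never touches the production OLTP during the day.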

  • RonKyle (12/6/2013)


    I personally find SSRS to use against MDX sources, but this isn't a problem with OLTP and standard T-SQL.

    Hi, thanks for the reply. Is the word "difficult" missing from the above sentence? If it is, I would agree.

    Yes, that's what I was thinking IN THE SHORT term: run reports off a restore of the current DB. I see this as giving some value to the business up front, and it would give me more time to get to grips with the business needs before providing medium- and long-term BI solutions, whether that be a data warehouse, data marts, or OLAP cubes.

  • I did mean to add the word difficult. Odd that I dropped an entire word. Letters I understand... I have altered the previous post.

  • You were hired as a BI specialist, and sooner or later you will be asked for BI solutions. Therefore, I would start working on multidimensional or tabular models ASAP. At the same time you could provide reports from a copy of the OLTP, but that will not work in the long run. I know people will pressure you for results, but a well-designed data warehouse will last for many years.
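
    To give an idea of what the dimensional side of that looks like, here is a minimal star-schema sketch for a web sales warehouse. The table and column names are illustrative assumptions, not the actual source schema.

    ```sql
    -- A minimal star schema: one fact table surrounded by dimension tables.
    -- All names and columns are hypothetical examples.
    CREATE TABLE dbo.DimDate (
        DateKey      int      NOT NULL PRIMARY KEY,   -- e.g. 20140107
        FullDate     date     NOT NULL,
        CalendarYear smallint NOT NULL,
        MonthNumber  tinyint  NOT NULL
    );

    CREATE TABLE dbo.DimProduct (
        ProductKey  int IDENTITY(1,1) NOT NULL PRIMARY KEY,   -- surrogate key
        ProductCode nvarchar(20)  NOT NULL,                   -- business key from the OLTP
        ProductName nvarchar(100) NOT NULL,
        Category    nvarchar(50)  NULL
    );

    CREATE TABLE dbo.FactSales (
        DateKey     int           NOT NULL REFERENCES dbo.DimDate (DateKey),
        ProductKey  int           NOT NULL REFERENCES dbo.DimProduct (ProductKey),
        OrderID     int           NOT NULL,           -- degenerate dimension
        Quantity    int           NOT NULL,
        SalesAmount decimal(18,2) NOT NULL
    );
    ```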

  • You can report off OLTP systems with a light touch as long as you're not looking for significant aggregations and the like. My guess is they're not, not yet.

    What I'd recommend is the following: Work off their systems. Get a feel for the company and what they're hoping to get out of their data. Build them some reports, get them interested in the idea, let them start round tabling where they'd like to go with it. Then rein in the ones running for the horizon.

    With that initial set of guidelines, you can get a much better idea of how current your data needs to be (usually there are two sets: immediate dashboards and historical reviews). From there, start your initial warehouse design, realizing it'll probably be iterative (and disposable) the first time or two while you actually settle out what the company really needs from a data perspective.

    The heavier the aggregations and the more load you put against the OLTP systems, the faster you'll be advancing to the warehouse.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.


    Twitter: @AnyWayDBA

  • Thanks guys, very helpful info.

    One thing I'm not clear on: I have seen data warehouses being dropped, recreated, and then populated on a nightly basis. This obviously means the ETL has to be performed on all the data in the OLTP source to populate the warehouse. Is this the norm? How would this scale up?

    Or would you be better off adopting a strategy of updating the warehouse with only the new data?

  • I don't ever recommend dropping and recreating the warehouse from scratch, although I've seen that as a solution from some consultants. It's faster to develop but ultimately (and as you correctly ask about) doesn't scale well. Incremental updates are the way to go. It will take longer to develop, but your nightly run times will be stable instead of incrementally increasing until they reach a point where it's unsustainable.
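
    As a sketch of what an incremental load can look like, assume the source rows carry a ModifiedDate column and the warehouse keeps a watermark of the last successful load; all table and column names here are hypothetical.

    ```sql
    -- Incremental (delta) load: only pull rows changed since the last run.
    -- Table, schema, and column names are illustrative assumptions.
    DECLARE @LastLoad datetime2 =
        (SELECT LastLoadDate FROM dw.EtlWatermark WHERE TableName = 'FactSales');

    MERGE dw.FactSales AS tgt
    USING (
        SELECT OrderID, DateKey, ProductKey, Quantity, SalesAmount
        FROM   stage.Sales
        WHERE  ModifiedDate > @LastLoad          -- only rows changed since the last run
    ) AS src
        ON tgt.OrderID = src.OrderID
    WHEN MATCHED THEN
        UPDATE SET tgt.Quantity    = src.Quantity,
                   tgt.SalesAmount = src.SalesAmount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderID, DateKey, ProductKey, Quantity, SalesAmount)
        VALUES (src.OrderID, src.DateKey, src.ProductKey, src.Quantity, src.SalesAmount);

    -- Record the new watermark so the next run picks up where this one left off.
    UPDATE dw.EtlWatermark
    SET    LastLoadDate = SYSDATETIME()
    WHERE  TableName = 'FactSales';
    ```

    The point is that the nightly run only processes the changed rows, so run times track the daily change volume rather than the ever-growing full history.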

  • RonKyle (1/7/2014)


    I don't ever recommend dropping and recreating the warehouse from scratch, although I've seen that as a solution from some consultants. It's faster to develop but ultimately (and as you correctly ask about) doesn't scale well. Incremental updates are the way to go. It will take longer to develop, but your nightly run times will be stable instead of incrementally increasing until they reach a point where it's unsustainable.

    I won't disagree with you outright, Ron, for the simple fact that if you want a scalable solution you're dead on target. That said, most folks aren't working against huge data, and overdevelopment is overkill.

    An example: A current warehouse I have starts with a Truncate/Reload component. This has a strict limit. Any table over 1,000,000 records OR longer than a 10 minute load time (they're concurrent) falls out to a delta component for the next sprint. That's where we decided to implement our sanity check.
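
    A truncate/reload step with that kind of guard might look roughly like the sketch below; the threshold, table names, and procedure name are assumptions for illustration only.

    ```sql
    -- Simple truncate/reload component with a row-count sanity check.
    -- All names and the 1,000,000-row limit are illustrative assumptions.
    CREATE PROCEDURE dw.LoadDimCustomer_FullReload
    AS
    BEGIN
        -- Guard: anything over the agreed limit should move to a delta component instead.
        IF (SELECT COUNT_BIG(*) FROM stage.Customer) > 1000000
        BEGIN
            RAISERROR('stage.Customer exceeds the truncate/reload limit; move it to a delta component.', 16, 1);
            RETURN;
        END;

        TRUNCATE TABLE dw.DimCustomer;

        INSERT INTO dw.DimCustomer (CustomerCode, CustomerName, Region)
        SELECT CustomerCode, CustomerName, Region
        FROM   stage.Customer;
    END;
    ```

    In practice the limit could just as easily be enforced as a design rule during the sprint rather than a runtime check; the runtime version simply makes the cutoff visible in the load itself.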

    You have to work against expectations of the developer vs. expectations of the development. Good enough is, in most cases, good enough. Smaller shops, like the one the OP is involved in, will probably only need SCD2 type work in very rare instances. In most cases, I find that using a delta for every component is indeed overdeveloping the expectations. I personally use a 10 year rule. If someone will have to go in and modify my code for data expansion expectations within the next 10 years to deal with volume vs. hardware upgrade expectations, I overdevelop to allow for the expansion. Otherwise: KISS. 10 years is (usually) more than enough to allow for business changes to already have redeveloped any particular component twice, and it can be dealt with then should optimization be considered more valuable than development time.

    When in doubt, XKCD is my best friend for this: http://xkcd.com/1205/

    As I said to client one day: "Sometimes, you don't hire more IT, you hire a secretary."


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.


    Twitter: @AnyWayDBA

  • Both good points, well made. The first one answers my question on scalability, and the second one explains why the ETLs I've seen before do a drop of the DW first (the OLTP store was not massive).

  • ps_vbdev (1/8/2014)


    Both good points, well made. The first one answers my question on scalability, and the second one explains why the ETLs I've seen before do a drop of the DW first (the OLTP store was not massive).

    I drop my data marts on a nightly basis and recreate them because of the reporting needs. As for the data warehouse, it's always in increments so it can scale well as the business grows.

    Personally, I was just like you, facing similar problems as a new guy to BI. I walked into one ugly mess of a system where reports were being generated off the OLTP production database. One of the first things that I started to do was learn the data and start to build my data warehouse. It was a real challenge at first, but I'm proud of the results.

  • xsevensinzx (1/27/2014)


    ps_vbdev (1/8/2014)


    Both good points, well made. The first one answers my question on scalability, and the second one explains why the ETLs I've seen before do a drop of the DW first (the OLTP store was not massive).

    I drop my data marts on a nightly basis and recreate them because of the reporting needs. As for the data warehouse, it's always in increments so it can scale well as the business grows.

    Personally, I was just like you, facing similar problems as a new guy to BI. I walked into one ugly mess of a system where reports were being generated off the OLTP production database. One of the first things that I started to do was learn the data and start to build my data warehouse. It was a real challenge at first, but I'm proud of the results.

    I'd be interested to know how you structure your BI solution. I take it you have an ETL that does incremental loads into your DW and then another ETL that drops and creates the data marts from the DW?

    thanks a lot

  • The data warehouse is like any other traditional warehouse. But the data marts have to be rebuilt nightly due to the business's reporting needs. Unfortunately, not every company has the same reporting needs. The company I work for does intense attribution based on time. That means that as new data comes into the warehouse, existing data has to be rebuilt around the new data that comes into the data mart.

    The easiest way to explain this is that our data is split between sales and non-sales. Reporting is done on either one or the other. Because of the intense computing needed to transform, conform, and load the data, data marts are used to give the end user chunks of the data to look at. And for that, those chunks are rebuilt every night.

    Why they are rebuilt comes down to the fact that non-sales could eventually turn into sales. Every record in our environment relates to the others eventually. Therefore, we group and reclassify these records when this transformation happens. Because of that, an intense backlog of older records is needed to help with the transformation of new records.

    This could be done without a rebuild, but the entire data mart is reorganized when each set of new data arrives. There are also retention rules in place where data has to be cycled out. As these are chunks and not the full data set, it's just easier to rebuild those chunks, along with the indexes and everything else, in a complete batch process overnight. That way the cycle is fully controlled, fresh, and ready to go every morning.

    Then the team can aggregate the final results.
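
    For a rough idea of what such a nightly mart rebuild could look like, the sketch below drops and rebuilds one mart table, reclassifying non-sales activity once the related account has produced a sale. Every table, column, and rule in it is hypothetical, and the real classification logic would be far more involved.

    ```sql
    -- Nightly rebuild of one data-mart chunk with reclassification and retention.
    -- All object names, columns, and rules are illustrative assumptions.
    IF OBJECT_ID('mart.ActivityBySaleStatus') IS NOT NULL
        DROP TABLE mart.ActivityBySaleStatus;

    SELECT a.AccountID,
           a.ActivityDate,
           -- Reclassify: any account that has since produced a sale is grouped as 'Sales',
           -- even if the original activity record was non-sales.
           CASE WHEN s.AccountID IS NOT NULL THEN 'Sales' ELSE 'Non-Sales' END AS SaleStatus,
           a.Revenue
    INTO   mart.ActivityBySaleStatus
    FROM   dw.FactActivity AS a
    LEFT JOIN (SELECT DISTINCT AccountID FROM dw.FactSales) AS s
           ON s.AccountID = a.AccountID
    WHERE  a.ActivityDate >= DATEADD(YEAR, -2, CAST(GETDATE() AS date));  -- retention window

    -- Rebuild the index as part of the same overnight batch so the mart is fresh each morning.
    CREATE CLUSTERED INDEX IX_ActivityBySaleStatus
        ON mart.ActivityBySaleStatus (ActivityDate, AccountID);
    ```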
