Working on a massive BI Environment

  • The idea being we will have data coming in from 2 live SQL Servers into a Datawarehouse (pretty much a straight sql db tho).

    From here we need to snapshot/arhive the data on a monthly basis, and then this needs to go through a filter to clean it up, and get extracted into output files.

    My first thoughts are -

    - SQL 2000 to 2005 Transactional Replication putting data into the "Warehouse" from the 2 databases.

    - Database Snapshots set up to take a monthly dump of the data.

    - Data Firewall of sorts in place, through SSIS, that will tidy up the data from the snapshots and output to Extract files.

    Additonally we would probably have SSRS and SSAS running off of the warehouse db.

    Any thoughts please?

    Thanks :hehe:

  • First thought is use SSIS to extract and tidy up the data before it gets into the warehouse. I say this becuase you mention running AS and RS against it. Second you will need tidy data in the warehouse before you build a cube unless you want to analyze junk. GIGO

    If you want to use replication and keep the live servers free of any hits from an ETL then use an intermediate database similar to the warehouse where you can use the replication then setup an ETL to load the real warehouse. Also make sure you have a solid dimensional structure for SSAS, it will make life easier. Remember a warehouse is not just a container for data, it is a highly structured database designed and optimized to facilitate analysis and reporting.

    Last not sure what you mean by extract files maybe for archiving.

    - good luck

  • If you ever have an issue from the warehouse, then you'll have to whack the snapshots and you'll lose them. While they're a nice way to get the snapshot, I'd honestly start extracting out the data from the snapshot into another db and then keep it around if needed (dbw_200810, dbw_200811, etc), or back it up and drop it from the system.

    The above points make sense for me as well.

  • we use db mirroring and run our etl's against a snapshot of the mirror, works pretty well but they both need to be sql05 instances.

    ditto on the datawarehouse design thing, it really needs a methodology applied either Kimaball on Inmon. We've gone for Kimball and it has given us significant design and user benefits. There's a great DWh design and productivity tool we use http://www.wherescape.com/ by Wherescape take a look.

    HTH


    Kindest Regards,

    Martin

  • How does anyone feel of doing this kind of project withotu Enterprise edition and using Standard Edition instead?

    Looking at cost cutting. Obviously I'd lose out the ability to use the Database Snapshots but who is to say there isn't a better method?

    Any thoughts?

  • if you intend on installing analysis services and reporting services as well then i'd go for enterprise, also from memory partitioning and online index rebuilds are ent only features, if you are planning a large implementation then ent is definately the way to go.

    HTH


    Kindest Regards,

    Martin

  • We use standard and we use SSAS. The reason being we don't have the budget, and at this point don't have the need. Combined rows for fact tables between all data marts is around 12 mln, performance is not a problem. To be honest there are many features of enterprise that would be nice, but not having them certainly isn't a show-stopper.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply