RE: Separate Staging Tables and Data Warehouse into different Databases.

SSC Guru

Points: 442467

March 23, 2012 at 12:18 pm

opc.three (3/23/2012)
Lynn Pettis (3/23/2012)
opc.three (3/23/2012)
What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.
The big barrier to moving the staging database onto a different instance altogether is the additional overhead of getting that data from the staging database into the data warehouse. Consider a simple cross-database INSERT...SELECT versus a secondary push job using SSIS or some other means. It's a tradeoff when having your staging database on a separate instance between flexibility in terms of scaling your environment and ease of loading your data warehouse.
I understand the Staging database using the simple recovery model, but curious why the data warehouse is using the full recovery model? Is it due to size and amount of data loaded each time that t-log backups are considered better?
Previous company I worked at that was building a dat warehouse had theirs using the simple recovery model. A full backup was run after the nightly load process was completed.
Generically, a staging database where tables are wiped and reloaded with each new incoming feed makes it volatile, i.e. there is little to no value in recovering a previous DB. Whereas to recover a DW to minimize downtime for data clients could be important.
In the first case that comes to mind running the DW in FULL was purely a DR position. Standard operating procedure was for all mission-critical databases, for which the DW database qualified, to be mirrored. As you know mirroring requires the database to be in FULL recovery mode 100% of the time. Staging was classified as volatile so the DR plan for that DB was to restore the last full backup to the secondary server and resume the job schedule. With mirroring, the DW could be brought online by simply failing over to the secondary mirror. Overkill? Maybe, but the shop had a very "one size fits all" mentality.
I could see value in log shipping a DW though, or in being able to restore to a PIT for purposes of isolating a production issue. Are those worth the overhead of running a DW in FULL? Not sure, depends on the shop and their SLAs I guess.

As soon as I saw DB Mirroring, it makes sense. With that I would also assume that t-log backups are run periodically or after each load to keep the t-log from filling the disk.