Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

Separate Staging Tables and Data Warehouse into different Databases. Expand / Collapse
Author
Message
Posted Friday, March 23, 2012 11:10 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 10:57 PM
Points: 4,240, Visits: 4,290
Until today I never saw a Database that contained both the Staging Tables and Data Warehouse Tables in the same Database.

The Staging Tables are in a different schema than the Data Warehouse Tables.

Maybe I'm missing something but I have to make a solid case that this is a good practice.

Any information would be greatly appreciated.

http://www.bidn.com/blogs/TomLannen/bidn-blog/1277/separating-data-warehouse-tables-from-your-relational-and-staging-tables

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Post #1271897
Posted Friday, March 23, 2012 11:31 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:06 PM
Points: 20,702, Visits: 32,337
Can't really say if it is good or bad, just different. When we started a data warehouse project at a previous employer we set up the Staging and ODS databases separately. We could have just as easily put them in the same the database using schemas to keep things separate.

If this is being done, one thing I would also make sure of is that the staging and data warehouse tables were in different file groups. This would allow for the placing of these groups on different spindles on the server (or SAN) so that they weren't using the same disks.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1271919
Posted Friday, March 23, 2012 11:53 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Friday, September 26, 2014 7:33 AM
Points: 7,107, Visits: 12,660
What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

The big barrier to moving the staging database onto a different instance altogether is the additional overhead of getting that data from the staging database into the data warehouse. Consider a simple cross-database INSERT...SELECT versus a secondary push job using SSIS or some other means. It's a tradeoff when having your staging database on a separate instance between flexibility in terms of scaling your environment and ease of loading your data warehouse.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1271942
Posted Friday, March 23, 2012 11:56 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 10:57 PM
Points: 4,240, Visits: 4,290
Thanks you very much for the information!

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Post #1271948
Posted Friday, March 23, 2012 11:59 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:06 PM
Points: 20,702, Visits: 32,337
opc.three (3/23/2012)
What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

The big barrier to moving the staging database onto a different instance altogether is the additional overhead of getting that data from the staging database into the data warehouse. Consider a simple cross-database INSERT...SELECT versus a secondary push job using SSIS or some other means. It's a tradeoff when having your staging database on a separate instance between flexibility in terms of scaling your environment and ease of loading your data warehouse.


I understand the Staging database using the simple recovery model, but curious why the data warehouse is using the full recovery model? Is it due to size and amount of data loaded each time that t-log backups are considered better?

Previous company I worked at that was building a dat warehouse had theirs using the simple recovery model. A full backup was run after the nightly load process was completed.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1271951
Posted Friday, March 23, 2012 12:14 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Friday, September 26, 2014 7:33 AM
Points: 7,107, Visits: 12,660
Lynn Pettis (3/23/2012)
opc.three (3/23/2012)
What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

The big barrier to moving the staging database onto a different instance altogether is the additional overhead of getting that data from the staging database into the data warehouse. Consider a simple cross-database INSERT...SELECT versus a secondary push job using SSIS or some other means. It's a tradeoff when having your staging database on a separate instance between flexibility in terms of scaling your environment and ease of loading your data warehouse.


I understand the Staging database using the simple recovery model, but curious why the data warehouse is using the full recovery model? Is it due to size and amount of data loaded each time that t-log backups are considered better?

Previous company I worked at that was building a dat warehouse had theirs using the simple recovery model. A full backup was run after the nightly load process was completed.

Generically, a staging database where tables are wiped and reloaded with each new incoming feed makes it volatile, i.e. there is little to no value in recovering a previous DB. Whereas to recover a DW to minimize downtime for data clients could be important.

In the first case that comes to mind running the DW in FULL was purely a DR position. Standard operating procedure was for all mission-critical databases, for which the DW database qualified, to be mirrored. As you know mirroring requires the database to be in FULL recovery mode 100% of the time. Staging was classified as volatile so the DR plan for that DB was to restore the last full backup to the secondary server and resume the job schedule. With mirroring, the DW could be brought online by simply failing over to the secondary mirror. Overkill? Maybe, but the shop had a very "one size fits all" mentality.

I could see value in log shipping a DW though, or in being able to restore to a PIT for purposes of isolating a production issue. Are those worth the overhead of running a DW in FULL? Not sure, depends on the shop and their SLAs I guess.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1271962
Posted Friday, March 23, 2012 12:18 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 4:06 PM
Points: 20,702, Visits: 32,337
opc.three (3/23/2012)
Lynn Pettis (3/23/2012)
opc.three (3/23/2012)
What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

The big barrier to moving the staging database onto a different instance altogether is the additional overhead of getting that data from the staging database into the data warehouse. Consider a simple cross-database INSERT...SELECT versus a secondary push job using SSIS or some other means. It's a tradeoff when having your staging database on a separate instance between flexibility in terms of scaling your environment and ease of loading your data warehouse.


I understand the Staging database using the simple recovery model, but curious why the data warehouse is using the full recovery model? Is it due to size and amount of data loaded each time that t-log backups are considered better?

Previous company I worked at that was building a dat warehouse had theirs using the simple recovery model. A full backup was run after the nightly load process was completed.

Generically, a staging database where tables are wiped and reloaded with each new incoming feed makes it volatile, i.e. there is little to no value in recovering a previous DB. Whereas to recover a DW to minimize downtime for data clients could be important.

In the first case that comes to mind running the DW in FULL was purely a DR position. Standard operating procedure was for all mission-critical databases, for which the DW database qualified, to be mirrored. As you know mirroring requires the database to be in FULL recovery mode 100% of the time. Staging was classified as volatile so the DR plan for that DB was to restore the last full backup to the secondary server and resume the job schedule. With mirroring, the DW could be brought online by simply failing over to the secondary mirror. Overkill? Maybe, but the shop had a very "one size fits all" mentality.

I could see value in log shipping a DW though, or in being able to restore to a PIT for purposes of isolating a production issue. Are those worth the overhead of running a DW in FULL? Not sure, depends on the shop and their SLAs I guess.


As soon as I saw DB Mirroring, it makes sense. With that I would also assume that t-log backups are run periodically or after each load to keep the t-log from filling the disk.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1271964
Posted Friday, March 23, 2012 12:25 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Friday, September 26, 2014 7:33 AM
Points: 7,107, Visits: 12,660
Lynn Pettis (3/23/2012)
As soon as I saw DB Mirroring, it makes sense. With that I would also assume that t-log backups are run periodically or after each load to keep the t-log from filling the disk.

I cannot recall specifically, but it was something like every 15 minutes during heavy load times, and every few hours during the times when the DB was being used for reporting. Some lighter data feeds were loaded throughout the day as well.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1271970
Posted Friday, March 23, 2012 1:24 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, January 28, 2014 8:15 AM
Points: 3,065, Visits: 4,639
Welsh Corgi (3/23/2012)
Until today I never saw a Database that contained both the Staging Tables and Data Warehouse Tables in the same Database.


This is common practice in "some" RDBMS other than SQL Server, in most cases even in the same schema relying on naming covention to tell appart each kind of table like; stg_, dim_, fact_, ods_, etc.


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Post #1272045
Posted Friday, March 23, 2012 2:04 PM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 10:57 PM
Points: 4,240, Visits: 4,290
PaulB-TheOneAndOnly (3/23/2012)
Welsh Corgi (3/23/2012)
Until today I never saw a Database that contained both the Staging Tables and Data Warehouse Tables in the same Database.


This is common practice in "some" RDBMS other than SQL Server, in most cases even in the same schema relying on naming covention to tell appart each kind of table like; stg_, dim_, fact_, ods_, etc.



I'm only concerned about SQL Server at this point but I remeber that in Oracle when you create a new Database you specify whether you want to create an OLTP or an OLAP Database.

Oracle configures certain system options differently depending upon whether it is an OLTP or an OLAP Database.



For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Post #1272076
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse