Separate Staging Tables and Data Warehouse into different Databases.

  • Welsh Corgi (3/23/2012)I'm concerned about the time it takes to back up the Database and recover it.

    By definition Staging tables should be truncated and loaded on each ETL cycle and usually loaded with the "new" or "modified" data from the OLTP source therefore they are usually small when compared with the size of the data warehouse.

    How big are those staging tables when compared with the total size of the data warehouse?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • opc.three (3/23/2012)


    What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

    When I suggested that one of the benefit was the different recovery models as you suggested the response was "This is an architecture preference".

    What do you think of that, nice ha?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (4/4/2012)


    opc.three (3/23/2012)


    What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

    When I suggested that one of the benefit was the different recovery models as you suggested the response was "This is an architecture preference".

    What do you think of that, nice ha?

    It sounds dodgy to me 🙂 What role does the person that made the comment play? Hopefully not one of the primary DBAs on the project.

    That type of comment might get a person through a meeting with non-technical people and make them appear as if they know what they are talking about, but a preference is something that is left to the implementer that has no bearing on functionality. Choosing a recovery model is far outside the domain of a preference in my opinion. Hopefully this person is not making technical decisions for the project. The recovery model will make a difference in disaster recovery planning, and depending on the requirements your hand may be forced into using one recovery model versus another, as is the case with Database Mirroring which will only function if the database is using the FULL recovery model.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • One question, if the data warehouse is using the full recovery model how often is data loaded?

  • Lynn Pettis (4/4/2012)


    One question, if the data warehouse is using the full recovery model how often is data loaded?

    If that was for me, the bulk of the data is loaded at night but there are various lighter-weight loads, one-off updates and inserts happening during the day as well.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/4/2012)


    Welsh Corgi (4/4/2012)


    opc.three (3/23/2012)


    What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

    When I suggested that one of the benefit was the different recovery models as you suggested the response was "This is an architecture preference".

    What do you think of that, nice ha?

    It sounds dodgy to me 🙂 What role does the person that made the comment play? Hopefully not one of the primary DBAs on the project.

    That type of comment might get a person through a meeting with non-technical people and make them appear as if they know what they are talking about, but a preference is something that is left to the implementer that has no bearing on functionality. Choosing a recovery model is far outside the domain of a preference in my opinion. Hopefully this person is not making technical decisions for the project. The recovery model will make a difference in disaster recovery planning, and depending on the requirements your hand may be forced into using one recovery model versus another, as is the case with Database Mirroring which will only function if the database is using the FULL recovery model.

    The response was came from an Architect from a company that is tasked with designing the Data Warehouse. He is the only technical person and they have three project managers.

    I'm the only DBA in the Company but my boss who is also project manager is backing me up on this any many other things that are doing and they will have to change or else.

    The same guy that had OPEN QUERY Statement for an Initial load 178 tables, many in the millions. You could not load a 13 million record table with his code in 16.5 hours, it would fail.

    I rewrote in SSIS and I loaded 179,000,00 records in less than 8 hours.

    Now that the initial load is complete, he has not loaded the data into the Data Warehouse.

    He is performing a merge comparing an incremental to the initial load in the staging.:hehe:

    Maybe I'm missing something but I can't believe what I'm seeing.:-)

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Previous employer, and I had nothing to do with the administration of the SQL Server database (or the server it was on), the DW was set to Full Recovery Model. Data loads occurred at night, with a full backup after the loads were done. They couldn't understand why they kept getting a log full error approximately every three weeks.

    Told them I could fix that. After clearing and shrinking the t-log, I set the DW dtabase to the Simple Recovery Model. No more problems with the t-log getting full.

  • Regardless of the recovery model I want the Staging and Data Warehouse to be in separate Databases. I want smaller backups and faster recovery times.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Bulk-Logged Recovery Model is always an option, right?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (4/4/2012)


    Regardless of the recovery model I want the Staging and Data Warehouse to be in separate Databases. I want smaller backups and faster recovery times.

    Not arguing with you, in fact I agree. Problem is you need to convince your powers that be and all we have for you is our own experiences, no formal documentation that supports what is a valid implementation.

  • Lynn Pettis (4/4/2012)


    Welsh Corgi (4/4/2012)


    Regardless of the recovery model I want the Staging and Data Warehouse to be in separate Databases. I want smaller backups and faster recovery times.

    Not arguing with you, in fact I agree. Problem is you need to convince your powers that be and all we have for you is our own experiences, no formal documentation that supports what is a valid implementation.

    Lynn,

    Thanks for the input, I'm just asking. You made a very valid point and you have influenced me.

    I have already convinced my side. My boss told me that we need to revise the list of concerns that I brought up and which see added to and we are going to her boss the CIO and another VP.

    I can convince them but please keep you ideas coming.

    Thanks!:-)

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (4/4/2012)


    Bulk-Logged Recovery Model is always an option, right?

    Still need to run t-log backups or you could run out of disk space as the log file keeps growing.

  • Lynn Pettis (4/4/2012)


    Welsh Corgi (4/4/2012)


    Bulk-Logged Recovery Model is always an option, right?

    Still need to run t-log backups or you could run out of disk space as the log file keeps growing.

    Unless you truncate the log at specified intervals.

    I had the problem of the transaction log growing when re-indexing at an old job but I was monitoring at the same time.

    We do have an abundance of disk space but keep giving me any concerns that you may have..

    I requested way more disk space in my specifications than I will ever need and they are dedicated to giving me a dedicated SAN for the Production Data Warehouse Server.Unlike other ventures disk space does not seem like it will be a problem.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (4/4/2012)


    Lynn Pettis (4/4/2012)


    Welsh Corgi (4/4/2012)


    Bulk-Logged Recovery Model is always an option, right?

    Still need to run t-log backups or you could run out of disk space as the log file keeps growing.

    Unless you truncate the log at specified intervals.

    I had the problem of the transaction log growing when re-indexing at an old job but I was monitoring at the same time.

    We do have an abundance of disk space but keep giving me any concerns that you may have..

    I requested way more disk space in my specifications than I will ever need and they are dedicated to giving me a dedicated SAN for the Production Data Warehouse Server.Unlike other ventures disk space does not seem like it will be a problem.

    If you are going to truncate the t-log, just set the database to use the simple recovery model.

    Depending on the timing of the loads, you can do full or differential backups.

  • ok, I would probably have the Database in Full Recovery but change it to Simple before the load and change it back to Full then perform a complete Backup.

    That way if something happens during operational hours, I can perform a PIT Recovery Model if needed.

    Keep your input coming. Thanks.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 15 posts - 16 through 30 (of 40 total)

You must be logged in to reply to this topic. Login to reply