SQL Server 2012 VLDB Data Warehouse Design Question

  • I'd appreciate feedback on this question. I'm working on a design for a VLDB that will be in the several-TB size range. The load activity will be very infrequent, as the data will be for research purposes only. People will not have to make business-critical decisions with this data, nor will there be any OLTP activity against it. In fact, data queries will mostly take the form of an export, which also will not happen daily. The frequency of the export will most likely be along the lines of bi-monthly or quarterly.

    That said, I'm having a hard time justifying the use of partitioning, and even filegroups seem like a stretch in terms of producing any real performance benefits. The cost/benefit (C/B) breakdown doesn't appear to offer much of a compelling advantage. Yes, the data repository will be large. But I'm thinking that with the right-performing storage, CPU and memory, the export time-to-completion should be fairly reasonable without the use of these strategies.

    Am I missing something in my thought process? Given these requirements, should I consider the use of partitioning or filegroups?

  • Partitioning is mainly for incrementally loading data by, say, a date column. If the monthly or quarterly data is new (time-stamped within the last month) and there are no updates to previously imported data, there is value.
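    As a rough sketch of that pattern (all names and boundary dates below are illustrative, not from the original post), a monthly partition function and scheme might look like this:

    -- Illustrative only: monthly partitions on a date key, so each new load
    -- only adds rows to a brand-new partition and never touches old data.
    CREATE PARTITION FUNCTION pfMonthly (date)
        AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-02-01', '2013-03-01');

    CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly ALL TO ([PRIMARY]);

    CREATE TABLE dbo.FactResearch
    (
        LoadDate   date          NOT NULL,
        SubjectKey int           NOT NULL,
        Measure1   decimal(18,4) NULL
    ) ON psMonthly (LoadDate);

    -- Before each new load, extend the function with the next month's boundary.
    ALTER PARTITION SCHEME psMonthly NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pfMonthly () SPLIT RANGE ('2013-04-01');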

    Filegroups are used to separate the database's files onto different LUNs (SAN) or hard drives. That would be your only benefit: it applies if you have separate drives or LUNs on which to place the files in those filegroups.
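    A minimal sketch of that layout, assuming a hypothetical ResearchDW database with two extra filegroups whose files sit on separate drives (names, paths and sizes are illustrative):

    -- Illustrative only: two filegroups, each with one file on its own drive/LUN.
    ALTER DATABASE ResearchDW ADD FILEGROUP FG_Data1;
    ALTER DATABASE ResearchDW ADD FILEGROUP FG_Data2;

    ALTER DATABASE ResearchDW ADD FILE
        (NAME = ResearchDW_Data1, FILENAME = 'E:\SQLData\ResearchDW_Data1.ndf', SIZE = 50GB)
        TO FILEGROUP FG_Data1;

    ALTER DATABASE ResearchDW ADD FILE
        (NAME = ResearchDW_Data2, FILENAME = 'F:\SQLData\ResearchDW_Data2.ndf', SIZE = 50GB)
        TO FILEGROUP FG_Data2;

    -- A large table is then placed on a specific filegroup when it is created.
    CREATE TABLE dbo.FactResearchStage
    (
        LoadDate   date NOT NULL,
        SubjectKey int  NOT NULL
    ) ON FG_Data2;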

    For performance, you might want to look into indexing strategies. I have seen huge performance improvements when loading data into a data warehouse where a fact table was completely rebuilt: we dropped the indexes first, truncated the table, loaded the data, and then added the indexes back.

    The table was mostly duplicated from an OLTP system, but the indexes on the OLTP system were different from those on the fact table, because the reporting and querying were different in the two systems.
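    A minimal sketch of that reload pattern, assuming a hypothetical FactSales fact table loaded from a flat file (index, column and file names are illustrative):

    -- Illustrative only: drop the nonclustered index, truncate, bulk load, rebuild.
    DROP INDEX IX_FactSales_DateKey ON dbo.FactSales;

    TRUNCATE TABLE dbo.FactSales;

    BULK INSERT dbo.FactSales
        FROM '\\fileserver\extracts\FactSales.dat'
        WITH (TABLOCK, BATCHSIZE = 1000000);

    -- Recreate the index only after the load has finished.
    CREATE NONCLUSTERED INDEX IX_FactSales_DateKey
        ON dbo.FactSales (DateKey)
        INCLUDE (CustomerKey, SalesAmount);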

    Thomas LeBlanc, MVP Data Platform Consultant

  • Anthony Stephens (12/16/2012)


    I'd appreciate feedback on this question. I'm working on a design for a VLDB that will be in the several-TB size range. The load activity will be very infrequent, as the data will be for research purposes only. People will not have to make business-critical decisions with this data, nor will there be any OLTP activity against it. In fact, data queries will mostly take the form of an export, which also will not happen daily. The frequency of the export will most likely be along the lines of bi-monthly or quarterly.

    That said, I'm having a hard time justifying the use of partitioning, and even filegroups seem like a stretch in terms of producing any real performance benefits. The cost/benefit (C/B) breakdown doesn't appear to offer much of a compelling advantage. Yes, the data repository will be large. But I'm thinking that with the right-performing storage, CPU and memory, the export time-to-completion should be fairly reasonable without the use of these strategies.

    Am I missing something in my thought process? Given these requirements, should I consider the use of partitioning or filegroups?

    Table partitioning rarely brings any performance advantages to the table - do not even think about table partitioning if the reason is "performance".

    Different filegroups, as stated in the previous post, would help IF, and this is a VERY BIG "IF", tables are distributed in a way that ensures I/O load balancing; otherwise, hot tables might end up in the same filegroup and then on the same LUN, therefore defeating the purpose.
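    One way to sanity-check that balance (assuming a hypothetical ResearchDW database) is the per-file I/O DMV sys.dm_io_virtual_file_stats; heavily skewed numbers suggest the hot tables ended up on the same file or LUN:

    -- Per-file read/write counts and I/O stalls for one database.
    SELECT  mf.name AS logical_file,
            mf.physical_name,
            fs.num_of_reads,
            fs.num_of_writes,
            fs.io_stall_read_ms,
            fs.io_stall_write_ms
    FROM    sys.dm_io_virtual_file_stats(DB_ID('ResearchDW'), NULL) AS fs
    JOIN    sys.master_files AS mf
            ON  mf.database_id = fs.database_id
            AND mf.file_id     = fs.file_id;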

    Reading the specifications of the project, it looks to me like "performance" is not really a top priority and, if it is, then proper data modeling and physical implementation, including a reasonable indexing strategy, would do the trick.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Filegroups could help if you do frequent data backups on this VLDB, since you can back up each device simultaneously. I don't read that you need to do this.
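    If that ever did become a requirement, a filegroup backup sketch (database, filegroup and path names are illustrative only) would look like this:

    -- Illustrative only: back up each filegroup to its own backup device.
    BACKUP DATABASE ResearchDW
        FILEGROUP = 'FG_Data1'
        TO DISK = 'G:\Backups\ResearchDW_FG_Data1.bak'
        WITH COMPRESSION;

    BACKUP DATABASE ResearchDW
        FILEGROUP = 'FG_Data2'
        TO DISK = 'H:\Backups\ResearchDW_FG_Data2.bak'
        WITH COMPRESSION;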


  • Anthony,

    You are correct in your thinking - I don't see any reason to implement any performance strategies for your data warehouse. It would be a waste of your time given the generous amounts of time between actions.

  • If you are designing a multi-TB database and do not understand all the design issues, you need to pay for outside help. Period.

    When processing a database of this size, a 2% difference in performance can result in extra hours of processing. Unless you really know how to fine-tune your design to get the last 1% or 2% out of it, you will almost certainly end up with something that is either very slow or is using a few $100k more kit than it needs.

    IMHO you should build a business case for your employer to engage an external consultant to help you with these issues. There are a number of high-reputation organisations in the market that can do this work, but if you are really stuck for names ask your Microsoft Account Manager for advice.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • EdVassie (4/2/2013)


    If you are designing a multi-TB database and do not understand all the design issues, you need to pay for outside help. Period.

    When processing a database of this size, a 2% difference in performance can result in extra hours of processing. Unless you really know how to fine-tune your design to get the last 1% or 2% out of it, you will almost certainly end up with something that is either very slow or is using a few $100k more kit than it needs.

    IMHO you should build a business case for your employer to engage an external consultant to help you with these issues. There are a number of high-reputation organisations in the market that can do this work, but if you are really stuck for names ask your Microsoft Account Manager for advice.

    The loads will be, by his own account, very infrequent. It sounds like he has the luxury of extra hours of processing. I'm a huge proponent of outside consulting, but for all we know the OP has the skills to successfully implement a coherent performance strategy. Why spend money on consultants when the benefits won't be noticed, given the infrequency of loads and flat-file extracts? I don't see the cost outweighing the benefits in this particular case.

