Easy (I think) mdf file size question

  • So is there any advantage to splitting up a data file that is getting close to 2TB if all the files will be on the same drive? Any dangers keeping it this way as long as there is free space?

  • jcourtjrSQL - Tuesday, February 5, 2019 12:18 PM

    So is there any advantage to splitting up a data file that is getting close to 2TB if all the files will be on the same drive? Any dangers keeping it this way as long as there is free space?

    It depends - you'd really want to test if you have the hardware and storage space to do so. Filegroups can give you some more options from an administrative perspective. Otherwise you can get different answers depending on who you ask. There is a thread here on the same subject; the tests that Paul Randal did with multiple files are referenced there, and I've added that link:
    No advantage to multiple data files on SAN?
    Benchmarking: do multiple data files make a difference?

    Sue

  • How about besides performance? Are there reasons a multi-TB mdf file is a bad idea?

  • jcourtjrSQL - Wednesday, February 6, 2019 10:06 AM

    How about besides performance? Are there reasons a multi-TB mdf file is a bad idea?

    Usually you'd look more at having multiple filegroups (vs. just files) to gain more management options, such as archiving, read-only filegroups, and more options with backup/restore and DBCC commands.

    Sue

  • What problems are you trying to solve? This question, without details, seems like a desperate attempt to fix an issue!😀

    If, for example, you want to speed up your backup window by creating a filegroup for tables that are static, such as lookup tables or archive tables, and a separate filegroup for your transactional tables, then it likely makes sense. 
    But, if you are trying to make things faster, then I may look at a lot of other things first.  

    The answer is really "it depends", and it really depends on what you are trying to accomplish.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • This stems from someone telling me a 1.7TB mdf is a bad idea without convincing data. There is nothing to accomplish other than figuring out what the risk is with having such a large file vs. splitting it on the same drive. Seems to be a knee-jerk reaction when some people see such a large file.

  • jcourtjrSQL - Wednesday, February 6, 2019 11:16 AM

    This stems from someone telling me a 1.7TB mdf is a bad idea without convincing data. There is nothing to accomplish other than figuring out what the risk is with having such a large file vs. splitting it on the same drive. Seems to be a knee-jerk reaction when some people see such a large file.

    Maybe back in the days of SCSI attached storage it would have been a compelling argument, but there's likely little benefit from doing such a thing with modern SANs and drives.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • jcourtjrSQL - Wednesday, February 6, 2019 11:16 AM

    This stems from someone telling me a 1.7TB mdf is a bad idea without convincing data. There is nothing to accomplish other than figuring out what the risk is with having such a large file vs. splitting it on the same drive. Seems to be a knee-jerk reaction when some people see such a large file.

    There's no problem with multi-terabyte data or log files in SQL Server. 
    I work in a very large environment with very busy 100TB+++ databases (SQL2008r2-SQL2016), and big files are fine.

    There are two distinct issues with running everything in a *single* large .mdf file: System metadata tables in the Primary filegroup and allocation contention when adding data to the file.

    1. System metadata and the Primary filegroup
    SQL Server uses itself to manage itself, placing nearly all the metadata about your database (table definitions, procedure code, user permissions, and so on) inside hidden system tables in the Primary filegroup. If all of your user data is also in the Primary filegroup, that data is scattered across the single huge file. There are some drawbacks to this, but they were a bigger deal in older editions and when storage was less flexible. The practice of creating new filegroups for all user tables and indexes, and managing that space separately from the system space, comes from this.
    In the Enterprise Edition of SQL Server, you can bring a database partially online after restoring only the primary filegroup, then restore the remaining filegroups separately, bringing each one online as it is restored, as a way of making a large restored database available faster in the event of a catastrophic recovery. Before the disaster, you segment data by importance into different filegroups. In a catastrophic scenario, you can get the main business functioning faster, then bring reporting and historical data online after your business resumes. This is known as a 'Piecemeal Restore', and it is not possible if everything is in a single filegroup.
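    For illustration, here is a minimal sketch of that restore sequence (database, filegroup, and backup file names are hypothetical; under the FULL recovery model you would also restore log backups between these steps):

        -- Bring the database online with only the critical filegroups restored
        RESTORE DATABASE BigDB
            FILEGROUP = 'PRIMARY', FILEGROUP = 'Current'
            FROM DISK = 'D:\Backups\BigDB_full.bak'
            WITH PARTIAL, RECOVERY;

        -- The business can work now; restore the historical filegroup afterwards
        RESTORE DATABASE BigDB
            FILEGROUP = 'History'
            FROM DISK = 'D:\Backups\BigDB_full.bak'
            WITH RECOVERY;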
    2. Allocation contention
    There is also the issue of allocation contention on busy inserts. If you only ever have a single thread writing data to the database at one time, this isn't much of an issue. If data comes in from multiple writers, read on.
    As row and index data is written to the data files, it can only be written to a Data Page - each data file is broken sequentially into 8KiB segments known as Pages - that has been specifically allocated to that segment of that table or index. As tables and indexes grow, they need more pages into which to write their data. There are a couple of special tracking pages for the availability of pages to be allocated, called the Global Allocation Map (GAM) and the Shared Global Allocation Map (SGAM). Without nerding out too hard on the specifics of these pages, each GAM and SGAM tracks the next ~512,000 data/index pages in the file. A new GAM page and SGAM page will be written at the end of the block the previous ones tracked.
    At 8KiB per page, each GAM and SGAM tracks 4GB of data file. Every allocation of a new page to data or indexes needs exclusive access to write on those pages. If you have several threads inserting data into a single file, they will collide with each other during allocation. You will see these collisions as PAGELATCH_EX waits on the GAM page. This contention occurs in the SQL instance itself, and is not affected in any way by the type and speed of the physical storage - you will get the same waits for the same duration regardless of whether the file is on local NVMe or an iSCSI-attached USB 2.0 thumb drive sitting across the WAN on the other side of three encrypted gateways.
    If the filegroup(s) holding the tables and indexes have multiple files with similar free space, those concurrent allocations-and-writes will be spread across the files (a.k.a. Proportional Fill), with separate threads allocating out of the GAMs and SGAMs in the file to which they have been directed. Think of multiple data files as load-balancing your writes. (Note: defrags and rebuilds involve a LOT of allocation and de-allocation on the GAMs, and pile on this problem.)
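    If you want to go that route, the change is simply to give the user filegroup more than one file, ideally created at the same size so Proportional Fill keeps the writes balanced. A hedged sketch (database, filegroup, sizes, and paths are made up):

        -- Add a second (and third, etc.) file to the filegroup holding the busy tables
        ALTER DATABASE BigDB ADD FILE
            (NAME = 'BigDB_Data2', FILENAME = 'D:\Data\BigDB_Data2.ndf',
             SIZE = 100GB, FILEGROWTH = 10GB)
        TO FILEGROUP UserData;

        ALTER DATABASE BigDB ADD FILE
            (NAME = 'BigDB_Data3', FILENAME = 'D:\Data\BigDB_Data3.ndf',
             SIZE = 100GB, FILEGROWTH = 10GB)
        TO FILEGROUP UserData;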

    Using table partitioning, you can place table rows in different partitions by row age, and different partitions may be in different filegroups. With huge databases, this allows us to keep historical data within the database, but residing in a filegroup that holds rows for a given period of time (like putting 2017 order history in a 'Year2017' filegroup). When the year ends and new rows go to a new filegroup, the old filegroup is marked read-only and backed up one last time. That backup is protected and retained, and full-database backups going forward do not need to include yet another copy of old data that will never change. Smaller backups finish faster, take up less space, and cost less.
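    Roughly, the yearly scheme described above looks like this in T-SQL (all names are hypothetical, and the partition function is trimmed to a couple of boundaries):

        -- Map each year's rows to its own filegroup
        CREATE PARTITION FUNCTION pfOrderYear (datetime2)
            AS RANGE RIGHT FOR VALUES ('2017-01-01', '2018-01-01');

        CREATE PARTITION SCHEME psOrderYear
            AS PARTITION pfOrderYear TO (Year2016, Year2017, Year2018);

        -- Once 2017 is closed out, freeze it and take one final filegroup backup
        ALTER DATABASE BigDB MODIFY FILEGROUP Year2017 READ_ONLY;

        BACKUP DATABASE BigDB FILEGROUP = 'Year2017'
            TO DISK = 'D:\Backups\BigDB_Year2017_final.bak';

        -- Ongoing full backups can then skip the frozen data
        BACKUP DATABASE BigDB READ_WRITE_FILEGROUPS
            TO DISK = 'D:\Backups\BigDB_ReadWrite.bak';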

    To move user tables and indexes from one filegroup to another, re-create each index you wish to move using CREATE INDEX ... WITH DROP_EXISTING ... ON [MyNewFilegroupName]. Yes, you use CREATE INDEX even though the index already exists; the WITH DROP_EXISTING option is required for this technique. In Enterprise Edition, you can do this WITH ONLINE=ON as well.
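    For example (table, index, and filegroup names are placeholders):

        -- Rebuild the clustered index in place, but on the new filegroup;
        -- the table's data moves along with it
        CREATE CLUSTERED INDEX CX_Orders
            ON dbo.Orders (OrderDate, OrderID)
            WITH (DROP_EXISTING = ON, ONLINE = ON)  -- ONLINE=ON needs Enterprise Edition
            ON [MyNewFilegroupName];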

    -Eddie

    Eddie Wuerch
    MCM: SQL

  • Michael L John - Wednesday, February 6, 2019 11:55 AM

    jcourtjrSQL - Wednesday, February 6, 2019 11:16 AM

    This stems from someone telling me a 1.7TB mdf is a bad idea without convincing data. There is nothing to accomplish other than figuring out what the risk is with having such a large file vs. splitting it on the same drive. Seems to be a knee-jerk reaction when some people see such a large file.

    Maybe back in the days of SCSI attached storage it would have been a compelling argument, but there's likely little benefit from doing such a thing with modern SANs and drives.

    The only issues I know about are related to how the volume is created in Windows and limitations with VMware. If the volume is not created as a GPT disk, it will be limited to the 2TB maximum size of an MBR disk. In some older versions of VMware you could have issues with volumes larger than 2TB regardless of how they are configured in Windows.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Despite the apparent knee-jerk reactions by some in the face of large MDFs, Eddie is spot on.

    I'll also state that a whole lot of data is "W.O.R.M" (Write Once, Read Many) data that never ever changes again.  For example, almost every database has a gargantuan table or two (or a dozen :D) that is used as an audit table or some other temporal table, like an invoice/invoice detail pair of tables in some form.  Such a table is written to in a temporal fashion and, once written, the data is never again updated.  There is high merit in partitioning such a table temporally (and I actually do it in a separate database) with one filegroup per month and one file per filegroup.  All but the current and next months' partitions can be set to Read_Only, backed up one final time, and never need to be backed up ever again.  I'll also state that I actually prefer Partitioned Views over Partitioned Tables and that I'll create a database for each filegroup for such things.  It makes life real easy when you want to copy just a couple of months to a lesser environment for Development or Testing.  It's actually a bitch to pull off with Partitioned Tables.
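    To picture the Partitioned View approach: it's just a UNION ALL view over the monthly tables, each living in its own database and carrying a CHECK constraint on the date column so the optimizer can skip the months a query doesn't touch. A rough, hypothetical sketch:

        -- Each Audit_YYYYMM database holds one month of rows, constrained by a
        -- CHECK on AuditDate so the optimizer can eliminate unneeded members
        CREATE VIEW dbo.AuditAll
        AS
        SELECT AuditID, AuditDate, AuditData FROM Audit_201812.dbo.Audit
        UNION ALL
        SELECT AuditID, AuditDate, AuditData FROM Audit_201901.dbo.Audit
        UNION ALL
        SELECT AuditID, AuditDate, AuditData FROM Audit_201902.dbo.Audit;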

    There's also the idea of "junk" work tables that every database seems to accumulate.  These tables are usually temporary in nature, but someone wanted to leave a trail of breadcrumbs in case a process failed.  That's a great idea, but they never go back and clean them up.  You could make a "Scratch" database for such things, and there are serious advantages to doing so.  First, you wouldn't back anything up nor even think of rebuilding any indexes there, because it's all stuff that is supposed to go away in a relatively short time.  Second, you can set the database to the SIMPLE recovery model to take advantage of "Minimal Logging" and other things.  Think of it as a longer-term TempDB.  And you can, almost without fear, create a job that will drop anything from that database after the objects are a week or two old, because nothing permanent is supposed to be stored there.
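    A sketch of such a "Scratch" database and an automated sweep of old objects (the name and the two-week cutoff are arbitrary):

        CREATE DATABASE Scratch;
        ALTER DATABASE Scratch SET RECOVERY SIMPLE;

        -- In a scheduled job: drop anything more than 14 days old
        DECLARE @sql nvarchar(max) = N'';

        SELECT @sql += N'DROP TABLE ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N';' + NCHAR(10)
        FROM Scratch.sys.tables AS t
        JOIN Scratch.sys.schemas AS s ON s.schema_id = t.schema_id
        WHERE t.create_date < DATEADD(DAY, -14, SYSDATETIME());

        EXEC Scratch.sys.sp_executesql @sql;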

    If you can guarantee that you can set up filegroups on physically different spindles for spinning hard disks, there's also a performance advantage in having Clustered Indexes (the data itself) in one filegroup and the non-Clustered Indexes in another.
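    In DDL terms, that's just two ON clauses (filegroup names here are hypothetical):

        CREATE TABLE dbo.Orders
        (
            OrderID   int           NOT NULL,
            OrderDate datetime2     NOT NULL,
            Amount    decimal(19,4) NOT NULL,
            CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
        ) ON [Data_FG];     -- the clustered index (the data itself) lives here

        CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
            ON dbo.Orders (OrderDate)
            ON [Index_FG];  -- non-clustered indexes on a different set of spindles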

    Of course, anything that you can set to Read_only (as described above) also means you don't have to backup that data after just one backup.  That can also go for large reference tables that aren't written to.

    There's more (like in the areas of Index Maintenance, etc.), but that and Eddie's great post should be enough to whet anyone's whistle on the advantages of splitting things up.  For those not yet at the 2TB mark, now's the time to plan for such scalable notions.  Just don't think for a minute that partitioning will help with performance.  It can if you have junk queries that end up doing a scan, but so would fixing the queries, and you'd get a bigger bang for the buck.  But partitioning and all the rest that I've mentioned can really help with backups, piecemeal restores, index maintenance, and more.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eddie Wuerch - Wednesday, February 6, 2019 12:34 PM

    thank you.

    one naive thought... I thought allocation contention was only applicable to tempdb, where the recommendation is to have a certain number of files per CPU. I am still absorbing your post as it's a little over my head.

  • jcourtjrSQL - Friday, February 8, 2019 9:24 AM

    one naive thought... I thought allocation contention was only applicable to tempdb, where the recommendation is to have a certain number of files per CPU. I am still absorbing your post as it's a little over my head.

    It's the same problem, just in user databases. The situation is that 'usual' tempdb use creates lots of tables, writes data into them, and drops them. 'Normal' user database activity is traditionally tilted way more toward reads over writes. This kind of contention sticks out in tempdb, but can be buried when part of user queries, even though it's just as bad. If your database clients create and drop lots of tables in the user database, they are repeating those tempdb-like patterns in your user DB.

    The answer to your question is another 'it depends' answer: it depends on the amount of write activity and count of simultaneous threads writing into the database. If you're not writing to the file with more than two threads at any time, regardless of readers, you won't get much lift from spreading the load across many files.

    The general recommendations in the wild for tempdb file count vs core count are based on the GAM/SGAM stuff I posted earlier, and statements like '1 tempdb file per 2 CPUs' (or whatever it is) are far easier to explain, yet still provide acceptable performance in the general case. TempDB suffers more because the first few pages allocated to new tables come from Mixed Extents (unless running with -T1118), which have a lot more overhead in allocation (GAM, SGAM, and PFS page updates required) than all other pages added to a table after the first few, which come from Uniform Extents (only requiring GAM updates to allocate new). 
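    If you want to check whether a user database is actually hitting this, one hedged way is to watch for PAGELATCH waits on allocation pages while the load is running (wait_resource values are db:file:page; page 1 of a file is a PFS page, page 2 a GAM, page 3 an SGAM, and those repeat at fixed intervals through the file):

        SELECT r.session_id,
               r.wait_type,
               r.wait_resource,   -- e.g. 5:3:2 = database 5, file 3, page 2 (a GAM page)
               r.wait_time
        FROM sys.dm_exec_requests AS r
        WHERE r.wait_type LIKE N'PAGELATCH%';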

    -Eddie

    Eddie Wuerch
    MCM: SQL
