Multiple Data Files? Why?

  • Hello, 1st post! I'm an MCSE/Network Admin who's gettting a job which is ALL SQL stuff. I've been told to really learn T-SQL and all the non gui stuff as I guess like Networking, it's much stronger.

    I've been reading in my books about giving a database multiple data files. I'm wondering why you'd do so? Can you direct certain data to save in certain files? I don't quite understand why you'd want, "some data over here, some data over there" unless you could control what went where, say for the sake of backup.

    If someone could shed some light on this, I'd really appreciate it!

  • Well, darn, I answered this and when I hit Post Reply the system threw it all out! 

    Anyway, to try to answer your question again.  First, the GUI part isn't all bad, it can help you start to see how to actually write SQL using the drag-and-drop interface in Enterprise Manager.

    Now to answer the question about using multiple tables in ANY database.  Let's suppose that you're working something like customers and their orders.

    We need to gather certain information for each customer, like CustKEY, CustName, CustAddress, CustCity, CustST, CustZIP, CustPhone, CustFAX, CustEMail and any other information we need about our customers.

    We also need to keep track of each Order created on behalf of each Customer, so we have an Orders table with fields CustKey, where we generate an Order Number as CustOrdKEY , OrdDate, OrdTotal, OrdTax and perhaps other info.

    Now, for EACH Order, we need to have OrderLines, where we use the CustOrdKEY, OrdLineKEY, PartNo, Description, Price, OrdQuantity (not necessarily in this order) and other possible info.

    These relationships are one-to-many, where each customer can have many orders and each order can have many orderlines.

    You can probably start to imagine how LARGE your table would be if you had to maintain all this information in ONE table.  Ever worse, imagine how much of a headache you'd have if ONE of your customers changed their phone number!  Set up with multiple tables, ONE phone number change happens for ALL of that customer's orders at the same time.

    This is pretty simplistic, but I hope it helps you understand.


    Butch

  • Oh follow you completely and appreciate the example. My question is about the data files when you create a database in EM. You have the option to create multiple "Data Files," .mdf I think. I was wondering why you'd do that?

  • As far as I know, multiple data files solve mainly problems with storage space. You can have each file on a different drive, so if the database is say 200GB (or you estimate that it will grow to such amount after some time), you can create 2 or 3 files - and then use 2 or 3 disk drives to store them. Besides, it allows you to backup these files separately, and it should also increase the performance (read and write faster, because data can be read/written simultaneously - but that also depends on other factors). My personal opinion (others may disagree) is, that multiple data files are to be considered if the size of the database is > 100GB, or else if you have problems with storage space for data and/or backups. It can also be one of the ways to improve performance in a DB with heavy traffic. Our database is currently some 120GB large and uses just one file - no problems yet, but if it continues to grow, we'll probably have to think about splitting it into several files (which is more complicated that if you do it when creating the DB... so think ahead).

    See also BOL (SQL Server BooksOnLine = SQLS Help) topic "Files and Filegroups" /quotation from BOL : If the primary file can hold all of the data in the database, databases do not need to have secondary data files. Some databases may be large enough to need multiple secondary data files or to use secondary files on separate disk drives to spread data across multiple disks./

    HTH, Vladan

  • V,

    Thanks for the explanation, that helped a lot. I guess I had to learn the hard way with Networking to learn everything at each step of the way, so I'm really trying to get it ALL this time.

    Thanks again for your reply.

  • TSQL

    whoever told you that you have to learn  TSQL to administer MS SQL is either an Oracle bod or never had charge of a multi server installation - most DBAs do not have time to mess around in TSQL from an administrative perspective - it just takes too long and is as all such operations are - prone to error - As a networking guy imagine if you screwed your DHCP server - well you would perhaps re-install it add you reservations etc. and the system over a short period would repair itself. As a DBA you have to work defensively ...... do nothing that is inherently dangerous if there is an alternative. The reason for this is YOU will have to repair it - large and or complex installations can take a long time to repair - and when the crisis has passed and the hunt for the guilty man begins .... you will be candidate number one to walk the plank. USE THE GUI!

    There are two kinds of TSQL that you will use as a DBA - these are called DDL and DML Data Definition Language and Data Manipulation Language. DDL is used to create and to alter and to delete objects (tables, indices, constraints, UDFs SPs etc) unless you have too much time on your hands create them in the GUI and then generate the scripts from within the GUI - I use this as part of the backup process as do most DBAs.

    DML is used to operate on the data , i.e. insert update and delete - there is loads of stuff on the net regarding this - check out The Guru's Guide to TSQL (Addison Wesley) for a look at what a real pro can do with TSQL. If you go into Query Analyser you will find lots of templates to help you get started with DML (actually DDL too)

    Multiple Data Files

    There are loads of reasons why you would want to spread your database accross multiple data files. Cheif amongs them is performance. Data files can reside on different physical disks so you can get a degree of read and write parralellism so the system goes faster. One neat trick is to place large reference tables in a file and mark that file read only - all the administrative overhead as far as the system practically goes away and you can read the reference tables faster as a result. BAckup - if you have a humongous database then you can split that database accross multiple files and use a fie group backup - allowing you to back up in stages a database that you either dont have the capapcity or time to backup. There are other reasons but these are the main ones

    Hope that his has helped you

    Terry Murphy MCDBA

  • T, Thanks for the continued explanation. I get you on the TSQL stuff, and the DHCP example was dead on.

    "One neat trick is to place large reference tables in a file and mark that file read only - all the administrative overhead as far as the system practically goes away and you can read the reference tables faster as a result."

    I'm confused here. Are you able to seperate certain tables to certain files (mdf) all from the same database?

  • You have files and filegroups. Each SQL Server database is assigned at least one file in a primary filegroup, with .mdf extension. You may have up to 256 filegroups. Multiple files can be assigned in a filegroup to allow data to be written across multiple drive resources, e.g. d, e, f... In this case you can't select which drive the table and/or other database objects will be stored. However if you define one or more secondary filegroups, will have a .ndf extension, you can create or move tables or indexes. In the Create table you can select the filegroup to store the table on and you can in fact select a different filegroup to store the indexes on.

    Okay so why would you want multiple filegroups. Well you had one example where you have some read only tables that don't need the same type of backup/recovery efforts because the don't change and it can also improve through put of the data on these tables. Typically in other database engines they suggest you seperate the indexes, a) it improves performance and b) indexes don't really need RAID5 parity as they can be rebuilt from the data.

    Now you might be asking why I might want to put my tables in a seperate filegroup on another drive resourse. Well if you've found that the drive supporting your primary filegroup is overloaded you might want to seperate large tables that are always being joined, e.g. like an invoice header table and the invoice detail or transaction tables. If your not going to use a seperate RAID controller for the new drive set, you might want to see if you can utilize a seperate channel on the current controller so the RAID controller won't become the new bottleneck.

    So if you'll review the create table syntax from Books On-Line you'll see you can select which filegroup a table and infact which filegroup an index is created on. You can also change this in the GUI, especially useful if you forgot to specify you wanted the new indexes on a seperate filegroup. I think most people find they forget that when you create a primary key or unique key constraint that you're actually creating an index at the same time, so we forget to specify the filegroup to save them on.

    Hopefully this helps!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply