At what size db does it make sense to split into multiple files?

  • Tom Leykis (2/13/2008)


    Ok, this will really light your fire :hehe:

    The reason I was asking is because it's my database, for my company, that I manage the db for, because I'm anal when it comes to just how insanely far I can go to improve performance, and I don't want anyone else touching it. This is my own personal fire-breathing dragon, and no one's code but mine gets to swim in the acid pool. The info I've gotten here is very useful and I appreciate it a great deal. Btw: My code is top notch, probably because I learned most everything about how to write proper code on sites like this, from folks such as yourselves. So thanks for that as well. 😛

    Actually, that cools the fire quite a bit because now I know where you're coming from, Tom. I'm actually a bit jealous in that you get to do the whole thing, womb to tomb. I've not had the privilege nor the responsibility of being able to do that for a long time.

    I agree about sites like this, especially this one... I learn something new every day... sometimes, it's just to learn when to put the soap box away, sometimes not early enough... 🙂 thanks folks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Bah, I missed the whole war between the trip home, dinner, kids bath, and now. 🙂 How come nobody called to tell me?! Oh...right...why would I let you guys call me? =)

    I have three mantras as a DBA/Developer/Network Engineer.

    1. It can always go faster. You just haven't figured it out yet and someone else may already have. So talk to everyone.

    2. I can't fix broken code that developers say is "working". But, I can make broke code go faster...much faster. =)

    3. Stupidity got us into this mess and stupidity will get us out.

    Point 3 is all about thinking outside the box. Because there's just some code that makes you turn your head sideways and say, "uhhhhhh...what?" And when you're done, you've spent 2 weeks staring at it and refactored a 1,200-line stored procedure into 5 lines that do exactly the same thing. =)

    Christopher Ford

    Hello gentlemen,

    I am nowhere near the expertise level you guys obviously portray; however, in my intermediate DBA experience I have also come to realize that when you have a large transactional multi-tier subscriber environment, one crucial reason to separate the data files, in addition to all that have been mentioned, is to be proactive in troubleshooting database corruption. The technique generally used, bcp'ing table data out and back in, can be very time consuming on large applications. In my experience the database file grew to 300GB.

    Assuming your DBCC statements come back with corrupted data pages.

    I was curious as to why nobody had commented on that point, I have known this to be a fundamental reason.

    If I am missing something please be nice about it ..hehehe

    😀
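    A minimal version of the DBCC check being described, as a sketch; the database name MyDB is hypothetical:

    ```sql
    -- Run a full integrity check; any corrupted pages it finds are
    -- reported here and (on SQL 2005+) logged to msdb.dbo.suspect_pages.
    DBCC CHECKDB ('MyDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
    ```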

  • Tom,

    I guess manageability has several different meanings. To me, having more files, albeit smaller files...does not make a database more "manageable". That's my opinion anyway.

    Being able to quickly answer questions about your database's health is the key to being more "manageable".

    Am I using indexes effectively?

    Am I using space effectively in the tables? (fragmentation)

    Am I using I/O effectively?

    What is my worst performing query/procedure?

    What is my worst contending table?

    Am I distributed properly across the drives available to me?

    Am I using cached plans effectively?

    Am I backing up regularly?

    Does my disaster recovery plan work?

    Is it automated?

    What is my database growth like?

    How many table scans are occurring?

    Are those table scans a bad thing? (they may not be depending on how big the table is)
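    For what it's worth, several of the questions above can be answered from the dynamic management views. Here's a sketch for the worst-performing-query question on SQL 2005; the TOP count and the CPU-based ordering are just one reasonable choice:

    ```sql
    -- Top queries by average CPU time since the last restart,
    -- pulled from the plan cache (SQL Server 2005 and later).
    SELECT TOP 10
        qs.total_worker_time / qs.execution_count AS avg_cpu_time,
        qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time,
        qs.execution_count,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY avg_cpu_time DESC;
    ```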

    In a database, there is always something to improve, but at some point you've got to be happy with it. Until someone publishes some article in SQL Magazine or on this site that makes you go, "Whoa! I can do that?!" =)

    Jeff, I could almost hug you actually...you wrote an article the other day on a problem I have been trying to solve for 6 months. Running totals, e.g. triangular joins. THANK YOU! =) I know, this isn't the post to do it in...but since we're on the topic of performance. 🙂

    Of course, since learning from that, I now use ROW_NUMBER() to guarantee my row results instead of creating an index. But that conversation is definitely not part of this topic. 🙂
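    A minimal sketch of that ROW_NUMBER() approach; the table and column names here are made up for illustration:

    ```sql
    -- Impose an explicit, guaranteed row ordering on the result set
    -- rather than relying on the order an index happens to return.
    SELECT ROW_NUMBER() OVER (ORDER BY AccountID, TransactionDate) AS RowNum,
           AccountID,
           TransactionDate,
           Amount
    FROM dbo.TransactionDetail;
    ```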

    Glad we could help Tom! Well...I hope I helped...sounds like Paul and Jeff got all the good points across. 🙂

    Christopher Ford

  • reshadit (2/13/2008)


    Hello gentlemen,

    I am nowhere near the expertise level you guys obviously portray; however, in my intermediate DBA experience I have also come to realize that when you have a large transactional multi-tier subscriber environment, one crucial reason to separate the data files, in addition to all that have been mentioned, is to be proactive in troubleshooting database corruption. The technique generally used, bcp'ing table data out and back in, can be very time consuming on large applications. In my experience the database file grew to 300GB.

    Assuming your DBCC statements come back with corrupted data pages.

    I was curious as to why nobody had commented on that point, I have known this to be a fundamental reason.

    If I am missing something please be nice about it ..hehehe

    😀

    Now you should use SSIS to do that kind of operation. It's faster and safer. 🙂

    However, speaking of SQL 2005 (if you haven't upgraded, why not?!), when a database goes suspect due to corrupted data pages, the event is stored in msdb, in the suspect_pages table.

    Corrupted pages are detected when a query needs to read the data, when DBCC CHECKDB or CHECKTABLE is run, during a backup or restore, or when the database is being dropped. They can also be detected when running DBCC DBREPAIR.

    You can find the event_type value definitions in BOL if you search for "suspect_pages":

    http://technet.microsoft.com/en-us/library/ms174425.aspx

    Then you would just restore that page from your last backup.
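    The whole sequence, as a sketch. The database name, page ID, and backup paths below are all hypothetical, and restoring a page while the database stays online assumes SQL 2005 Enterprise Edition (other editions can do it offline):

    ```sql
    -- 1. Find the damaged pages the engine has logged.
    SELECT database_id, file_id, page_id, event_type, error_count
    FROM msdb.dbo.suspect_pages
    WHERE database_id = DB_ID('MyDB');

    -- 2. Restore just the damaged 8 KB page from the last full backup.
    RESTORE DATABASE MyDB
        PAGE = '1:57934'  -- file_id:page_id from suspect_pages
        FROM DISK = 'D:\Backups\MyDB_Full.bak'
        WITH NORECOVERY;

    -- 3. Take a fresh log backup, then roll the page forward and recover.
    BACKUP LOG MyDB TO DISK = 'D:\Backups\MyDB_Tail.trn';
    RESTORE LOG MyDB FROM DISK = 'D:\Backups\MyDB_Tail.trn' WITH RECOVERY;
    ```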

    Christopher Ford

  • Then you would just restore that page from your last backup.

    Wait, you're saying this thing has a backup feature? :w00t:

  • Tom Leykis (2/13/2008)


    Then you would just restore that page from your last backup.

    Wait, you're saying this thing has a backup feature? :w00t:

    Nice. 😎

    I'm saying don't restore the 300GB+ database. :p Just the 8 KB page. :w00t:

    Christopher Ford

  • Erhm...Forgive me for posting incomplete information. =)

    Previously I mentioned restoring the corrupted data page...that assumes you back up your log. 😀 Sorry.

    Okay, I should leave this post alone. BYE!

    Christopher Ford

  • Christopher Ford (2/13/2008)


    Jeff, I could almost hug you actually...you wrote an article the other day that I have been working a problem for 6 months on trying to solve. Running totas, eg. Triangular Joins. THANK YOU! =) I know, not the post to do it...but since we're on the topic of performance. 🙂

    Christopher, thanks for the compliment! Tickles me to death when I can help someone out like that. I really appreciate the feedback! No hugs, though... send beer... I'm on an all liquid diet 😛

    I'd really be interested in how those two articles helped... maybe a post on the discussion area for one of the two? Thanks, Christopher.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    I need to configure the SharePoint 2010 databases in SQL Server 2008 R2.

    Microsoft recommends the following best practices for creating the databases:

    Use multiple data files for content databases

    Follow these recommendations for best performance:

    Only create files in the primary filegroup for the database.

    Distribute the files across separate disks.

    The number of data files should be less than or equal to the number of core CPUs. Count dual core processors as two CPUs for this purpose. Count each processor that supports hyper-threading as a single CPU.

    Create data files of equal size.
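
    Those recommendations can be sketched as a CREATE DATABASE statement. Every name, path, and size below is hypothetical; the equal SIZE and FILEGROWTH values are what keep SQL Server's proportional-fill algorithm spreading writes evenly across the files:

    ```sql
    -- Four equal data files in the primary filegroup, one per core,
    -- each on its own disk; log on a separate drive.
    CREATE DATABASE WSS_Content
    ON PRIMARY
        (NAME = WSS_Content_01, FILENAME = 'E:\Data\WSS_Content_01.mdf',
         SIZE = 10GB, FILEGROWTH = 1GB),
        (NAME = WSS_Content_02, FILENAME = 'F:\Data\WSS_Content_02.ndf',
         SIZE = 10GB, FILEGROWTH = 1GB),
        (NAME = WSS_Content_03, FILENAME = 'G:\Data\WSS_Content_03.ndf',
         SIZE = 10GB, FILEGROWTH = 1GB),
        (NAME = WSS_Content_04, FILENAME = 'H:\Data\WSS_Content_04.ndf',
         SIZE = 10GB, FILEGROWTH = 1GB)
    LOG ON
        (NAME = WSS_Content_log, FILENAME = 'L:\Logs\WSS_Content_log.ldf',
         SIZE = 4GB, FILEGROWTH = 1GB);
    ```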

    We are estimating that our SharePoint databases are going to be heavily loaded, and we want to create a number of data files equal to the number of cores.

    But we have the SQL Servers on a virtual machine, and this VM is allocated 2 vCPUs (virtual CPUs).

    The physical ESX box has 2 quad-core CPUs.

    According to VMware, 4 vCPUs equal 1 physical CPU. In this case, how many data files do we need to create?

    Thanks

  • gmamata7,

    Please post your question into a different thread. It's bad manners to piggyback off of someone else's question.

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
