Drawbacks of having hundreds of data files per database

  • Jeffrey Williams wrote:

    Jeff Moden wrote:

    Jeffrey Williams wrote:

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    Agree with this - developers shouldn't be designing storage solutions for SQL Server.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    From the original post - it sounds like they are wanting to create a new filegroup and file for each day.  Unless they are creating those on separate volumes (mount points) then there isn't going to be any benefits to multiple files.  In fact, it is likely they will be creating the files with a minimal size and growing them out - which is going to cause file fragmentation which could cause even worse performance.

    I am curious why you state a single file is not best for performance - whether you have one file or several files you won't see any difference in performance unless you spread those files across different volumes that are also on separate spindles.  And if using an Enterprise SAN - then multiple files on the same volume only makes maintenance of the system harder.  Separate files across volumes on an Enterprise SAN can improve performance - maybe.  Depends on the HBA, the fabric, switches and SAN configuration and data access patterns.

    I have to say that I totally disagree with that.  Stop treating Developers as if they shouldn't know even the basics of SQL.  You're just making the problem worse and you're turning the DBA into a major bottleneck, which the Developers and others rightfully complain about because of statements like Developers not needing be involved with datatypes.

    And I have to say that is incredibly rude - I don't treat developers that way.  I do not expect any developer to understand mount points or how the SAN is configured - or what drive letters we are using for each database file, or even how many files are included in each file group.

    If a developer shows interest in the subject I am more than happy to sit with them and show them, but I don't expect them to know anything about these subjects.  Their job doesn't require them to know that and they shouldn't be responsible for laying out the storage for SQL Server.  That is the DBA's responsibility...

    Sorry but no.... what I said was based on what was said.  If you don't treat Developers that way, that's great but you agreed with what was originally said. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden wrote:

    Jeffrey Williams wrote:

    Jeff Moden wrote:

    Jeffrey Williams wrote:

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    Agree with this - developers shouldn't be designing storage solutions for SQL Server.

    Btw, a single data file is not best for performance (it might be OK if you have very fast SAN/use SSDs).  Typically for large amounts of data, you'd want at least two data files.  Of course doubling the data files would just add to your issues here.

    From the original post - it sounds like they are wanting to create a new filegroup and file for each day.  Unless they are creating those on separate volumes (mount points) then there isn't going to be any benefits to multiple files.  In fact, it is likely they will be creating the files with a minimal size and growing them out - which is going to cause file fragmentation which could cause even worse performance.

    I am curious why you state a single file is not best for performance - whether you have one file or several files you won't see any difference in performance unless you spread those files across different volumes that are also on separate spindles.  And if using an Enterprise SAN - then multiple files on the same volume only makes maintenance of the system harder.  Separate files across volumes on an Enterprise SAN can improve performance - maybe.  Depends on the HBA, the fabric, switches and SAN configuration and data access patterns.

    I have to say that I totally disagree with that.  Stop treating Developers as if they shouldn't know even the basics of SQL.  You're just making the problem worse and you're turning the DBA into a major bottleneck, which the Developers and others rightfully complain about because of statements like Developers not needing be involved with datatypes.

    And I have to say that is incredibly rude - I don't treat developers that way.  I do not expect any developer to understand mount points or how the SAN is configured - or what drive letters we are using for each database file, or even how many files are included in each file group.

    If a developer shows interest in the subject I am more than happy to sit with them and show them, but I don't expect them to know anything about these subjects.  Their job doesn't require them to know that and they shouldn't be responsible for laying out the storage for SQL Server.  That is the DBA's responsibility...

    Sorry but no.... what I said was based on what was said.  If you don't treat Developers that way, that's great but you agreed with what was originally said. 😉

    You were attacking a straw man.  I never made anything close to the claim you said I did.

    As Jeffrey noted, I was talking about the details of physical files in SQL Server.  Of course I expect developers to know about data types, I mean, c'mon, that was just a silly claim that I would ever mean that.  In fact, I wish many of them did know more, since for some I have to continually remind them that you don't need a bigint to store, for example, a shipment qty total!

    For me, if a developer shows interest in the subject, I ask them to check with their manager if it is ok for them to spend time on that.  Developers always have tight schedules, and I won't unilaterally be part of redirecting their time to an indirect topic unless their manager agrees with it.  You may think I'm wrong there, if so, then so be it.  But to me that's the most proper approach.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • except for the bickering ... did - did not ...

    Isn't there also a ratio processor/core(s) vs io chanels vs io threads to keep in mind?

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ScottPletcher wrote:

    You were attacking a straw man.  I never made anything close to the claim you said I did./quote]

    Actually, you did...

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    To me, using the correct datatype is a major part of "how the data in the db is physically stored".  The Developers should know things like why you should not store GUIDs as NVARCHAR(72), why not everything that looks like an integer should be stored as a NUMERIC(18,0), why dates should not be stored as integers in the ISO format, and why things that will only ever store a single byte status shouldn't be stored as and NVARCHAR(256). 😀

    Yes, I agree... none of that probably falls under what you were talking about but people reading that out of context (and many will) will use it as justification for perpetuating such a mistake and then blame the DBA for being a "bottleneck" all while quoting Knuth's frequently misinterpreted parable as justification.

    I'll also say that good Developers frequently not only know the expected scope of data, they also know the expected scale of data  I have no issues with a Developer identifying the nature of the data in those terms and suggesting that some form of partitioning would probably be appropriate.  That's a part of what "DevOps" is supposed to be about. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden wrote:

    ScottPletcher wrote:

    You were attacking a straw man.  I never made anything close to the claim you said I did.

    Actually, you did...

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    To me, using the correct datatype is a major part of "how the data in the db is physically stored".  The Developers should know things like why you should not store GUIDs as NVARCHAR(72), why not everything that looks like an integer should be stored as a NUMERIC(18,0), why dates should not be stored as integers in the ISO format, and why things that will only ever store a single byte status shouldn't be stored as and NVARCHAR(256). 😀

    Yes, I agree... none of that probably falls under what you were talking about but people reading that out of context (and many will) will use it as justification for perpetuating such a mistake and then blame the DBA for being a "bottleneck" all while quoting Knuth's frequently misinterpreted parable as justification.

    I'll also say that good Developers frequently not only know the expected scope of data, they also know the expected scale of data  I have no issues with a Developer identifying the nature of the data in those terms and suggesting that some form of partitioning would probably be appropriate.  That's a part of what "DevOps" is supposed to be about. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • ScottPletcher wrote:

    You were attacking a straw man.  I never made anything close to the claim you said I did.

    Actually, you did...

    ScottPletcher wrote:

    Actually, the development team should have nothing to do with how the data in the db is physically stored.  That's the DBA's job.  (Sorry, just had to say that first.)

    To me, using the correct datatype is a major part of "how the data in the db is physically stored".  The Developers should know things like why you should not store GUIDs as NVARCHAR(72), why not everything that looks like an integer should be stored as a NUMERIC(18,0), why dates should not be stored as integers in the ISO format, and why things that will only ever store a single byte status shouldn't be stored as and NVARCHAR(256). 😀

    Yes, I agree... none of that probably falls under what you were talking about but people reading that out of context (and many will) will use it as justification for perpetuating such a mistake and then blame the DBA for being a "bottleneck" all while quoting Knuth's frequently misinterpreted parable as justification.

    I'll also say that good Developers frequently not only know the expected scope of data, they also know the expected scale of data  I have no issues with a Developer identifying the nature of the data in those terms and suggesting that some form of partitioning would probably be appropriate.  That's a part of what "DevOps" is supposed to be about. :D[/quote][/quote]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply