Table design considerations help

  • Hi All,

    This is a general question regarding database design. I am not a db designer nor an architect. I am sql/ETL developer.
    However, curios to know about this as some of such related work will be falling in my bucket.Looking for some guidance.

    Assume that we have existing Data model. As and when new requirements comes, we keep adding new tables or adding columns to existing tables and work in Sprint model.
    Similarly, we got some new requirements and now the question arises, can we add additional columns to the existing table or consider creating a new table with same structure with new columns as per requirement(if any).

    What type of questions do I need to ask myself/architect's/business users/dba's if I want to store data in the same table or else consider creating a new table to created with similar structure for new requirements. Is there any thing to be done with capacity planning??

    So, If I have some existing model and is currenlty meeting business requirements and tomorrow if I get a similar kind of functionality requirement, why can't we fit into the same table and why someone would prefer having a separate table(s) instead of using the existing db structures/tables.
    Basically, I am looking for pros & cons of having data stored in the same table or else is it good to have my data in a separate table for the new similar kind of business requirement? Doing so (i.e. having a new table), Does I/o changes ? locking/blocking/deadlocks gets eliminated? any performance gains of having data in separate table like query performance gains, any database growth considerations need to be cosidered (if so what questions to be asked to busines users)?

    So, as a whole, what are the things to be considered to have an existing table and storing data into same or creating a new table for the new requiement. Which is better??

    FYI its a SQL Azure PaaS db with one .mdf and one .ldf file.

    Kindly advice. Please don't say. it depends or hire a consultant. I am really looking for some generous inputs so that I can involve the right people on this. Right now, everything is in AIR.

    @Grant Fritchey Sir, Can you pl provide some suggestions pl?

    Thanks,

    Sam

  • Any thoughts pl

  • Yeah.  My thoughts are that you've been around long enough to know that that's not the way things work around here.

    Database design is too complex a topic for a succinct response to an extremely vague post, especially when you've preemptively excluded the only possible response, "It depends."

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • If they are new columns and your existing data will not be filled with valid data, then I would opt for creating a new table.  If for your existing data you can determine the new columns value, then use existing table.
    I think if your tables get too wide it can cause some performance issues, at least I've seen some that have. I'm talking 100+ columns on the table.
    If you are constantly having to add new columns maybe a 'stacked' table design would be best.  Could eventually see some performance issues with this design t if not properly indexed.  This way you don't need to change the table with each new column.  And existing code that references that table doesn't need to be changed when new data is added.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • I am with Drew.  Such a vague question and the only correct answer is "it depends."  You and the people you work with are the ones that need to discuss this and develop possible solutions along with the pros and cons of each since you are the ones that can see what you are trying to accomplish.

    Anything you get from us will just be wild shots in the dark while blind folded.

  • vsamantha35 - Thursday, July 12, 2018 4:33 AM

    Hi All,

    This is a general question regarding database design. I am not a db designer nor an architect. I am sql/ETL developer.
    However, curios to know about this as some of such related work will be falling in my bucket.Looking for some guidance.

    Assume that we have existing Data model. As and when new requirements comes, we keep adding new tables or adding columns to existing tables and work in Sprint model.
    Similarly, we got some new requirements and now the question arises, can we add additional columns to the existing table or consider creating a new table with same structure with new columns as per requirement(if any).

    What type of questions do I need to ask myself/architect's/business users/dba's if I want to store data in the same table or else consider creating a new table to created with similar structure for new requirements. Is there any thing to be done with capacity planning??

    So, If I have some existing model and is currenlty meeting business requirements and tomorrow if I get a similar kind of functionality requirement, why can't we fit into the same table and why someone would prefer having a separate table(s) instead of using the existing db structures/tables.
    Basically, I am looking for pros & cons of having data stored in the same table or else is it good to have my data in a separate table for the new similar kind of business requirement? Doing so (i.e. having a new table), Does I/o changes ? locking/blocking/deadlocks gets eliminated? any performance gains of having data in separate table like query performance gains, any database growth considerations need to be cosidered (if so what questions to be asked to busines users)?

    So, as a whole, what are the things to be considered to have an existing table and storing data into same or creating a new table for the new requiement. Which is better??

    FYI its a SQL Azure PaaS db with one .mdf and one .ldf file.

    Kindly advice. Please don't say. it depends or hire a consultant. I am really looking for some generous inputs so that I can involve the right people on this. Right now, everything is in AIR.

    @Grant Fritchey Sir, Can you pl provide some suggestions pl?

    Thanks,

    Sam

    Your questions are akin to "tell me everything I need to know to become a database designer" and it's just not possible for someone to tell you everything you need to consider even for just the questions that you've asked.

    I will tell you that if you think a table needs to be split into more than one table to achieve high levels of performance, then you need to go back and study the design of the table, the code that will use it, and the indexes that support it because proper code is usually the only way to achieve performance rather than such base trickery, which can also cause more problems than just bad code..  If you tell me you want to split the table to make table/index maintenance easier, then possibly.  Sorry to say it but, truly, "It Depends".

    I'll also state that capacity planning has nothing to do with whether or not you split a table or not.  Only knowing the current and potential growth of the data will help you there.  Again, "It Depends".  After you determine that, you might want to reconsider the growth (in Megabytes, of course, and never percent) of the MDF/NDF files and the Log file.  For the log file, you should consider finding out the number of VLF files it currently contains and, possibly, make a change to that and the growth settings of the log file.  Again, "It Depends".

    Also, for individual tables that are likely to grow, you need to figure out the growth pattern for the indexes.  For example, will the Clustered Index only suffer INSERTs and non-expAnsive or is it rife with variable with columns that will expand when updated, which will kill the whole reason for "ever increasing keys", which you should also consider for the Clustered Index.   Do you have any BLOB columns that may "accidently" fit data in row, which may blow any Fill Factor settings out of the water when it comes to bad page splits (and the resulting bad performance)? Should your blobs be forced out of row even if the fit the row?  Should you apply a length of spaces as a default to variable length columns to prevent the eventual expAnsive growth?   Of course, all of that can affect your maintenance plans for the non-clustered indexes and so, once again, "It Depends".

    You also need an archive plan.  That plan may be to "keep all data in the table forever" but that's normally not a good plan.  At least consider proper portioning to support the idea of making older, now static data reside on read only file groups/files so that you don't have to continuously backup that which will never change.  Should it be Partitioned Views or Partitioned Tables?  Again, "It Depends" on things like will you ever need to make a "short copy" of the database using restores so that you can keep just a couple of months of data for testing instead of loading a full terabyte of data in a database? 

    And I'm not even scratching the surface here because... "It Depends". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply