Database Design & Vertical Partitioning

  • Hi All,

    I have a design query

    I have data for an entity that can be logically divided, from a business standpoint, on the basis of some of its details. The raw description of the columns is as follows:

    1 int primary key column, 40 tinyint columns, 30 varchar(100) columns, 4 varchar(1000) columns and 10 datetime columns.

    Now, apart from inserts and updates, the major chunk of my workload (approx. 50%) will be a select query that filters on 15 tinyint columns and 4 datetime columns, and returns the 1 primary key column, 10 of those 15 tinyint columns, 2 of the 4 datetime columns, 10 varchar(100) columns and 1 varchar(1000) column.
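
    To make this concrete, here is a minimal T-SQL sketch of that layout and of the main query's shape. The table and column names are hypothetical, and only a few representative columns are shown.

    -- Hypothetical sketch of the layout described above; the real table has
    -- 40 tinyint, 30 varchar(100), 4 varchar(1000) and 10 datetime columns.
    CREATE TABLE dbo.UserProfile
    (
        UserID        int           NOT NULL PRIMARY KEY,
        Flag01        tinyint       NULL,   -- ... Flag02 through Flag40
        Attr01        varchar(100)  NULL,   -- ... Attr02 through Attr30
        LongText01    varchar(1000) NULL,   -- ... up to LongText04
        CreatedDate   datetime      NULL,   -- ... plus further datetime columns
        LastLoginDate datetime      NULL
    );

    -- Shape of the query that runs ~50% of the time: filter on tinyint and
    -- datetime columns, return a subset of the columns.
    SELECT UserID, Flag01, Attr01, LongText01, CreatedDate
    FROM   dbo.UserProfile
    WHERE  Flag01 = 1
      AND  CreatedDate >= '20100101';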

    I also have some queries that fetch a few columns of the row for a given primary key. There are various queries like this, each fetching different data based on the business-logic division.

    I am using stored procedures for all the required operations, including select, insert and update.

    Option 1 = Should I vertically partition the table and then use inner joins in the select query? Major cost factor = the inner joins. An alternative that may help = create a view over all the columns, from all the tables, that the major select query needs (see the sketch after Option 3). I don't know its impact on inserts, updates etc. made against the base tables; inserts and updates won't be as frequent as selects.

    Option 2 = Should I keep all the details in one table and run the search on that? Concerns: 1) Compared with the first option, what will be the effect of the full-width row when a query asks for, say, only 4 columns for a particular primary key? 2) What will be the effect of the unused columns on the query that runs 50% of the time? 3) What will be the effect of the heavy varchar columns on that query?

    Option 3 = Should I keep all the details required by the 50%-of-the-time select in one table and the rest in another, and work with that? Concern: 1) It will make inserting, updating and selecting the business-logic data harder, because the columns will now be divided across 2 tables and, every time, a join will be required to get the details of a logical division for a particular primary key.
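
    For reference, a minimal sketch of what Option 1 could look like, using hypothetical table and column names. Note that the view only hides the join, it does not remove its cost, and inserts and updates still have to touch each underlying base table.

    -- Hypothetical sketch of a vertical split with a view over the join.
    CREATE TABLE dbo.UserProfile_Search
    (
        UserID        int      NOT NULL PRIMARY KEY,
        Flag01        tinyint  NULL,       -- ... the columns used for filtering
        CreatedDate   datetime NULL,
        LastLoginDate datetime NULL
    );

    CREATE TABLE dbo.UserProfile_Detail
    (
        UserID     int           NOT NULL PRIMARY KEY
                   REFERENCES dbo.UserProfile_Search (UserID),
        Attr01     varchar(100)  NULL,     -- ... the remaining columns
        LongText01 varchar(1000) NULL
    );
    GO

    CREATE VIEW dbo.vUserProfile
    AS
    SELECT s.UserID, s.Flag01, s.CreatedDate, s.LastLoginDate,
           d.Attr01, d.LongText01
    FROM   dbo.UserProfile_Search AS s
    JOIN   dbo.UserProfile_Detail AS d
      ON   d.UserID = s.UserID;
    GO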

  • Is this table part of an OLTP or a DSS system?

    How large is the table today? number of rows, please.

    How large will be the table one year from now? number of rows, please.

    What's the % of daily inserts against this table?

    What's the % of daily updates against this table?

    Is there any purge process in place - or do you foresee having a purge process for this table?

    In the affirmative case, how are purge candidate rows identified?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Hi Pablo

    The data is mainly about a user: his personal and professional attributes, etc. I am not sure which category this falls under.

    The table currently has about 1 Lakh records.

    One year down the line, we expect it to reach 1.25 Lakh.

    About 50 daily inserts today. Expected to go to 200 daily in 1 year.

    As I am storing the login time in the table, every login updates the concerned record; there are approximately 1,000 logins a day currently. Maybe that part could be kept in a different table, but please note that I will need the last login time in the search. Apart from logins, maybe 200 rows get updated daily.
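
    (A minimal sketch of that idea, with hypothetical names: the frequently updated login timestamp lives in its own narrow table so the wide profile row is not rewritten on every login, and the search joins it back in.)

    -- Hypothetical sketch; dbo.UserProfile reuses the name from the earlier sketch.
    CREATE TABLE dbo.UserLastLogin
    (
        UserID        int      NOT NULL PRIMARY KEY,
        LastLoginDate datetime NOT NULL
    );

    -- The search can still filter on the last login time via a join.
    SELECT p.UserID, l.LastLoginDate
    FROM   dbo.UserProfile   AS p
    JOIN   dbo.UserLastLogin AS l
      ON   l.UserID = p.UserID
    WHERE  l.LastLoginDate >= DATEADD(DAY, -30, GETDATE());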

    As of now, we don't have a system in place to purge data, but in the future we might add one.

    The to-be-purged data will be identified based on the user's profile status (which may be 'deleted' or something similar) and the date the profile was removed (maybe we purge the data one year after its status changed to deleted).

    I would be very grateful if you could tell me which option is best and why.

    I am waiting eagerly for your response.

    Abhisek Rathi

  • I understand 1 lakh = 100,000 rows?

    If yes... that's a small table, and no partitioning is needed - as a rule of thumb I do not even consider partitioning for tables I do not expect to reach 50 million rows.

    Partitioning is a good tool, providing it is used to solve the right issue.

    In general terms you want to partition either to help during querying and/or to help during purging. The partitioning strategy - the choice of partition key - should match the most frequently used predicates to help during querying, or match the purging criteria to help during purging.
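
    For illustration only (Paul advises against partitioning a table of this size), a minimal sketch of a partition key chosen to match a date-based purge criterion; all names and boundary values are hypothetical, and table partitioning requires an edition of SQL Server that supports it.

    -- Hypothetical sketch: range partitioning on a datetime column so old
    -- rows can be purged by switching out whole partitions.
    CREATE PARTITION FUNCTION pfProfileByYear (datetime)
    AS RANGE RIGHT FOR VALUES ('20090101', '20100101', '20110101');

    CREATE PARTITION SCHEME psProfileByYear
    AS PARTITION pfProfileByYear ALL TO ([PRIMARY]);

    CREATE TABLE dbo.UserProfilePartitionedDemo
    (
        UserID      int      NOT NULL,
        RemovedDate datetime NOT NULL,
        CONSTRAINT PK_UserProfilePartitionedDemo
            PRIMARY KEY (UserID, RemovedDate)
    ) ON psProfileByYear (RemovedDate);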

    I would let a 100,000-row table sit pretty as a normal table.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • I tend to agree with Paul. Unless you are in the millions-of-rows range, it doesn't necessarily help you to partition.

  • Thanks Paul and Steve for your suggestions.

    Just one more thing I was wondering. If I keep everything in one table, and I want, say, 10 columns from it at one time, 12 other columns at another time and so on, will there be a difference in execution memory between keeping all the columns in one table and breaking the table into 10 sub-tables, one for each set of columns?

    If yes, how significant?

    Abhisek Rathi

  • There likely isn't a significant difference in memory. As you query the data, it gets put in the cache regardless of which table it comes from. In terms of indexing, if you are querying by different columns, you will want different indexes to handle that.

    The memory used is based on the data pulled back by the query. If there are no suitable indexes, you might end up reading more data through a table scan when everything is in one table, but this typically isn't how you design a database. You design based on what is best for your application, not on execution memory. Your columns all look small, so I would tend to put them in one table, but without knowing more I can't recommend one table or multiple.
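
    A minimal illustration of that indexing point, reusing the hypothetical names from the earlier sketch: a nonclustered index whose key matches the filter columns and whose INCLUDE list carries the extra output columns lets the frequent query avoid scanning the whole wide row, while single-row lookups are already served by the clustered primary key.

    -- Hypothetical covering index for the frequent search query.
    CREATE NONCLUSTERED INDEX IX_UserProfile_Search
        ON dbo.UserProfile (Flag01, CreatedDate)
        INCLUDE (LastLoginDate, Attr01);

    -- A different pattern - fetch a few columns for one primary key value -
    -- is served by the clustered primary key on UserID.
    SELECT Attr01, LongText01
    FROM   dbo.UserProfile
    WHERE  UserID = 42;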

  • abhisek_boy_in (5/28/2010)


    Just one more thing I was wondering. If I keep everything in one table, and I want, say, 10 columns from it at one time, 12 other columns at another time and so on, will there be a difference in execution memory between keeping all the columns in one table and breaking the table into 10 sub-tables, one for each set of columns?

    The way you model your tables should be driven by the degree of normalization you want to achieve, and not by the number of columns a table may include. As a rule of thumb, if your design shows a one-to-one relationship between two tables, chances are those two tables should be just one.

    Remember that a DBA creates physical tables to describe entities - remember ER data modeling?

    In general terms I would go for 3NF normalization on an OLTP system, while I would go with some degree of de-normalization - a.k.a. dimensional modeling and data redundancy - for a DSS system.
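
    A small illustration of that rule of thumb, with hypothetical names: when every row in one table has exactly one matching row in the other, the split usually buys nothing but an extra join, and the columns can live in a single entity table.

    -- One-to-one split (every UserCore row has exactly one UserExtra row):
    --   CREATE TABLE dbo.UserCore  (UserID int PRIMARY KEY, Name varchar(100));
    --   CREATE TABLE dbo.UserExtra (UserID int PRIMARY KEY
    --                                   REFERENCES dbo.UserCore, Bio varchar(1000));

    -- Usually better modeled as a single table:
    CREATE TABLE dbo.[User]
    (
        UserID int           NOT NULL PRIMARY KEY,
        Name   varchar(100)  NULL,
        Bio    varchar(1000) NULL
    );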

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
