Table Splitting

  • Hi All,

    I have 28+ million table with 30 columns. Now we want to split into

    multiple tables.

    It having the information

    like name info, address info , company info, employees, sales, webaddress..

    This having Rowid as identity data type and

    email having primary key.

    daily we want all these columns data.

    Please suggest us the better way to split the table.

    B'coz it hurts the performance of querying process.

  • Without knowing more about the structure and what exactly is performing badly, it's going to be very hard to suggest solutions. Is the performance bad on inserts or on reads? If on reads, have you examined the execution plans to ensure that your indexes are being properly used? Are you maintaining the indexes and statistics?

    If on inserts, it sounds like you might need to come up with a more normalized storage mechanism. That's a big topic. Again, it will be hard to advise you on what to do precisely in a post on a news group especially without adequate information. I would suggest getting more knowledgeable on general database design. Check out an old book, Handbook of Relational Database Design by Fleming and von Halle. That should move you in the right direction.

    If you need more help, you'll need to post specifics. What is the existing structure? Where are your performance bottlenecks? That sort of thing.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • DBTeam (3/22/2010)


    Hi All,

    I have 28+ million table with 30 columns. Now we want to split into

    multiple tables.

    It having the information

    like name info, address info , company info, employees, sales, webaddress..

    This having Rowid as identity data type and

    email having primary key.

    daily we want all these columns data.

    Please suggest us the better way to split the table.

    B'coz it hurts the performance of querying process.

    Are you importing this data into some form of staging table daily or ???

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • RedGate's SQL Refactor can do table splitting for you, but it might have trouble biting off that many rows all at once.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Not daily ,monthly 5 lakhs we are inserting into it with the insert statement.

    Daily we need to pull the records from that 20Million table.

    Table is keep on increasing like this.now we want to split the table.

    No idea how to do that.

    How is that if we split vertically?(based on ID's and foreign keys)

    But dalily we need all the columns data.

  • If you're seeking to break it up in that way, then instead of ID & foreign key, maybe breaking it up by date makes more sense.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • we do not have Date column

  • Then possibly adding it might be in order. You've pretty much said that the access is by date. If you're principal access is by date, but you split it by foreign key, then instead of isolating the reads, you're going to spread them, making them less efficient instead of more efficient.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I did not say that accessing by date.

    I mentioned that there is a possibility of adding around 5 lakhs per month.

    Only id that is identitity datatype is there

    remaining all are text columns.

  • I'm sorry, I'm not there and I don't know your structure. When you said this earlier:

    But dalily we need all the columns data.

    I thought that meant that you needed to access data by time.

    I still don't understand where you are seeing slow performance, and without that knowledge and full knowledge of your structure, suggesting how you can partition this data is not really possible.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Again i'm explaining my exact problem

    Hi All,

    Splitting a Large table

    We have a 30 million + table with 30 fields

    as follows

    ID

    name

    job

    company

    address

    city

    state

    zip

    phone

    fax

    .

    .

    .

    .

    .

    .

    .

    .

    .

    Email

    like this we have 30 fields

    In the above ID column is Identity datatype and email is clustered Primary key.

    Mainting large table is becoming difficult for us.

    Planning to split this table

    Please suggest us the best way of how can be done this..

    No front end access for this.

    Do i need to go for Normalization

  • The problem is you're explaining the table and it's size and that you have a problem, but you're not describing what the problem is. Are you having trouble on inserts? Are you having trouble on reads? Both? What kind of queries are you running against it? Do you have execution plans?

    In general, for most applications, I'd say yes, you absolutely should normalize your data unless it's a reporting system, in which case you should look at setting up a star schema or one some other type of BI oriented storage architecture. Either way, simply slapping 30 fields up with all sorts of duplicate data (not to mention, the complete lack of integrity, how many different spellings do you have in the state, country or city columns for any given location?) is a poor choice for storage.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Give an example of some queries you're running against the data and it would be much easier to get an understanding of what improvements would help.

    Unless your business is selling spam email lists, I can almost guarantee you would be much better off picking a clustered index other than the email field. Potentially, depending on what queries you're running, you could see huge performance gains just by fixing that.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • just my 2ct

    In case of splitting the columns of one table to n tables with a (unique) subset of the columns of the original table:

    - keep in mind to split according to the assumed usage.

    - maybe even start your exercise going to 3NF and only if needed denormalize

    - since you're talking about a repeating load process, reconsider the whole load process to come to your optimal solution as soon as possible.

    You should post more info on the performance issues you are experiencing today, maybe a split operation can be avoided by optimizing your current physical structures.

    provide the actual ddl + indexes and a typical problem query with the consumption numbers and execution plan (save the graphical execution plan in SSMS and post it)

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • DBTeam (3/26/2010)


    Mainting large table is becoming difficult for us.

    Planning to split this table

    Please suggest us the best way of how can be done this..

    No front end access for this.

    Do i need to go for Normalization

    Yes... but don't do it without a net. Lookup "designing databases [SQL Server], normalization" in Books Online and learn how to make a plan and why. Also, be sure to read the link in that section titled "Clustered Index Design Guidelines" to see how very important all of this is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply