DB Design and Data Retrieval from a heavy table

  • I have a requirement to have 612 columns in my database table. The # of columns as per data type are:

    BigInt – 150 (PositionCol1, PositionCol2…………PositionCol150)

    Int - 5

    SmallInt – 5

    Date – 150 (SourceDateCol1, SourceDate2,………….SourceDate150)

    DateTime – 2

    Varchar(2000) – 150 (FormulaCol1, FormulaCol2………………FormulaCol150)

    Bit – 150 (IsActive1, IsActive2,……………….IsActive150)

    When user does the import for first time the data gets stored in PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1, etc. (other datetime, Int, Smallint columns).

    When user does the import for second time the data gets stored in PositionCol2, SourceDateCol2, FormulaCol2, IsActiveCol2, etc. (other datetime, Int, Smallint columns)….. so and so on.

    There is a ProjectID column in the table for which data is being imported.

    Before starting the import process, user maps the excel column names with the database column names (PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1) and this mapping get stored in a separate table; so that when retrieved data can be shown under these mapping column names instead of DB column names. E.g.

    PositionCol1 may be mapped to SAPDATA

    SourceDateCol1 may be mapped to SAPDATE

    FormulaCol1 may be mapped to SAPFORMULA

    IsActiveCol1 may be mapped to SAPISACTIVE

    40,000 rows will be added in this table every day, my questions is that will the SQL be able to handle the load of that much of data in the long run?

    Most of the times, a row will have data in about 200-300 columns; in the worst case it’ll have data in all of the 612 columns. Keeping in view this point, shall I make some changes in the design to avoid any future performance issues? If so, please suggest what could be done?

    If I stick to my current design, what points I should take care of, apart from Indexing, to have optimal performance while retrieving the data from this huge table?

    If I need to retrieve data of a particular entity e.g. SAPDATA, I’ll have to go to my mapping table, get the database column name against SAPDATA i.e. PositionCol1 in this case; and retrieve it. But, in that way, I’ll have to write dynamic queries. Is there any other better way?

  • 40,000 is not a big number for 4 columns import, shouldn't be the issue

    Eventhough lots of page split will happen and indexes on this tables will be fragmented very soon. And eventually you will face performance issue in Import and Select operations

    In better design, you can have a table with 4 columns + 1 SrNo

    i.e. PositionCol, SourceDateCol, FormulaCol, IsActiveCol, ImportSrNo

    Each time you import the data you can increment ImportNo value (first time 1 then 2,3,4,... 612)

  • I have to agree with what was said above, you need to normalize this data.

    BigInt – 150 (PositionCol1, PositionCol2…………PositionCol150)

    Int - 5

    SmallInt – 5

    Date – 150 (SourceDateCol1, SourceDate2,………….SourceDate150)

    DateTime – 2

    Varchar(2000) – 150 (FormulaCol1, FormulaCol2………………FormulaCol150)

    Bit – 150 (IsActive1, IsActive2,……………….IsActive150)

    I don't have enough information to make a full recommendation, but based n the fact that each of the bolded elements above have 150 repeating columns, those four columns should be in their own table with an identifier to indicate which "column" they are in. Basically, pivoting (or actually unpivoting) the data. One row of data for each set of columns.

    This table would be linked to the master table which would have your nonrepeating columns.

    This also makes sense from a maintenance perspective. What happens if they come back later and tell you they need another 50 columns each? Your current design will require changes to the table structure while the normalized version just means adding more rows to the table with no changes to the structure.

    Plus, looking at one of the columns, you are going to have a lot of data off row. You have 150 columns of varchar(2000). Not going to be good for performance.

    We probably could use more details about what your are doing to give you better answers.

  • Thanks for your valuable comments and Apologies for creating confusion.

    The aforesaid is a design that I’m considering, my current design and business logic is as following:

    There can be N number of clients. CLT_Clients table stores client information.

    There can be N number of projects associated to a client. PRJ_Projects table stores project information and a link to the client.

    There can be N number of listings associated to a project. PRJ_Listings table stores listing information and a link to the project.

    There can be N number of source entities associated to a listing. ST_Entities table stores source entity information and a link to the listing. (Note: if I go with 612 column design, then it has to be restricted to 150 columns)

    This source entity is the actual import that contains the InvestorID, position values, source date, active and formula status.

    The name of the import e.g. L1Entity1 is stored in ST_Entities table alongwith ID field i.e. EntityID

    InvestorID, Position, Source Date, Active and Formula values get stored in ST_Positions table

    Database Diagram

    DB Design

    Data need to be view as following:

    With this design I’m able to handle N number of imports because the Position, Source Date, IsActive, Formula columns get Pivoted.

    The problem that I’m facing with this design is that the system performs very slow when it has to select data for more than 10-12 source entities, and the requirement is to show about 150 source entities. Because data is not stored row wise and I need to show it column wise, hence dynamic queries are written to pivot these columns which takes long.

    Please comment/suggest on my current database design if it’s correct or need to be changed with the new design that I explained in the first post?

    If I need to stick to my current, what can be done to improve the performance?

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply