Redesign help

  • Hi,

    Ive finally been allowed to redesign some tables which are "wide" (Think Field1, Field2, Field3 ... Fieldn). There are multiple fields that do this and columns get added when more Fields are needed.

    To make it even better, Field1 for one project may not contain the same information as Field1 for a different project - there is a "mapping" table that lets people know what the field actually means.

    What im thinking of is to split these columns into a separate table something like the below

    IF OBJECT_ID('TempDB..#Field','U') IS NOT NULL

    DROP TABLE #Field

    CREATE TABLE #Field

    (

    fieldID int NOT NULL,

    pkFromOrigTable int NOT NULL,

    fieldName varchar(200) not null,

    fieldValue decimal(30,20)

    )

    Any advice, suggestions are more than welcome. I want to try and avoid creating something as bad.

  • Hi

    Just my 2 cents but if you have the scope I would suggest normalising this table and splitting all of the related data groups into separate table(s) rather than try and work around an initial design that is flawed..

    Andy

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • Out of curiosity is this database for an application that allows users to create/define custom fields or even data types? The only reason I'm asking is because I've had to support such applications which users love because they can create there own custom forms but it can be a pain on the database side. If that is the case you are on the right path using a name/value pair. Just make sure you index your table properly. Otherwise you'll save yourself many headaches by moving the data to proper tables.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • The users cant create columns or types by themselves but they can request them to be created (and they are). So it sounds very similar to what you have done yb751.

    Please see below for a simplified version of the current table (names have been changed).

    Last lot of changes required adding col21 - col40 due to the fact the users "ran out" of columns.

    As mentioned before im open to any suggestions on a better way of doing this.

    CREATE TABLE [dbo].[#Table1](

    [#Table1ID] [int] NOT NULL,

    [Year] [float] NULL,

    [Period] [int] NULL,

    [Period_2] [int] NULL,

    [Period_3] [int] NULL,

    [Period_4] [int] NULL,

    [Period_5] [int] NULL,

    [Type_1] [float] NULL,

    [Type_2] [float] NULL,

    [Type_3] [float] NULL,

    [Type_4] [float] NULL,

    [Type_5] [float] NULL,

    [Type_6] [float] NULL,

    [Col1] [float] NULL,

    [Col2] [float] NULL,

    [Col3] [float] NULL,

    [Col4] [float] NULL,

    [Col5] [float] NULL,

    [Col6] [float] NULL,

    [Col7] [float] NULL,

    [Col8] [float] NULL,

    [Col9] [float] NULL,

    [Col10] [float] NULL,

    [Col11] [float] NULL,

    [Col12] [float] NULL,

    [Col13] [float] NULL,

    [Col14] [float] NULL,

    [Col15] [float] NULL,

    [Col16] [float] NULL,

    [Col17] [float] NULL,

    [Col18] [float] NULL,

    [Col19] [float] NULL,

    [Col20] [float] NULL,

    [Col21] [float] NULL,

    [Col22] [float] NULL,

    [Col23] [float] NULL,

    [Col24] [float] NULL,

    [Col25] [float] NULL,

    [Col26] [float] NULL,

    [Col27] [float] NULL,

    [Col28] [float] NULL,

    [Col29] [float] NULL,

    [Col30] [float] NULL,

    [Col31] [float] NULL,

    [Col32] [float] NULL,

    [Col33] [float] NULL,

    [Col34] [float] NULL,

    [Col35] [float] NULL,

    [Col36] [float] NULL,

    [Col37] [float] NULL,

    [Col38] [float] NULL,

    [Col39] [float] NULL,

    [Col40] [float] NULL,

    CONSTRAINT [PK_#Table1] PRIMARY KEY CLUSTERED

    (

    [#Table1ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

  • Ok, couple more questions then.

    You have a field called Year but are using a float datatype. I would suggest you use DATETIME types for any fields relating to well dates... Once in that format there is many ways to break it down and get what you need when querying the table. i.e. DATEPART(yyyy, mydatefield) AS year

    Also do you need to use float in col1 to col 40? I'm not saying there is never a need for that datatype but it can be a little tricky. I'm assuming you need decimal places so DECIMAL or NUMERIC might be better to use in your situation. However you obviously know your requirements but here is a little reading on the matter.

    http://msdn.microsoft.com/en-us/library/ms187912%28v=sql.105%29.aspx

    Also even if you create a "generic" table like you proposed make sure it's still contain relevant data to AN object. Don't try to jam every little field a user wants added to one table. As an example one of the said application I used to support allowed users to create forms and define there own fields right in the application. Fields would be added while some would be removed (but not delete from the database for reporting purposes). So to make that happen the database needed to be flexible. However not all custom fields were added to the same table. The database itself contained 3-4 of these types of tables with appropriately grouped data.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • IF OBJECT_ID('TempDB..#Field','U') IS NOT NULL

    DROP TABLE #Field

    CREATE TABLE #Field

    (

    fieldID int NOT NULL,

    pkFromOrigTable int NOT NULL,

    fieldName varchar(200) not null,

    fieldValue decimal(30,20)

    )

    Any advice, suggestions are more than welcome. I want to try and avoid creating something as bad.

    My apologies...based on your sample code above it appears like you already want to convert those float values to decimal. A timestamp field could also be helpful depending on how the table will be used. This is of course if it isn't arleady being captured in another table.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Yeah i have the datatypes sorted out thanks 🙂 Timestamps arent needed. We arent doing anything timebased apart from the "Years" column which is actually more like "Periods" which will always start at 1 and go to N.

    I am interested however on the joining aspects as currently most fields are in one big table (one big table multiple times which means using dynamic sql and passing table names around).

    Im trying to find a solution that will at least alleviate some of the dynamic sql pain and make the code easier to maintain. I can split the col1 ...col40 into smaller subsets and can put them into mulitple tables. Im still a bit unsure if this is the right approach or not. What this would mean though is i might need to join on one table for a project and not for another. Are such dynamic joins recommended at all ? Any other options ?

    Im also using the histogram chart code i found on this site (a few minor tweeks) which takes in one column for the data. The users have said they want to use *any* column as a histogram input. Is there any way apart from dynamic sql that i can accomplish this ?

    Thanks for all the help so far as i do understand im pretty average at design currently.

  • I agree with Andy. What you have now is what I call the 'spreadsheet approach' to data modeling where everything is in one wide table. You said one column can mean different things based on the value of another column in the row (you called it a "project"), i.e. a column's value can represent an entity from a different domain than a value in the same column from another row. That approach violates first normal form.

    I would also decompose the wide, multi-purpose table into many tables that represent the domains of data you are managing. The approach you have devised to replace the 'spreadsheet approach' is known as an EAV and should only be used in rare, niche cases.

    Keeping it Simple: EAV FAIL


    I am interested however on the joining aspects as currently most fields are in one big table (one big table multiple times which means using dynamic sql and passing table names around). Im trying to find a solution that will at least alleviate some of the dynamic sql pain and make the code easier to maintain. I can split the col1 ...col40 into smaller subsets and can put them into mulitple tables. Im still a bit unsure if this is the right approach or not. What this would mean though is i might need to join on one table for a project and not for another. Are such dynamic joins recommended at all ? Any other options ?

    What you are struggling with is the impedance mismatch between object-oriented thinking and relational modeling. In short, model your data per the relational model in the database (3NF is a good place to shoot for to begin) and abstract the impedance mismatch in your application layer, i.e. do not impose on the database to handle the abstraction because frankly it is not very good at it and you'll end up devolving your codebase doing backflips with dynamic SQL, self-joins or scalar UDFs.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • matak (10/18/2012)


    The users cant create columns or types by themselves but they can request them to be created (and they are). So it sounds very similar to what you have done yb751.

    Please see below for a simplified version of the current table (names have been changed).

    Last lot of changes required adding col21 - col40 due to the fact the users "ran out" of columns.

    As mentioned before im open to any suggestions on a better way of doing this.

    CREATE TABLE [dbo].[#Table1](

    [#Table1ID] [int] NOT NULL,

    [Year] [float] NULL,

    [Period] [int] NULL,

    [Period_2] [int] NULL,

    [Period_3] [int] NULL,

    [Period_4] [int] NULL,

    [Period_5] [int] NULL,

    [Type_1] [float] NULL,

    [Type_2] [float] NULL,

    [Type_3] [float] NULL,

    [Type_4] [float] NULL,

    [Type_5] [float] NULL,

    [Type_6] [float] NULL,

    [Col1] [float] NULL,

    [Col2] [float] NULL,

    [Col3] [float] NULL,

    [Col4] [float] NULL,

    [Col5] [float] NULL,

    [Col6] [float] NULL,

    [Col7] [float] NULL,

    [Col8] [float] NULL,

    [Col9] [float] NULL,

    [Col10] [float] NULL,

    [Col11] [float] NULL,

    [Col12] [float] NULL,

    [Col13] [float] NULL,

    [Col14] [float] NULL,

    [Col15] [float] NULL,

    [Col16] [float] NULL,

    [Col17] [float] NULL,

    [Col18] [float] NULL,

    [Col19] [float] NULL,

    [Col20] [float] NULL,

    [Col21] [float] NULL,

    [Col22] [float] NULL,

    [Col23] [float] NULL,

    [Col24] [float] NULL,

    [Col25] [float] NULL,

    [Col26] [float] NULL,

    [Col27] [float] NULL,

    [Col28] [float] NULL,

    [Col29] [float] NULL,

    [Col30] [float] NULL,

    [Col31] [float] NULL,

    [Col32] [float] NULL,

    [Col33] [float] NULL,

    [Col34] [float] NULL,

    [Col35] [float] NULL,

    [Col36] [float] NULL,

    [Col37] [float] NULL,

    [Col38] [float] NULL,

    [Col39] [float] NULL,

    [Col40] [float] NULL,

    CONSTRAINT [PK_#Table1] PRIMARY KEY CLUSTERED

    (

    [#Table1ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    What are all the ColXX colums used for? Depending on the answer to that question, the use of a Name/Value table could be the worst idea or the best.

    Also, why are they all FLOAT?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK so yes originally this was based on spreadsheets (before my time) and i dont want to use an EAV but I lack design skills which is why im asking here for any sort of advice.

    The industry im dealing with is mining and no mine is the same. The ColXX are able to be mapped by the user via a mapping table to describe what they are. So for one project Col1 may be Iron grade (iron ore mine) or Gold grade (gold mine) or any other sort of metal you can think of based on mining. Also, each type of mine can have different numbers of ColXX. I would love to be able to use IronGrade, GoldGrade as a column name but since they will never be related im unsure on how to go about doing this without using either dynamic sql or customising code for each mine.

    The table i posted with ColXX is part of the current structure - the new structure will be using different data types. This is one thing I believe I can do after working with the data for the past few weeks 🙂

    you'll end up devolving your codebase doing backflips with dynamic SQL

    The system is currently purely dynamic with columns and tables being passed around 🙁 I would do anything to stop this as its extremely difficult to maintain.

    If anything else needs clarifying please let me know and i will do my best to provide more information.

  • What I think you are after is some guidance creating a data model that supports the industry-data you're trying to support, mining. I tried searching the internet for "standard mining data model" and that combination tends to turn up lots of pages related to "data mining." I did find some ArcGIS standard data models, but not knowing your business I am not sure if those are relevant. At any rate, finding a data model that does not make use of EAV and does not compel you to write dynamic SQL every time you wish to access data for a given project, yet at the same time flexibly supports your line of business may be worth the effort. If you do not find a standard data model you can immediately pickup and use you may at least get some ideas about how others are modeling their mining data.

    As an alternative to EAV you might think about using sparse tables (table per project) or sparse columns (one table for all projects however each column contains data for one data domain) to ease the burden. At least that way you can get back to 1NF and stop re-purposing generically named columns that change meaning depending on the row (i.e. project) being accessed.

    I just found this article earlier in the week and it is the second time I have passed it along:

    And the EAV winner is .... sparse columns by Bob Beauchemin

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • A "Snowflake" structure comes into mind here but I believe this is too large a task to do via a forum. Have you considered bringing in a consultant architect to try to straighten this out? Don't forget that anything that currently touches this table will also need to be reengineered for the new structure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the advice about sparse tables and columns - ive never used them before so i will do some reading.

  • I have considered a consultant yes and will talk to the appropriate people about getting one in.

    I thought id give the forums a go just in case it was something not too hard (im sure ive mentioned my lack of design skills !! )

  • matak (10/21/2012)


    Thanks for the advice about sparse tables and columns - ive never used them before so i will do some reading.

    You're welcome.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply