Database design theory question

  • Hey all,

    Got a question regarding database design - specifically, regarding table design.

    I've recently joined a company who has a project involving the re-design of an application to manage their projects. After sifting through the application and database, I've concluded that the application is, in effect, the management of a giant set of key-value pairs.

    Currently, the database is set up in a conventional format - that is to say, tables with a set of columns representing their properties, and each row representing an instance of that element, with the values for the properties assigned to the columns. For the sake of this discussion, let's say we have a parent-child relationship set up, with four tables.

    Each of these four tables has a set of columns for their properties, and a key pointing to their parent element. One such table has around 50 columns, the others around 20 or so.

    One key element of this application though, is that there would regularly be new properties required for the tables. In the old version of this application, this entailed adding new columns to the tables, then updating the procedures and application code to reflect these new columns.

    The new version of the application is supposed to be designed in a way that would allow adding these properties in a more streamlined fashion. Unfortunately, somewhere along the design process, this concept was lost, and so the tables ended up with the same type of design structure as the old version.

    This is where I come in. One of my mandates is to set the train straight, and try to fix this mistake. So, I've set up a new design structure, and I'd like some comments / critiques.

    Essentially what I've done is take their four-table, multi-column structure, and reduce it down to two tables. One table, which I'll call ObjectDefinition, contains a record for each property and each table. So, for example, there is a record in that table with the value "Table A". Then, there is a set of records representing Table A's properties, each of which has a ParentID property pointing back to the "Table A" record. This table also stores the details of what the type of the property is (EG: INT, VARCHAR, etc...).

    The second table, called ObjectValue, contains the actual values for these properties. So, for a single record in the original Table A, in the new structure, there is N + 1 records, where N represents the number of properties that Table A had.

    To me, this design made more sense, because if we needed new properties for Table A, all we needed to do was add a new record to the ObjectDefinition table, and then we could add or edit those records corresponding to the actual values of Table A, in the ObjectValue table.

    Without further ado, here's the specifications for my two tables:

    CREATE TABLE [dbo].[ObjectDefinition](

    [RowPointer] [uniqueidentifier] NOT NULL,

    [ParentRowPointer] [hierarchyid] NOT NULL,

    [Description] [varchar](50) NOT NULL,

    [SystemTypeID] [smallint] NOT NULL,

    [MaxLength] [smallint] NOT NULL,

    [Precision] [tinyint] NOT NULL,

    [Scale] [tinyint] NOT NULL,

    [IsParent] [bit] NOT NULL,

    [DateCreated] [datetime] NOT NULL,

    [DateUpdated] [datetime] NULL,

    [DateDeleted] [datetime] NULL,

    [CreatedBy] [uniqueidentifier] NOT NULL,

    [UpdatedBy] [uniqueidentifier] NULL,

    [DeletedBy] [uniqueidentifier] NULL,

    CONSTRAINT [PK_ObjectDefinition] PRIMARY KEY CLUSTERED

    (

    [RowPointer] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    ALTER TABLE [dbo].[ObjectDefinition] ADD CONSTRAINT [DF_ObjectDefinition_RowPointer] DEFAULT (newsequentialid()) FOR [RowPointer]

    GO

    ALTER TABLE [dbo].[ObjectDefinition] ADD CONSTRAINT [DF_ObjectDefinition_DateCreated] DEFAULT (getdate()) FOR [DateCreated]

    GO

    CREATE TABLE [dbo].[ObjectValue](

    [RowPointer] [uniqueidentifier] NOT NULL,

    [ParentRowPointer] [hierarchyid] NOT NULL,

    [ObjectDefinitionRowPointer] [uniqueidentifier] NOT NULL,

    [Value] [varchar](800) NULL,

    [DateCreated] [datetime] NOT NULL,

    [DateUpdated] [datetime] NULL,

    [DateDeleted] [datetime] NULL,

    [CreatedBy] [uniqueidentifier] NOT NULL,

    [UpdatedBy] [uniqueidentifier] NULL,

    [DeletedBy] [uniqueidentifier] NULL,

    CONSTRAINT [PK_ObjectValue] PRIMARY KEY CLUSTERED

    (

    [RowPointer] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    ALTER TABLE [dbo].[ObjectValue] ADD CONSTRAINT [DF_ObjectValue_RowPointer] DEFAULT (newsequentialid()) FOR [RowPointer]

    GO

    ALTER TABLE [dbo].[ObjectValue] ADD CONSTRAINT [DF_ObjectValue_DateCreated] DEFAULT (getdate()) FOR [DateCreated]

    GO

    For the purposes of this discussion, let's ignore the datecreated/updated/deleted and created/update/deletedby columns. Also, the decision to use UNIQUEIDENTIFIER instead of IDENTITY was made by the company, so I can't control that.

    The reason for my concern about all of this, is a query such as the following:

    SELECT

    ov1.RowPointer,

    [Description],

    ov2.Value,

    SystemTypeID,

    MaxLength,

    [Precision],

    Scale

    FROM ObjectValue ov1

    JOIN ObjectValue ov2 ON ov2.ParentRowPointer.GetAncestor(1) = ov1.ParentRowPointer

    JOIN ObjectDefinition od ON od.RowPointer = ov2.ObjectDefinitionRowPointer

    AND od.IsParent = 0

    WHEREov1.ObjectDefinitionRowPointer = @ObjectDefinitionRowPointer

    ORDER BY ov1.ParentRowPointer

    In the old system, this query would return about 20000 records. Each record had about 50 columns. In the new system, the query returns about 1.4 million records. Problem is, the old version would run instantly, whereas the new version takes about 1.5 minutes to run. This isn't acceptable.

    I'm hoping someone can provide some insight for me, because I really think the new version is the right way to go - but we can't have it take this long for a simple query like this :/

    For reference, there is an index on ObjectValue :

    CREATE NONCLUSTERED INDEX [IND_ObjectDefinitionRowPointer] ON [dbo].[ObjectValue]

    (

    [ObjectDefinitionRowPointer] ASC

    )

    INCLUDE ( [ParentRowPointer],

    [Value]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    So the above query should be properly optimized already ...

  • Sounds like an EAV design. Based on that you should search on EAV or Entity Attribute Value and see what else you and find. I haven't used EAV tables personally but they have their uses.

  • YOUCH!!!! You might check your indexes. You have guids on these tables as the clustered index. Unless you have a process that defrags them daily I suspect your fragmentation is going to be right around 95%+ on a regular basis. Add to this the millions of rows and you have a recipe for horribly slow queries.

    I agree with Lynn that this sounds like an EAV. This design does have its place but you better be darn sure you know what you are doing or you are going to end up with a nightmare.

    Take a look at this story about somebody who went down this path. https://www.simple-talk.com/opinion/opinion-pieces/bad-carma/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • One other comment about your design. I would highly recommend you not use such generic primary key names as RowPointer and have that be the same name in each table. A column should have a purpose and NOT change names between tables. At the very least I would change these to ObjectID and ObjectValueID. That way the name stays consistent and you don't have columns like ParentRowPointer which really means RowPointer in another table, but I can't use that name because it is already a column in this table. :w00t:

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I fully agree with your points Sean, unfortunately, company constraints mean those are the names for the columns. Got no control over that.

    Regarding the GUIDs part, again, don't really have much control there. Though, by making it NewSequentialID(), I was under the impression that the fragmentation wouldn't be a big issue?

    @Lynn, you are exactly correct, this is an EAV design. Hadn't realised it had it's own subculture already 🙂 Been reading up some on this, and I realise that most every one else has arrived at the same conclusions that I have.

    Unfortunately, I really don't see any way around this. I'm going to explore the XML option, since people have suggested this might be a better way of modeling it. Any comments on that?

  • kramaswamy (4/4/2014)


    I fully agree with your points Sean, unfortunately, company constraints mean those are the names for the columns. Got no control over that.

    Regarding the GUIDs part, again, don't really have much control there. Though, by making it NewSequentialID(), I was under the impression that the fragmentation wouldn't be a big issue?

    @Lynn, you are exactly correct, this is an EAV design. Hadn't realised it had it's own subculture already 🙂 Been reading up some on this, and I realise that most every one else has arrived at the same conclusions that I have.

    Unfortunately, I really don't see any way around this. I'm going to explore the XML option, since people have suggested this might be a better way of modeling it. Any comments on that?

    I know what you mean about being stuck. It is true that the fragmentation will be minimal using NewSequentialID but honestly you can manage it by defragging regularly. There are a number of ways to handle the fragmentation issue.

    I have used a combination of fixed properties and dynamic ones through EAV with pretty good success. I have also seen some people used a mixed mode with XML as the dynamic elements quite successfully too. I think that in the long the mixed mode style will be workable. Otherwise you end up like the article I posted earlier, which is EAV gone horribly wrong.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Indeed. That's the option I'm exploring now - instead of just two tables, I'm going with the original design of the four tables, but having each table have its own related table.

    Essentially, the table structure becomes:

    Table A

    a_ID INT IDENTITY PRIMARY KEY,

    a_RequiredProperties (set of columns which will be used for filters),

    a_OtherProperties XML

    Table A Properties

    ap_ID INT IDENTITY PRIMARY KEY,

    ap_UserTypeID SMALLINT,

    ap_MaxLength SMALLINT,

    ap_Precision TINYINT,

    ap_Scale TINYINT

    and repeat for the other tables. The values which will go into the OtherProperties column will be all the values which are being used for display purposes, and not for query purposes. These will still need to be updated, but they won't be used for queries.

  • kramaswamy (4/4/2014)


    Indeed. That's the option I'm exploring now - instead of just two tables, I'm going with the original design of the four tables, but having each table have its own related table.

    Essentially, the table structure becomes:

    Table A

    a_ID INT IDENTITY PRIMARY KEY,

    a_RequiredProperties (set of columns which will be used for filters),

    a_OtherProperties XML

    Table A Properties

    ap_ID INT IDENTITY PRIMARY KEY,

    ap_UserTypeID SMALLINT,

    ap_MaxLength SMALLINT,

    ap_Precision TINYINT,

    ap_Scale TINYINT

    and repeat for the other tables. The values which will go into the OtherProperties column will be all the values which are being used for display purposes, and not for query purposes. These will still need to be updated, but they won't be used for queries.

    That sounds like the right blend of usable and dynamic to me. 😀

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply