I need help designing a Database

  • I have a project that consists of members and categories. There's a main category, that has associated sub-categories. The subcategories will have fields based on the main category. For instance, if the main category is shoes, the sub-cat might have size,color,brand; but if the main category is bikes, the sub-cat might have make,height,type,color.

    The members fill out the categories based off their selections.

    I tried creating a members, maincat & subcat tables, but I couldn't adjust for the varying fields. I then thought I'd make a table for each main category type, but I couldn't figure out how to store the member selections or even present the fields dynamically.

    I'm hoping someone can provide me with a solid database structure that I can use, and build this project off of, or at least some ideas to get me going.

    Thanks

    Pete

  • It sounds like you want to make a database to store class instances or did I get you wrong?

    Is this your first attempt of designing a database?

  • Designing a database like this, yes.

    I think my big hangup is with how all the maincategories have different fields for answers. I just don't know how to relate it all together.

  • Well, you should sit down and start identifying your entities. Without any additional information it is hard to know exactly the problem. But then again, specifying the problem is a key for solving the problem

  • Pete,

    I agree with the previous poster that more information would be helpful.  However, here's my crack at it, with the information in hand:

    Members table: includes whatever information you intend to store about members

    Categories table: parent categories

    Subcategories table: secondary classification

    This table would include a FK that associates each item with the parent category, a Type field to identify the value type (size, color, brand, etc.) and a Value field including the actual value

    hth

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • Adding on to what Tim said, I would leave the FK out of the Subcategories table. Otherwise, you'll have to add a Size subcategory for every category that has a size option. Or in the example, you gave, bikes and shoes both have a color subcategory, so you don't want two subcategories for color, right?

    Instead, you'll want a 4th mapping table that lists mapping of categories to subcategories.

    For example, your Category table will have two distinct entries for bikes and shoes. Then the subcategories table would have a distinct list of all subcategories including only 1 entry for color. Then the mapping table would have an entry for each bikes and shoes mapped to color.

    Example:

    Categories:

    CatID, Category

    1, Shoes

    2, Bikes

     

    Subcategories:

    SubcatID, Subcategory

    1, Size

    2, color

    3, brand

    4, make

    5, height

    6, type

     

    CategoryToSubCategory:

    CatID, SubCatID

    1, 1

    1, 2

    1, 3

    2, 2

    2, 4

    2, 5

    2, 6

     

    The query to get a list of all Categories and subcategories would be something like:

    Select C.*, S.*

    From Categories C

    Inner Join CategoryToSubCategory CS On C.CatID = CS.CatID

    Inner Join Subcategories S on S.SubCatID = CS.SubCatID

     

     

    Make sense?


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • Without additional information and knowing that business people will always ask for something more as soon as you get going on your project, I'd create a category table that references itself to contain all you categories and sub categories.  I'd then create a type table (what you call fields).  I'd create a junction table to facilitate the potential many-to-many relationship between categories and types, finally I'd create a TypeValue table which would contain the values or responses associated with each type/field and a foreignkey which relates back to the junction table.  For example:

     

    CREATE TABLE Categories

    (ID UNIQUEIDENTIFIER PRIMARY KEY NOT NULL DEFAULT(newid()),

    Category VARCHAR(128) NOT NULL,

    ParentCategoryID UNIQUEIDENTIFIER NULL)

    ParentCategoryID is a foreign key to the Categories table.

    CREATE TABLE Types

    (ID UNIQUEIDENTIFIER PRIMARY KEY NOT NULL DEFAULT(newid()),

    Type VARCHAR(128) NOT NULL)

    CREATE TABLE CategoryTypes

    (ID UNIQUEIDENTIFIER PRIMARY KEY NOT NULL DEFAULT(newid()),

    CategoryID UNIQUEIDENTIFIER NOT NULL,

    TypeID UNIQUEIDENTIFIER NOT NULL)

    CategoryID is a foreign key to the Categories table.

    TypeID is a foreign key to the TypeTable.

    CREATE TABLE TypeValue

    (ID UNIQUEIDENTIFIER PRIMARY KEY NOT NULL DEFAULT(newid()),

    Value VARCHAR(40) NOT NULL,

    CategoryTypeID UNIQUEIDENTIFIER NOT NULL)

    CategoryTypeID is a foreign key to the CategoryTypes table.

    The advantage of this design is that you can manipulate the category tree as well as the associated types with complete freedom.  You can make the tree as flat as you want or as deep as you want with no limitations.  The disadvantage of this design is that you must convert all of the values for the types into character data and that bodes badly for performance later on down the road.  To aleviate some of the performance concerns you could change the design slightly to include additional data types at the expense of disk space which often is very cheap.  You might do the following:

    CREATE TABLE Types

    (ID UNIQUEIDENTIFIER PRIMARY KEY NOT NULL DEFAULT(newid()),

    Type VARCHAR(128) NOT NULL,

    DataType INTEGER NOT NULL)

    DataType is a number from 1 to 4 and simply determines which column in the TypeTable the value is stored.

    CREATE TABLE TypeValue

    (ID UNIQUEIDENTIFIER PRIMARY KEY NOT NULL DEFAULT(newid()),

    CharacterValue VARCHAR(40) NULL,

    IntegerValue VARCHAR(40) NULL,

    MoneyValue VARCHAR(40) NULL,

    DateValue VARCHAR(40) NULL,

    CategoryTypeID UNIQUEIDENTIFIER NOT NULL)

    The disadvantage to this design is the application searching the database will have to be smart enough to know that the DataType column in the Types table will be the key to knowing which column in the TypeValue table to search.

    I hope I've explained the idea well.

  • I only have one problem with this suggestion, why are you using uniqueidentifiers for your primary keys? This is bad, and here's why:

    1. A uniqueidentifier is a lot larger than an int (such as an identity field) and every index will be larger and slower.

    2. Uniqueidentifiers are not created in sequential order. So every time a new record is inserted, the table will have to be resorted making inserts slow and resource consuming.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • The purpose of the uniqueidentifier is not to allow sorting but to allow records to be easily identified regardless of which server they are created on.  Uniqueidentifiers also provide excellent selectivity for indexing records just as an identitify column would.

    I personally use uniqueidentifiers because of experience.  Most systems I've developed eventually grow to the point where the customer starts considering using multiple servers and some form of replication.  I've dealt with systems designed using identity columns as the primary key and the expense of having to deal with the pit falls of dealing with indentity columns as primary keys in a multi-server environement far outweigh the advantages.  As for the size of uniqueidentifiers, disk space is cheap and I don't consider 12 bytes (16 bytes for a uniqueidentifier instead of 4 bytes for an int) to be a significant cost.  In addition most systems required the database server to perform updates and inserts a very small percentage of the time, usually less than 10 percent so index maintenance isn't an issue.

    If you have a problem with uniqueidentifiers then you should consider using integers instead.  But if you don't understand the advantages of uniqueidentifiers then you probably shouldn't even be considering my design in the first place.  My design sacrifices simplicity of design for scalability and flexibility it is also more maintainable once scaled out than some other designs.  On a small scale my design isn't going to win a speed race with a design optimized for speed but once you start scaling out the system my design will perform just as well as the optimized design and won't cost you thousands of man hours to scale out.

    Ask yourself this question.  When I insert row number 2,147,483,647 what am I going to do?  I suspect your answer will be that you are going to go back and do some maintenance work on your system.  I attempt to avoid building job security into my systems.

  • Sounds good to say that there is a insignificant difference between 4 bytes and 16 bytes, but when you get to 2,147,483,647 records, the size difference is 8,589,934,588 to 34,359,738,352 or a difference of 25,769,803,764 which is a difference of almost 24 GB.

    The fact is, the bigger your system gets and the more scaled out it gets, the slower it gets and the more maintenance it will require. The sizes of the indexes increase exponentially. The amount of file fragmentation increases exponentially as the system has to perform large data sort operations with every insert creating ever increasing page splits.

    SQL Server 2005 did add a new function that would help your system greatly. If you used a default of NewSequentialID(), the ID's would at least be generated in order to limit resorting.

    >> When I insert row number 2,147,483,647 what am I going to do?

    Ever hear of a bigint data type?

     


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • Ever hear of a bigint data type?

    You mean the 8 byte integer?  The one that is only 8 bytes smaller than the uniqueidentifer instead of 12 bytes smaller?  Sure I've heard of that data type and it even saves me less space yet has the same issues as the integer data type.  All of a sudden your 24GB difference has reduced itself to 18GB and what are you going to do in a multi-server environement?  Add another column to your table to help identify where the record originated (oops there goes another 4 or 5GB of space)?  Or are you going to use an odd/even numbering scheme?  What happens when you need to add a 3'rd server and that odd/even scheme just won't work anymore?  You are designing in job security that's about all you are accomplishing.

     

    The amount of file fragmentation increases exponentially as the system has to perform large data sort operations with every insert creating ever increasing page splits.

    Ever hear of fill factor?

     

    I never assumed that the table would grow to 2,147,483,647 records in fact I assumed it would not grow to that size, however, each time a record is added your identity column is incremented.  Some records I would expect to eventually be deleted so you'd begin reusing identity values and that's a huge amount of work and far more complex than simply using GUIDs.  You aren't thinking about that.  I also don't see 24GB as being significant as I work with databases which are terabytes in size, the cost of the space is far less than the cost of a DBA's man hours required to deal with the inherent issues.  Is the maintenance effort of a DBA really going to cost less than a $300 hard disk?  No.  I believe your approach is "penny wise, pound foolish" and that's why my design philosophy evolved to its present state.

    You seem passionate about you opinion of GUIDs and that's fine but I think the misinformation about indexes and their performance is over the top.  If you don't like GUIDs then don't use them but don't claim that index maintenance is going to be a catastophic issue should one use GUID's.

    I do have a nagging question though. What is all this sorting you are claiming is going to happen in the database? Are you claiming that inserting sorted data into a B-tree is going to cause fewer splits than if you inserted data that was in a random order?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply