Things to consider while creating new table for an application

  • Hi All,

    I am going to design a new table for one functionality which we are going to add in our application.
    So I am going to create new tables . Basically what these table is going to do is, application will load ledger data on a daily basis to these tables
    and create work items for ledger team.

    As im new to table design, i want to know what are all the things to consider for creating performance efficient tables.
    Please share some thoughts.

  • Just concentrate on the basics.

    1) Make sure the column datatypes are the smallest necessary to comfortably fit the data.
    2) Make sure you use date columns for dates, integer type columns for whole numbers that might have math done to them, VARCHAR when data width will vary, and NVARCHAR if you need to store more than one language.
    3) Make sure to use third normal form--basically, avoid repetition of columns in the table
    4) Make sure all your tables have a primary key, in other words, a column that identifies one and only one row of the table
    5) When creating tables that refer to other tables make sure you use the referred to table's primary key, and make sure the key is as physically small as possible. I like using integer identities myself, there's a lot of debate about the best primary keys though!
    6) Put indexes on all primary keys, and I would recommend on all foreign keys (references to other tables) as well.
    7) Don't forget to put constraints on your columns and (where appropriate) table constraints where you need to constrain two or more columns in a single rule.

    The whole point in table design is to not repeat data, to make access as fast as possible, and to allow the *database* to perform as much error prevention as possible. If you do it correctly you relieve the developer of a lot of busy work when it comes time to write the application. As a bonus you'll increase the trust that can be put in the data. After all, if SQL Server is enforcing the rules ALL applications have to follow them automatically.

    Schema design is an art more than a science, but as in all things you can cover 80% of the need with 20% of the tools! 😉

  • By far the most important thing is to a logical data design before doing a physical one. [You can Google "logical data design" for more details.  A physical design is just changing the logical design into SQL tables, indexes, etc, and that is the easy part; even machines can do that part well :-).]

    That is, don't start out with "tables", "indexes", etc., but just the data itself first.  Follow the normalizations steps, in order, step by step.  That process will yield vastly better results than jumping straight into a table.

    As you haven't done this before, that process could take quite a few days.  Stick with it, though, as it's the proper way to do this and it gets easier with practice.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thanks ScottPletcher,  Roger.plowman  !!!
    I will follow your suggestions and hope it will be good learning experience for me 🙂

  • Hi ,
    Based upon the advise provided , I have designed the new tables.
    Basically my application is going to get two feed daily for ledger purpose. One which contains open items and the other is going to contain closed items. 
    So using the closed items table, First I will process the data and close all the open items based on the Match found. Later I will process the open items table and create new workitems . The Master assignment table is lookup table. This will contain Reconciler name whom the workitem needs to be automatically assigned to based upon unique Suspense Key. So once the  new workitems are created , I will make a lookup to this table to assign the workitems.

    Below are the tables and the relation ship diagram. I have created foreign key relation ship from my workitem table to master lookup but not sure whether its needed or not . Please advise if the table structure which i have created needs any changes for performance improvement.


    CREATE TABLE [dbo].[OpenItemStaging](
        [CompanyCode] [varchar](30) NULL,
        [JournalID] [varchar](30) NULL,
        [EffectiveDate] [date] NULL,
        [Sequence] [int] NULL,
        [LineID] [int] NULL,
        [Ledger] [varchar](30) NULL,
        [AccountNumber] [char](7) NULL,
        [LineDate] [date] NULL,
        [Amount] [money] NULL,
        [CurrencyCode] [nchar](3) NULL,
        [CostCenter] [nchar](7) NULL,
        [SuspenseKey] [nchar](7) NOT NULL,
        [SuspenseIndicator] [nchar](1) NULL,
        [SuspenseCloseDate] [date] NULL,
        [Source] [nchar](3) NULL,
        [Operator] [varchar](50) NULL,
        [Line Description] [varchar](50) NULL
    ) ON [PRIMARY]

    CREATE TABLE [dbo].[ClosedItemStaging](

        [CompanyCode] [varchar](30) NULL,
        [JournalID] [varchar](30) NULL,
        [EffectiveDate] [date] NULL,
        [Sequence] [int] NULL,
        [LineID] [int] NULL,
        [Ledger] [varchar](30) NULL,
        [AccountNumber] [char](7) NULL,
        [LineDate] [date] NULL,
        [Amount] [money] NULL,
        [CurrencyCode] [nchar](3) NULL,
        [CostCenter] [nchar](7) NULL,
        [SuspenseKey] [nchar](7) NOT NULL,
        [SuspenseIndicator] [nchar](1) NULL,
        [SuspenseCloseDate] [date] NULL,
        [Source] [nchar](3) NULL,
        [Operator] [varchar](50) NULL,
        [Line Description] [varchar](50) NULL
    ) ON [PRIMARY]

    CREATE TABLE [dbo].[TPAOpenWorkItem](
        [WorkitemID] [int] NOT NULL,
        [SuspenseKey] [nchar](7) NULL,
        [Account] [varchar](50) NULL,
        [TPA_FirstName] [varchar](50) NULL,
        [TPA_LastName] [varchar](50) NULL,
        [Reconciler_FirstName] [varchar](50) NULL,
        [Reconciler_LastName] [varchar](50) NULL,
        [JournalID] [nvarchar](12) NULL,
        [Age] [int] NULL,
        [Severity] [varchar](10) NULL,
        [Amount] [money] NULL,
        [Issue] [int] NULL,
    CONSTRAINT [PK_WorkItem] PRIMARY KEY CLUSTERED
    (
        [WorkitemID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    CREATE TABLE [dbo].[MasterAssignment](
        [SuspenseKey] [nchar](7) NOT NULL,
        [Reconciler_FirstName] [varchar](15) NULL,
        [Reconciler_LastName] [varchar](15) NULL,
        [TPA_FirstName] [varchar](15) NULL,
        [TPA_LastName] [varchar](15) NULL,
        [AccountName] [varchar](50) NULL,
    CONSTRAINT [PK_MasterAssignment] PRIMARY KEY CLUSTERED
    (
        [SuspenseKey] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    GO

  • You seem to be missing some Primary Keys from some of the tables.  While an actual primary key isn't always uber important, clustered indexes usually are.  Also, MS like for keys to be unique to prevent the needs for a "uniquefier" to be present for rows that do have duplicate keys.  Further, an INT (for use as a surrogate key when no other key is reasonable or possible) is a lot more narrow that the underlying RID that SQL Server uses.  Narrow keys not only take less space, but they allow for more rows per page which means that you don't use as much memory and your queries will usually run a bit faster.  I say "usually" because there are exceptions to every rule.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, June 4, 2017 1:15 PM

    You seem to be missing some Primary Keys from some of the tables.  While an actual primary key isn't always uber important, clustered indexes usually are.  Also, MS like for keys to be unique to prevent the needs for a "uniquefier" to be present for rows that do have duplicate keys.  Further, and INT (for use as a surrogate key when no other key is reasonable or possible) is a lot more narrow that the underlying RID that SQL Server uses.  Narrow keys not only take less space, but they allow for more rows per page which means that you don't use as much memory and your queries will usually run a bit faster.  I say "usually" because there are exceptions to every rule.

    Thanks Jeff !!!

    I will re-design the table with appropriate Primary keys and post here shortly .

  • Jackson89 - Sunday, June 4, 2017 4:21 PM

    Jeff Moden - Sunday, June 4, 2017 1:15 PM

    You seem to be missing some Primary Keys from some of the tables.  While an actual primary key isn't always uber important, clustered indexes usually are.  Also, MS like for keys to be unique to prevent the needs for a "uniquefier" to be present for rows that do have duplicate keys.  Further, and INT (for use as a surrogate key when no other key is reasonable or possible) is a lot more narrow that the underlying RID that SQL Server uses.  Narrow keys not only take less space, but they allow for more rows per page which means that you don't use as much memory and your queries will usually run a bit faster.  I say "usually" because there are exceptions to every rule.

    Thanks Jeff !!!

    I will re-design the table with appropriate Primary keys and post here shortly .

    While you're at it, also consider the use of appropriate Foreign Keys to seriously help with data quality/accuracy and to make the relationships (think ERD) between tables somewhat self-documenting so that you don't have to rely heavily on sometimes inaccurate "tribal knowledge" to get a job done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, June 4, 2017 7:03 PM

    While you're at it, also consider the use of appropriate Foreign Keys to seriously help with data quality/accuracy and to make the relationships (think ERD) between tables somewhat self-documenting so that you don't have to rely heavily on sometimes inaccurate "tribal knowledge" to get a job done.

    Sure Jeff !!!
    I will add foreign keys to create meaningful relationship between tables. Thanks !!!

  • I have very little time right now, but I want to point out at least a few things.

    All tables: Verify the tables are in 2NF and 3NF.

    Other things to consider / re-consider.

    Encode these columns (use an integer code rather than varchar):
    CompanyCode ()
    JournalID
    Ledger
    Operator
    Encode TPA_FirstName and TPA_LastName into a TPA_Name_Id from which you can lookup both those values (and more).
    Encode Reconciler_FirstName and Reconciler_LastName into a Reconciler_Name_Id from which you can lookup both those values (and more).
    ...

    Does these columns really need nchar (dbcs)? ::
    CurrencyCode
    CostCenter
    SuspenseKey
    SuspenseIndicator
    Source
    ...

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Monday, June 5, 2017 11:47 AM

    I have very little time right now, but I want to point out at least a few things.

    All tables: Verify the tables are in 2NF and 3NF.

    Other things to consider / re-consider.

    Encode these columns (use an integer code rather than varchar):
    CompanyCode ()
    JournalID
    Ledger
    Operator
    Encode TPA_FirstName and TPA_LastName into a TPA_Name_Id from which you can lookup both those values (and more).
    Encode Reconciler_FirstName and Reconciler_LastName into a Reconciler_Name_Id from which you can lookup both those values (and more).
    ...

    Does these columns really need nchar (dbcs)? ::
    CurrencyCode
    CostCenter
    SuspenseKey
    SuspenseIndicator
    Source
    ...

    Hi Scott,

    Thanks for the advise !!!
     I am looking into the suggestions which you have provided, I will post the changes shortly ..

  • Hi ,

    I am in the process of designing the new tables as per the suggestion provided, I have one doubt... is it essential to have keys for staging tables ?
    As i said before I am going to have two tables one for storing open ledger items for the day and the other one to store closed items for the day. These two tables will get loaded daily from other source system.
    Its going to be a daily truncate and load. That's why I got this doubt .. please advise..

  • For a temporary "staging" table, where data just waits to be loaded into permanent table(s), no.

    But all permanent tables should have at least one candidate key -- i.e. column(s) that identify each row in the table, even if you don't ever formally create an actual key or index based on those columns -- so that you can normalize the table.  Most experts recommend going to at least 3rd normal form, as it's called (or, better yet, to BCNF, Boyce-Code normal form, often jokingly called 3 1/2 NF).  In order to get to 2NF or 3NF, though, the table will have to have some kind of key column(s).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thanks for the clarification Scott !!

  • Is there a reason you have separate "open" and "closed" tables? They appear to be identical, with the only logical difference being a "status" of the item contained ("open" or "closed" I guess). In other words, should you really have a single table with a "status" column and store both open and closed items in one table?

    I'm not saying that what you have is wrong, I'm just saying it's something that makes me ask the question, and perhaps you should document the reason why you went with separate tables if there's a reason for it. That'll help someone who later inherits your system to understand why it's designed the way it is. I can think of valid reasons for separate tables, but I can only assume why you designed it that way. Don't make me assume, please, that's a bad idea. 🙂

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply