Table Structure of Monthly Data

  • I'm curious what everyone thinks is the best way to store data where the history of aggregated data for an ID is stored all in one table.  Our current way of doing so is something like this create table ID_Dates
    (
    ID int null
    ,[2017_01] int null
    ,[2017_02] int null
    ,[2017_03] int null
    ,[2017_04] int null
    ,[2017_05] int null
    )

    insert into ID_Dates
    values (2324, 1, 0, 4, 0, 1)
    ,(3593, 56, 3, 7, 21, 8)
    ,(5497, 0, 0, 0, 34, 0)
    ,(1568, 34, 5, 6, 22, 1)
    ,(9841, 6, 11, 4, 0, 34)
    ,(8737, 0, 99, 8, 3, 0)
    ,(4391, 2, 5, 18, 0, 3)

    where each ID is unique and there is a column for each year/month combo containing the sum or count of the aggregate data.  In this case a new column is added for every month that needs reported on so the table can grow quite wide if you need history back to 2000.

    I thought about a couple different options but I'm not sure if they are better as far as performance and maintainability go.

    Option 1 :create table ID_Dates2
    (
    ID int null
    ,[Year] int null
    ,[Month] int null
    ,Counts int null
    )

    insert into ID_Dates2
    values (2324, 1, 2017, 1)
    ,(2324, 2, 2017, 0)
    ,(2324, 3, 2017, 4)
    ,(2324, 4, 2017, 0)
    ,(2324, 5, 2017, 1)
    ,(3593, 1, 2017, 56)
    ,(3593, 2, 2017, 3)
    ,(3593, 3, 2017, 7)
    ,(3593, 4, 2017, 21)
    ,(3593, 5, 2017, 8)
    ,(5497, 1, 2017, 0)
    ,(5497, 2, 2017, 0)
    ,(5497, 3, 2017, 0)
    ,(5497, 4, 2017, 34)
    ,(5497, 5, 2017, 0)
    ,(1568, 1, 2017, 34)
    ,(1568, 2, 2017, 5)
    ,(1568, 3, 2017, 6)
    ,(1568, 4, 2017, 22)
    ,(1568, 5, 2017, 1)
    ,(9841, 1, 2017, 6)
    ,(9841, 2, 2017, 11)
    ,(9841, 3, 2017, 4)
    ,(9841, 4, 2017, 0)
    ,(9841, 5, 2017, 34)
    ,(8737, 1, 2017, 0)
    ,(8737, 2, 2017, 99)
    ,(8737, 3, 2017, 8)
    ,(8737, 4, 2017, 3)
    ,(8737, 5, 2017, 0)
    ,(4391, 1, 2017, 2)
    ,(4391, 2, 2017, 5)
    ,(4391, 3, 2017, 18)
    ,(4391, 4, 2017, 0)
    ,(4391, 5, 2017, 3)

    Have 12 entries per ID per year and use pivot/unpivot to display the data the same way we currently do.  The downfalls I see here are that the IDs are no longer unique and it results in many more rows.

    Option 2 :create table ID_Dates3
    (
    ID int null
    ,[Year] int null
    ,Jan int null
    ,Feb int null
    ,Mar int null
    ,Apr int null
    ,May int null
    )

    insert into ID_Dates3
    values (2324, 2017, 1, 0, 4, 0, 1)
    ,(3593, 2017, 56, 3, 7, 21, 8)
    ,(5497, 2017, 0, 0, 0, 34, 0)
    ,(1568, 2017, 34, 5, 6, 22, 1)
    ,(9841, 2017, 6, 11, 4, 0, 34)
    ,(8737, 2017, 0, 99, 8, 3, 0)
    ,(4391, 2017, 2, 5, 18, 0, 3)

    Break out the months and only have one entry per year per ID.  Again the downfall I see is that over time the IDs won't be unique if an ID spans a couple of years.

    Can anyone provide their input on which method is best or if there is a different method than what I'm thinking?  Thanks in advance.

  • Use 3NF (3rd Normal Form). You should never be adding extra columns for repeating items, you should be adding extra rows. If you require a unique ID, then add one too. There's nothing wrong with having a composite key.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I would change things a little. Your datatypes can be improved to take less space. Make the first three columns non-nullable so that you can create a PK on them. Don't forget to specify your desired schema.

    CREATE TABLE dbo.Aggregates
    (
      Id INT NOT NULL,
      Year SMALLINT NOT NULL,
      Month TINYINT NOT NULL,
      Counts INT NULL  --Maybe smallint would work here too?
    );

    ALTER TABLE dbo.Aggregates
    ADD CONSTRAINT PK_Aggregates
      PRIMARY KEY CLUSTERED
      (
       Id,
       Year,
       Month
      );

    INSERT dbo.Aggregates
    (
      Id,
      Month,
      Year,
      Counts
    )
    VALUES
    (2324, 1, 2017, 1),
    (2324, 2, 2017, 0),
    (2324, 3, 2017, 4),
    (2324, 4, 2017, 0),
    (2324, 5, 2017, 1),
    (3593, 1, 2017, 56),
    (3593, 2, 2017, 3),
    (3593, 3, 2017, 7),
    (3593, 4, 2017, 21),
    (3593, 5, 2017, 8),
    (5497, 1, 2017, 0),
    (5497, 2, 2017, 0),
    (5497, 3, 2017, 0),
    (5497, 4, 2017, 34),
    (5497, 5, 2017, 0),
    (1568, 1, 2017, 34),
    (1568, 2, 2017, 5),
    (1568, 3, 2017, 6),
    (1568, 4, 2017, 22),
    (1568, 5, 2017, 1),
    (9841, 1, 2017, 6),
    (9841, 2, 2017, 11),
    (9841, 3, 2017, 4),
    (9841, 4, 2017, 0),
    (9841, 5, 2017, 34),
    (8737, 1, 2017, 0),
    (8737, 2, 2017, 99),
    (8737, 3, 2017, 8),
    (8737, 4, 2017, 3),
    (8737, 5, 2017, 0),
    (4391, 1, 2017, 2),
    (4391, 2, 2017, 5),
    (4391, 3, 2017, 18),
    (4391, 4, 2017, 0),
    (4391, 5, 2017, 3);

    SELECT *
    FROM dbo.Aggregates a;

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Thanks for the feedback.  It's been great help!

  • +1 for the suggestions so far.

    One thing we might need to consider is whether this is an OLTP (transactional) or OLAP (reporting) database.  For transactional systems (EPOS, ticket booking, Hospital administration etc) we definitely want 3NF.  If we are in a reporting environment then we are likely to be looking for speed of aggregation of large datasets in which case some level of de-normalisation may be justified.

    If this is an OLAP solution you may want to consider having a DATE dimension table and rather than use Year/Month in this table, use the ID from the DIM_DATE with a FK constraint.

    It is clear from the original question and suggestions that you are not overly familiar with 3NF but that you have a good gut-feel that there was some code-smell in the proposed designs so kudos there.  Get comfortable with 3NF (or even higher forms); It will be a career-saver for you and may help you prevent others from building poor design in the future

  • aaron.reese - Wednesday, November 8, 2017 3:23 AM

    +1 for the suggestions so far.

    One thing we might need to consider is whether this is an OLTP (transactional) or OLAP (reporting) database.  For transactional systems (EPOS, ticket booking, Hospital administration etc) we definitely want 3NF.  If we are in a reporting environment then we are likely to be looking for speed of aggregation of large datasets in which case some level of de-normalisation may be justified.

    The use of a new column for every month will severely limit the types of reporting you can easily do and, if you write the code correctly, there should be little to no speed disadvantage in reporting from a properly normalized table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You don't need an ID in this table at all (at least on what's known so far); in fact, as so often is the case, it corrupts what would be the proper table design.  Rid your mind of the egregious notion that "every table should have an identity column".

    You should be able to cluster on ( year, month ).  That looks unique.  If it's not, either add a category to make it unique or just don't specify "UNIQUE" on the clus index creation.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Thursday, November 9, 2017 10:30 AM

    You don't need an ID in this table at all (at least on what's known so far); in fact, as so often is the case, it corrupts what would be the proper table design.  Rid your mind of the egregious notion that "every table should have an identity column".

    You should be able to cluster on ( year, month ).  That looks unique.  If it's not, either add a category to make it unique or just don't specify "UNIQUE" on the clus index creation.

    This is not an ID as you and I might know it. If you look back at the initial post, you'll see this:

    ... where the history of aggregated data for an ID is stored all in one table

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin - Thursday, November 9, 2017 11:08 AM

    ScottPletcher - Thursday, November 9, 2017 10:30 AM

    You don't need an ID in this table at all (at least on what's known so far); in fact, as so often is the case, it corrupts what would be the proper table design.  Rid your mind of the egregious notion that "every table should have an identity column".

    You should be able to cluster on ( year, month ).  That looks unique.  If it's not, either add a category to make it unique or just don't specify "UNIQUE" on the clus index creation.

    This is not an ID as you and I might know it. If you look back at the initial post, you'll see this:

    ... where the history of aggregated data for an ID is stored all in one table

    Interesting.  I saw this part in the initial q:
    "where each ID is unique and there is a column for each year/month combo containing the sum or count of the aggregate data."
    and therefore figured it was an identity or equivalent.
    But, I admit, I'm not really sure what that ID is.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply