Seek Table Design Advice\Best Practice: Accounting Period

  • I work primarily with financial applications that deal with transactional data with an accounting period as part of the key (i.e. January 2006) with no regard to the day portion of the accounting period.   Over the years I have used several schemes for the format of the accounting period:

    1) Use datetime datatype:

        AccountPerioddatetime

        - Disadvantage with datetime in my mind is using unneeded space and needed to always force a '1' for day portion for housekeeping purposes (e.g. '1/1/2006').

    2) Separate year and month columns: AccountYear unljmne

      


    maddog

  • You could use an INT (4 bytes) and record February as 200602

    You could use SMALLINT (2 bytes) and record February as 602. Of course your would face the a problem in 327 years.

  • Didn't finish my post before it magically fired off but you get the drift...anyway thanks for the reply.

    I thought of the integer (February 2006 = 200602) approach but I was concerned that a lot of the practical queries will be seeking the year or month portions separately (e.g. SELECT * WHERE SUBSTRING(AccountPeriod, 1, 4) = 2006), but this may not be a big deal since the account period will always be part of an index and I would hope that the parsing script would take advantage of the value in the index and not the table data directly.   Another concern was that I would need to always left pad the month portion so that the scale of the numerics was always the same for comparing two periods.  Any thoughts?

    Best Regards,

    Dennis M.


    maddog

  • I used to build a separate table for accounting periods. We always had weird ones that would sometimes end a day early or late. So I build a table that has period, start date, end date. Join any queries to this table to get the period.

    For us the period was 200601, 200602, etc. up to 200613 because we sometimes had 13 periods.

    You could do a smallint for the year and a tiny int for the period instead of one field as well.

  • Thanks all for your replies.

    I ran some statistics, albeit with limited data, on my own using execution plans to see what might the difference be between defining an accounting period as an int datatype versus a char(6).  The following is some DDL for the two versions and resulting stats:  

    Version 1 - AccountPeriod using int datatype

    ============================================

    CREATE TABLE dbo.ForecastTest1

    (

      ForecastID   int IDENTITY NOT NULL,

      VersionID   int NOT NULL DEFAULT 0,

      AcctPeriod   int NULL,

      Amount    decimal(14,2) NULL

    )

    GO

    ALTER TABLE dbo.ForecastTest1

     ADD CONSTRAINT PK_ForecastTest1 PRIMARY KEY (ForecastID)  

    CREATE INDEX XK_Version_AcctPeriod

     ON dbo.ForecastTest1 (VersionID, AcctPeriod)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200601, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200602, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200603, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200604, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200605, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200606, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200607, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200608, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200609, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200610, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200611, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200612, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200701, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200702, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200703, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200704, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200705, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200706, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200707, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200708, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200709, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200710, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200711, 10)

    INSERT INTO dbo.ForecastTest1 VALUES (1, 200712, 10)

    QUERY 1:

    SELECT * FROM dbo.ForecastTest1 WHERE LEFT(AcctPeriod, 4) = 2006

    EXECUTION PLAN RESULTS 1:

    - FILTER: Convert(Substring(Convert([ForecastTest1].[CFPeriod]), 1, 4)) = 2006  (Cost=0% I/O=0 CPU=0.000004)

    - CLUSTERED INDEX SCAN: PK_ForecastTest1 (Cost=100% I/O=0.0375 CPU=0.000105)

    QUERY 2:

    SELECT * FROM dbo.ForecastTest1 WHERE AcctPeriod BETWEEN 200606 AND 200706

    EXECUTION PLAN RESULTS 2:

    - CLUSTERED INDEX SCAN: PK_ForecastTest1 (Cost=100% I/O=0.0375 CPU=0.000105)

     

    Version 2 - AccountPeriod using char(6) datatype

    ============================================

    CREATE TABLE dbo.ForecastTest2

    (

      ForecastID   int IDENTITY NOT NULL,

      VersionID   int NOT NULL DEFAULT 0,

      AcctPeriod   char(6) NULL,

      Amount    decimal(14,2) NULL

    )

    GO

    ALTER TABLE dbo.ForecastTest2

     ADD CONSTRAINT PK_ForecastTest2 PRIMARY KEY (ForecastID)  

    CREATE INDEX XK_Version_AcctPeriod

     ON dbo.ForecastTest2 (VersionID, AcctPeriod)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200601', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200602', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200603', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200604', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200605', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200606', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200607', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200608', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200609', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200610', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200611', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200612', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200701', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200702', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200703', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200704', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200705', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200706', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200707', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200708', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200709', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200710', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200711', 10)

    INSERT INTO dbo.ForecastTest2 VALUES (1, '200712', 10)

    QUERY 1:

    SELECT * FROM dbo.ForecastTest2 WHERE SUBSTRING(AcctPeriod, 1, 4) = '2006'

    EXECUTION PLAN RESULTS 1:

    - CLUSTERED INDEX SCAN: PK_ForecastTest2 (Cost=100% I/O=0.0375 CPU=0.000105)

    QUERY 2:

    SELECT * FROM dbo.ForecastTest2 WHERE AcctPeriod BETWEEN '200606' AND '200706'

    EXECUTION PLAN RESULTS 2:

    - CLUSTERED INDEX SCAN: PK_ForecastTest2 (Cost=100% I/O=0.0375 CPU=0.000105)

     

    SUMMARY:

    The primary difference it seems between defining the period as int versus char datatype is that the int version must be filtered first to convert the int datatype back to a char to do a substring operation for the first query as shown in execution plan results #1.  Defining the period as a char(6) allowed the direct usage of a substring operation which eliminated the filter step in the execution plan.     The second query to locate periods using BETWEEN did not have any differences in the execution plan that I could see.

    From what I can see I will most likely use the char(6) representation for the accounting period, but I am defintely open to the experiences and advice of others for potential pitfalls that I am not seeing yet.

    Best Regards,

    Dennis M.

     

      


    maddog

  • The INT method does not require any padding.

    The user wants to query for a range of dates then

    CREATE PROC dbo.MyQuery

    @StartYear INT ,

    @StartMonth INT ,

    @EndYear INT ,

    @EndMonth INT

    AS

    DECLARE @StartPeriod INT ,@EndPeriod INT

    SET @StartPeriod = @StartYear * 100 + @StartMonth

    SET @EndPeriod = @EndYear * 100 + @EndMonth

    SELECT *

    FROM dbo.YourTable

    WHERE ReportPeriod BETWEEN @StartPeriod AND @EndPeriod

    If you just want to query for a year then what you are really asking is ReportPeriod BETWEEN 200601 AND 200612

    If your index is clustered then this range query will be very quick.

  • I like the numeric handling you proposed versus the char padding - didn't think of that trick.  I can see that the BETWEEN operator will be used heavily as it would work for nearly all types of queries including the year alone as you mentioned.  I'm shifting my thinking towards using an int versus a char(6) thanks to your input David.

    Best Regards,

    Dennis M.

     


    maddog

  • I dunno... 6 bytes, 4 bytes, 8 bytes... for me, I'd still be tempted to use DateTime just because of all the things you can do with DateTime functions without any conversions required... and a modified version of Steve's idea for a "period" table would certainly prove useful.  Just in case the original requester didn't know it, it's very easy to group by month and year using DateDiff(mm,0,somedate).

    Then, there was that time when the SEC did an audit on my old company and they had to explain why their records weren't timestamped my year, month, day, hour, minute, and second.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I Haven't seen the DateDiff usage for grouping before, have typically used an approach like 'SELECT SUM(Amount) FROM Forecast WHERE DatePart(yy, acctperiod) BETWEEN @StartYear AND @EndYear' with the acctperiod column being a datetime.  A similar usage of DatePart within a GROUP BY expression to get grouped results by month or year.   It's late and I'm not thinking clearly, do you have an example of usage of the DateDiff for both filtering and grouping?  Thanks Again.

    Dennis M.

     


    maddog

  • Sure, Dennis... having a bit of a lack of sleep going on myself, here's an example using the Northwind database...

        USE NorthWind

    DECLARE @Year CHAR(4)

        SET @Year = '1997'

     SELECT DATENAME(mm,DATEADD(mm,d.MonthNum,0)) AS SpelledMonth,

            DATENAME(yy,DATEADD(mm,d.MonthNum,0)) AS [Year],

            d.FreightValue

       FROM (

             SELECT DATEDIFF(mm,0,ShippedDate) AS MonthNum,

                    SUM(Freight) AS FreightValue

               FROM dbo.Orders

              GROUP BY DATEDIFF(mm,0,ShippedDate)

            )d

      WHERE d.MonthNum >= DATEDIFF(mm,0,'01/01/'+@Year)

        AND d.MonthNum <  DATEDIFF(mm,0,'01/01/'+@Year)+12

      ORDER BY d.MonthNum

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you do WHERE {Function(field)} = {value} then you will probably do a TABLE SCAN because the value isn't an indexed value.

    Probably won't matter in a small-ish database but in a larger one this will punish your system.

  • Yep, true enough... I've made the same recommendation to others, myself.  It's a great example of why you need a primary key, though...

    -------------------------------------------------------------------------------------------------------------------------------

      |--Compute Scalar(DEFINE: ([Expr1004]=datename(month, dateadd(month, [Expr1002], 'Jan  1 1900 12:00AM')), [Expr1005]=datename(year, dateadd(month, [Expr1002], 'Jan  1 1900 12:00AM'))))

           |--Compute Scalar(DEFINE: ([Expr1003]=If ([Expr1012]=0) then NULL else [Expr1013]))

                |--Stream Aggregate(GROUP BY: ([Expr1002]) DEFINE: ([Expr1012]=COUNT_BIG([Orders].[Freight]), [Expr1013]=SUM([Orders].[Freight])))

                     |--Sort(ORDER BY: ([Expr1002] ASC))

                          |--Filter(WHERE: ([Expr1002]>=datediff(month, 'Jan  1 1900 12:00AM', Convert('01/01/'+[@Year])) AND [Expr1002]<datediff(month, 'Jan  1 1900 12:00AM', Convert('01/01/'+[@Year]))+12))

                               |--Compute Scalar(DEFINE: ([Expr1002]=datediff(month, 'Jan  1 1900 12:00AM', [Orders].[ShippedDate])))

                                    |--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[Orders].[PK_Orders]))

    (7 row(s) affected)

    ...It's not the coveted "Clustered Index Seek" but it's no table scan, either.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Gents for your responses.  Ill take some time to see how the DateDiff approach might be used in my situation now that I think I know what's going on with it. 

    Oh and..Go Seahawks!

    (I'm not a football junkie but I can't escape the fact that my wife's parents live a mile from Seahawk stadium in Seattle and drilled the Seahawk mantra into her from a young age.  Just another excuse for me to drink more beer, as if I needed one.)

     


    maddog

  • Jeff,

    Have you seen the trick where you do a completely superfluous WHERE {ClusterIndex} BETWEEN {ClusterIndex}.MinValue AND {ClusterIndex}.MaxValue

    In SQL2000 this can force the clustered index seek even though the clause does nothing in the practical sense. You do gain a performance boost.

    In SQL2005 it still forces a clustered index seek but it adds in a filter step into the execution plan so you don't get any performance benefits. SQL2005 execution plans seem to have half the cost of their 2000 equivalents.

  • Absolutely... and that, in fact, your suggestion would definitely be the best way to solve the problem, but someone wanted me to demo using DATEDIFF as both a filter and a join.  That was the reason for my original post

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply