Fun with business days, calendar tables, and test-driven development

  • Any item of information that can be derived from a date could be stored in the calendar table if it simplifies or improves the performance of queries or code.

    Whether or not the data items you mention should be added to the calendar table is an implementation decision, but they all seem plausible candidates.

    It is quite normal for a calendar table to have more than 20 data items, e.g. http://arcanecode.com/2009/11/18/populating-a-kimball-date-dimension/

  • wildh: thanks for looking those links up. One of those was very helpful to me! Why re-invent the wheel.

  • A few suggestions:

    1) Use the BIT datatype for Is* columns, which eliminates the need for a CONSTRAINT

    2) Make the primary key of the calendar an INT in YYYYMMDD format (per Ralph Kimball), which makes joining faster. If you need a DATE representation, just add a column for it

    3) If you need the closest business date on or before the current date, just add a column for it and populate it once when you create the table. Along the same lines, you may add columns for DayNumberOfYear, WeekNumberOfYear, MonthNumberOfYear, NumberOfHolidaysThisMonth, NumberOfHolidaysThisYear, etc.

    Larry.Shiller@gmail.com

  • CELKO (9/23/2010)


    I still prefer Julianizing the business days to make the math easier

    CREATE TABLE Calendar

    (cal_date DATE NOT NUL PRIMARY KEY,

    julian_business_nbr INTEGER NOT NULL,

    ...);

    INSERT INTO Calendar

    VALUES ('2007-04-05', 42),

    ('2007-04-06', 43), -- good friday

    ('2007-04-07', 43),

    ('2007-04-08', 43), -- Easter sunday

    ('2007-04-09', 44),

    ('2007-04-10', 45); --Tuesday

    To compute the business days from Thursday of this week to next

    Tuesday:

    SELECT (C2. julian_business_nbr - C1. julian_business_nbr)

    FROM Calendar AS C1, Calendar AS C2

    WHERE C1.cal_date = '2007-04-05',

    AND C2cal_date = '2007-04-10';

    Yes, we do it too. When a new holiday comes up, like President Ford's funeral, we update all those future numbers.

  • I really want to encourage people to automate these type of tests. I know that you said in the article about automating tests that “it is not a commonly accepted practice yet” and I think that may be because of the overhead involved by some of the automation frameworks. Here is a solution that is automated yet it is as simple to write as your manual tests. Once you've installed the TST database (http://tst.codeplex.com, once downloaded it takes about 20 seconds to install and it is all T-SQL) all you’d have to do for your case is to write the tests procedures below. Then you can execute the tests in command prompt by running:

    "TST /RunAll DatabaseName"

    or in SQL by running

    "EXEC TST.Runner.RunAll DatabaseName"

    CREATE PROCEDURE dbo.SQLTest_NearestBusinessDayOnOrBefore

    AS BEGIN

    DECLARE @ActualResult DATETIME

    SELECT @ActualResult = Date FROM dbo.NearestBusinessDayOnOrBefore('20100827')

    EXEC TST.Assert.Equals 'NearestBusinessDayOnOrBefore(''20100827'')', {d '2010-08-27'}, @ActualResult

    SELECT @ActualResult = Date FROM dbo.NearestBusinessDayOnOrBefore('20100828')

    EXEC TST.Assert.Equals 'NearestBusinessDayOnOrBefore(''20100828'')', {d '2010-08-30'}, @ActualResult

    SELECT @ActualResult = Date FROM dbo.NearestBusinessDayOnOrBefore('20100829')

    EXEC TST.Assert.Equals 'NearestBusinessDayOnOrBefore(''20100828'')', {d '2010-08-30'}, @ActualResult

    SELECT @ActualResult = Date FROM dbo.NearestBusinessDayOnOrBefore('20100830')

    EXEC TST.Assert.Equals 'NearestBusinessDayOnOrBefore(''20100828'')', {d '2010-08-30'}, @ActualResult

    END

    GO

    CREATE PROCEDURE dbo.SQLTest_NearestBusinessDayOnOrBeforeNull

    AS BEGIN

    DECLARE @ActualResult DATETIME

    SELECT @ActualResult = Date FROM dbo.NearestBusinessDayOnOrBefore(NULL)

    EXEC TST.Assert.IsNull 'NearestBusinessDayOnOrBefore(NULL) should return NULL', @ActualResult

    END

    GO

    CREATE PROCEDURE dbo.SQLTest_BusinessDaysBetweenEndsOfDaysOutOfRange

    AS

    BEGIN

    DECLARE @ActualResult DATETIME

    SELECT @ActualResult = NumDays FROM dbo.BusinessDaysBetweenEndsOfDays('20010827', '20100817')

    EXEC TST.Assert.IsNull 'BusinessDaysBetweenEndsOfDays(''20010827'', ''20100817'') should return NULL', @ActualResult

    SELECT @ActualResult = NumDays FROM dbo.BusinessDaysBetweenEndsOfDays('20100827', '20990817')

    EXEC TST.Assert.IsNull 'BusinessDaysBetweenEndsOfDays(''20100827'', ''20990817'') should return NULL', @ActualResult

    SELECT @ActualResult = NumDays FROM dbo.BusinessDaysBetweenEndsOfDays('20010827', '20990817')

    EXEC TST.Assert.IsNull 'BusinessDaysBetweenEndsOfDays(''20010827'', ''20990817'') should return NULL', @ActualResult

    END

    GO

  • Ladi,

    This is a interesting suggestion, a response I have been waiting for. I know about your framework, I read your article and commented on it. What you are suggesting is certainly worth doing. What we are doing in C# is also a working approach.

    Yet I have another suggestion. Once we are done with manual runs, once we are completely satisfied with the results, we can just use a simple tool to save expected results in Json or XML. Because we frequently have multiple result sets, we better do it in C# - T-SQL is incapable of capturing multiple result sets. Generating expected results is much faster than having to write unit tests.

    Also when expected results are decoupled from the test, it simplifies maintenance.

    What do you think?

  • Thank you Alvin, your remark together with the link from wildt about

    Populating a Kimball Date Dimension

    are what I am looking for.

  • Thanks for a simple yet effective tool.

    One thing was not clear = why all those functions are declared to return TABLE? it looks to me these funcs return just a date. What am I missing?

  • CELKO (9/27/2010)


    Alexander Kuznetsov (9/24/2010)


    CELKO (9/23/2010)


    I still prefer Julianizing the business days to make the math easier

    CREATE TABLE Calendar

    (cal_date DATE NOT NUL PRIMARY KEY,

    julian_business_nbr INTEGER NOT NULL,

    ...);

    INSERT INTO Calendar

    VALUES ('2007-04-05', 42),

    ('2007-04-06', 43), -- good friday

    ('2007-04-07', 43),

    ('2007-04-08', 43), -- Easter sunday

    ('2007-04-09', 44),

    ('2007-04-10', 45); --Tuesday

    To compute the business days from Thursday of this week to next

    Tuesday:

    SELECT (C2. julian_business_nbr - C1. julian_business_nbr)

    FROM Calendar AS C1, Calendar AS C2

    WHERE C1.cal_date = '2007-04-05',

    AND C2cal_date = '2007-04-10';

    Yes, we do it too. When a new holiday comes up, like President Ford's funeral, we update all those future numbers.

    The updating is not much of a problem:

    1) New holidays do not occur very often, per one or two per decade at most. But queries about business days occur very frequently.

    Joe,

    In general, I concur - storing consecutive numbers with business days makes perfect sense. Yet changes in what is a business day occur all the time, especially if we deal with multiple countries, and a business day must be such in all involved countries. Even if we limit the scope to US only, changes do occur. Just recently, Chicago Mercantile Exchange announced how they are going to operate on Columbus Day, and that seems to e a change from previous years.

  • migurus (9/29/2010)


    Thanks for a simple yet effective tool.

    One thing was not clear = why all those functions are declared to return TABLE? it looks to me these funcs return just a date. What am I missing?

    migurus,

    I made sure that my functions are inline, so that they perform fast. TABLE keyword if just part of declaration syntax for inline functions.

  • Alexander Kuznetsov (9/25/2010)


    Ladi,

    This is a interesting suggestion, a response I have been waiting for. I know about your framework, I read your article and commented on it. What you are suggesting is certainly worth doing. What we are doing in C# is also a working approach.

    Yet I have another suggestion. Once we are done with manual runs, once we are completely satisfied with the results, we can just use a simple tool to save expected results in Json or XML. Because we frequently have multiple result sets, we better do it in C# - T-SQL is incapable of capturing multiple result sets. Generating expected results is much faster than having to write unit tests.

    Also when expected results are decoupled from the test, it simplifies maintenance.

    What do you think?

    What you are suggesting is a practical approach. However it goes a little bit against an important principle in unit testing – that you should calculate the expected result yourself and not trust the application to provide a baseline for you. Also in case of TDD that is not really appropriate (or possible).

    I could see some scenarios that involve integration testing rather than unit testing where your approach would be OK. For example when you’d want to collect large results and consider them as a baseline to protect against regressions or changes in behavior before a refactoring.

    Bringing “better languages” in the test automation has some advantages. SQL is error prone compared with C# so if you have a solid infrastructure written in C# that could allow unit testing of SQL code that may be a better solution than T.S.T.

    T.S.T. is addressed to developers or teams who are familiar with T-SQL but do not have the expertise or the environment needed to support other unit test tool that could exercise SQL Code. For these situations T.S.T. is ideal. Your article were written as if for such an environment (the test code was also in SQL) so that is why I thought T.S.T. was an approach worth mentioning.

  • Ladi,

    I think that when we are dealing with T-SQL and the expected results are result sets consisting of rows and columns, with column names and column types, following the principle "that you should calculate the expected result yourself" uses up a huge amount of time. It is much faster to let the tool generate the results for you, if you have a tool you can trust. I do have such a tool, which saves me a huge amount of time. What do you think?

  • It depends. I find that this is a lot easier to discuss on specific examples and I’d like to analyze a couple.

    Let’s take the code that you present in your article. There you have a couple of functions “NearestBusinessDayOnOrBefore” and “BusinessDaysBetweenEndsOfDays”. I had plenty of cases like that in production code where we had to validate UDFs. In that case you should calculate the results yourself and T.S.T. could be an ideal tool to use.

    If you are talking about a test scenario where you expect a stored procedure to return hundreds or thousands of rows, then you are most likely talk about an integration test and not a unit test. Let’s say that you have a stored procedure that will return a dataset where every row contains information about the cost accrued per day in the context of some financial dealings. Maybe the issue is complicated by the fact that there are a lot of parameters like: discounts that fluctuate with time, service interruptions, variable level of service provided, holidays, client status, legal aspects and so on. If you will feed to the system scenarios that cover a long period of time and include variations of a lot of these parameters you may get a result that will consist of hundreds and thousands of rows where the results are impossible to calculate manually. In this case TST can still be useful especially if you store the baselines in some tables or if the baseline can be calculated by an alternate set of procedures. However, depending of the test infrastructure and the tools you have it is possible you will want to use something different. Like I said, this type of test is not a unit test – it is an integration test. Traditionally there is a lot of confusion between the two and in my experience a lot of people build integration tests and call them unit tests. Ignoring for a moment the terminology issue, such a way of doing things is a valid one and automating it has value but also has certain implication (most of them around maintenance) that have to be understood. I’ll get back to this in a minute. A unit test will be different. You’ll feed to the system a set of scenarios where only one parameter varies and where the result consists of a very small data set typically 1-10 rows. It is a little bit more to be said about how the code and the unit tests can be written in a way that will assure code coverage in scenarios where multiple parameter vary simultaneously. These types of tests will be appropriate for TST and the expected results will be calculated manually.

    Getting back to the case where you do an integration test- that can have great advantages but also has certain downsides.

    Advantages:

    -You can cover a large number of cases that include complicated combinations of the scenarios parameters. Creating and freezing a baseline can be done very productively in this way.

    Downsides:

    -This approach suffers from a test matrix explosion. I have seen cases where this was a significant issue. The result was that not all test cases were covered. Also when calculating the expected results there is a real risk to accept erogenous results. Calculating expected results based on some alternate automation is not trivial. Unit tests will not suffer from the matrix explosion issue.

    -When a calculation rule changes it can be sometimes difficult to recalculate the expected result. Unit tests are great in this case if that were written correctly in the sense that they have a minimum of overlapping.

    I think that both integration tests and unit tests have value but it is important to understand the difference ant their cost implication especially long term related to maintenance in face of changing requirements.

    In conclusion I think that T.S.T can give you a very fast start – asks only for SQL expertise and has no other dependencies. It is also very easy to install and very easy to learn for someone who knows SQL. For large systems it may make sense to be combined or even replaced with heavier and more feature rich frameworks.

  • CELKO (9/23/2010)


    I still prefer Julianizing the business days to make the math easier

    I know it's an old post but absolutely spot on here, Joe. It makes life much easier and the code much faster especially when proper indexing is in place.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply