For some reason a common task of age calculation remains an unsolved problem, causing a lot of trouble for developers. Of course, there are plenty of solutions online which correctly calculate an age euther in years, or in months, or in days. But what about a function which returns the full set of years, months, days, possibly with hours, minutes, and seconds?
Many have attempted this (e.g. http://stackoverflow.com/questions/57599/how-to-calculate-age-in-t-sql-with-years-months-and-days), but there always a case where each of the solution fails to deliver. According to a quite popular (well, Google says so) online Age Calculator, a baby born on 31 March 2016 was never exactly 1 month old. It was "0 months 30 days" old on 30 April, and "1 month 1 day" old on 1 May.
To me - it does not look right. OK, mummy and daddy could find a way around and figure out when to celebrate the baby's first month. But what about legal stuff? Like an invoice issued on 31 March and having a one month payment term? The invoice is never due: not yet due on 30 Apr, and overdue on 1 May. Or the release from a prison term: too early to release on 30 Apr, and entitled for a compensation on the very next day. No, it's definitely not right.
If you build an auto-scheduler on such an age calculator you could get in trouble pretty soon. So, what makes do all those solutions fail? Could it be a common problem in all of them?
I can see the root of the problem in calculating age as a time difference between "Birth Date" and "Today". There are several factors which make this method inappropriate. Let's examine each of them.
A calendar is not a linear thing.
It's not like a distance where you can measure it in the smallest unit you need (say, meters) and then divide by 1000, taking the whole number as the part of the distance in kilometers, and the remainder as the part in meters. This does not work for a calendar. You cannot simply count the number of days between two dates and divide it by number of days in a year, month. Different months have different number of days, years are much more reliable in terms of duration, but still every 4th of them cannot be trusted. A person with the Birth Date = '2016-03-30' is 1 month old on '2016-04-30' . The same as another one, with the Birth Date = '2016-03-31', because there is no 31st day of April:
So, the different pairs of dates, '2016-03-30' - '2016-04-30' and '2016-03-31' - '2016-04-30', have the same age between them - 1 month, but different numbers of days. The same story occurs with leap years. People born on 28-Feb-1980 and 29-Feb-1980 have become 21 years old at the same day - 28-Feb 2001.
Using a date difference days in for calculating a remainder for the Age_in_Month causes that jump in days we've seen on the Age Calculator.
Problems with DATEDIFF
DATEDIFF returns not a whole number of time intervals between two dates, but the number of "unit switches". Let's have a look at this example:
DATEDIFF does not help us to figure out how many whole time units are between the days. We have to count smaller units and try to make make up the number of whole years out of them, which brings back the same issue with non-linearity we have just discussed.
This is not how we count age in real life
Age means how much time has passed since the initial even date (birth date), not how long ago that initial event happened. Even though these 2 definitions seem to be very close, almost the same, they are not. "Ageing" is adding time units to the date of birth. The units which have been added so far make up our date.
Having said that, we get to the correct approach to the Age calculation problem - we need to be adding age units to "birth date" until we reach the "today" date. Of course, we start with the biggest unit in age calculation - a year. No, not a century. A century is 100 years, always 100 years, so there is no need to count centuries. We can always get the number of centuries by whole number division of number of years.
We will be adding years one by one to "birth date" until we reach "today". I implemented it with an "SQL loop" - in-memory Tally table.
While writing this article I used my own TallyGenerator function which can be found here: http://www.sqlservercentral.com/scripts/tally/153313/
We must use zero-based tally table, as we need the script to return "0" for two days within one year. I limited the number of years by 1000, as I don't expect people to live longer than that. Even if Duncan MacLeod shows up one day, I doubt he'll be able to provide a valid birth certificate from an approved authority. The script stopped at Year=4, that's how many whole years fit between the dates.
Next - months
We're gonna need to find the number of whole months within the last year only, so let's find when the last year begins first:
Years Months EndOfLastWholeYear DateTo -------------------- -------------------- -------------------------- ----------------------- 4 11 2008-06-30 00:00:00.000 2008-07-30 00:00:00.000
OK, we've got 4 whole years, 11 whole month and the new date to start with days calculation - EndOfLastWholeMonth. Days to be calculated exactly the same way, so I won't spend to many words and move straight to the script:
Here we go:
Years Months Days -------------------- -------------------- -------------------- 4 11 30
The age calculation is compete with total success. The only thing could be tweaked is the fixed number of records in "years" loop. 1000 may be a waste of resources for "normal" age calculations, and if you turn this script into an inline table function and use for massive age calculations it may impose some performance issues. On another hand, you never know what you can expect from members of McLeod clan - who knows what kind of birth certificates they can present?
Therefore I created a version using a flexible number of iterations in the loop:
Because, as we know, DATEDIFF returns the number year switches, not the number of whole years I added 1 to the returned number. So the final script looks like this:
The script works correctly with parameters of the DATE data type.
If you still have not got enough detail, by adding extra layers you may add hours, minutes, seconds and even milliseconds to the age calculation. But, it might not be required anyway, because time units are linear, so they can be computed "normally". And, of course, hours-minute-sec calculations will work for dates of DATETIME data types but not DATE data type.
The final script is attached, so you can download it ant try it out on your test cases. Feedback would be much appreciated.
In the second part of the article I intend to turn the script into a function calculating the age in time units defined in its parameter. If you're interested - stay tuned!