The place of SMALLDATETIME

  • SMALLDATETIME is a datatype that gets very little love online and few people recommend it. Kendra Little is right to warn about the rounding up/down of seconds. Microsoft explicitly recommends DATETIME2 over SMALLDATETIME on the Microsoft Learn page.

    However, it is only 4 bytes in length and gives accuracy to the minute, whereas DATETIME2(0) is 6 bytes in length.

    I have a metadata column that records when an entry was made. Accuracy to a minute is acceptable. We are expecting 11 million entries per year.

    The 2-byte difference adds up to roughly 2,700 extra pages, or about 21MB, per year. This is for one column in one table. Seconds would be nice, but I prefer a more efficient system.
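
    Roughly, the sums work out as below (a back-of-the-envelope sketch only; real pages hold whole rows, so the page count is approximate).

    -- 2 extra bytes per row at ~11,000,000 rows/year
    SELECT
        CAST(11000000 AS bigint) * 2 / 1048576.0 AS extra_mb_per_year,    -- ~ 21 MB
        CAST(11000000 AS bigint) * 2 / 8060      AS extra_pages_per_year; -- ~ 2,700 pages (8,060 usable bytes per 8KB page)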

    Is there any reason why I should specify DATETIME2(0) rather than SMALLDATETIME? Time zones aren't an issue for us, nor is daylight saving time. We won't need any years other than this year and the following 10 or so.

    One of my biggest pet peeves about (small)datetime is its handling of the format yyyy-MM-dd( hh:mm:ss), as it is ambiguous. For those using American as their language, they are "fine": a value like 2023-04-11 results in the date 11 April 2023. For the rest of the world, however, this isn't the case; it results in the date 04 November 2023. As SSMS generally presents data in the results pane in the format yyyy-MM-dd, if you are working with (small)datetime, years of having to change the value to yyyyMMdd hh:mm:ss or yyyy-MM-ddThh:mm:ss when you copy the value out starts to get old fast. 😉

    The new date and time data types respect the ISO formats much better; they all treat a value in the format yyyy-MM-dd as exactly that, regardless of language or DMY settings.
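
    A quick way to see the difference (assuming a DMY language such as British is available on the instance):

    -- datetime applies the session's DMY setting to yyyy-MM-dd; datetime2 always reads it as ISO
    SET LANGUAGE British;   -- British uses dmy, so '2023-04-11' is read as year-day-month
    SELECT CAST('2023-04-11' AS datetime)  AS old_type,   -- 2023-11-04 (4 November)
           CAST('2023-04-11' AS datetime2) AS new_type;   -- 2023-04-11 (11 April)

    SET LANGUAGE us_english;  -- mdy: now the old type also returns 11 April
    SELECT CAST('2023-04-11' AS datetime)  AS old_type;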

    I, personally, just find datetime2/date much more consistent in their behaviour.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • If you're that worried about space, you could just store a minutes displacement from Jan 01, 2023. So, midnight on that date would be 0, 2PM would be 14*60, etc. Not quite as convenient as storing an actual date and time, but workable, and much smaller in size (esp. if you compress the data).
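
    Something like this, roughly (just a sketch; the names are made up, and whether int or smallint fits depends on how far from the anchor date you need to go):

    -- Store minutes elapsed since an agreed anchor date instead of an actual date/time
    DECLARE @Anchor datetime2(0) = '20230101';

    -- Writing: capture the current time as a minute offset
    DECLARE @MinutesSinceAnchor int = DATEDIFF(MINUTE, @Anchor, SYSDATETIME());

    -- Reading: turn the offset back into a date and time for display
    SELECT @MinutesSinceAnchor                           AS StoredValue,
           DATEADD(MINUTE, @MinutesSinceAnchor, @Anchor) AS RecordedAt;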

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hi Scott,

    I'm not that worried about space per se. The column will be one of metadata and while seconds would be nice to have, accuracy to hours & minutes is grand.

    The question is one more of database design & efficiency. This looks like low-hanging fruit. Our developers have been using DATETIME2(7) as standard up to now, and I have an opportunity to change it. DATETIME2 brings us little extra benefit, and I have the opportunity to make a table narrower than before. I'm looking for reasons why I wouldn't use SMALLDATETIME. The lack of seconds and the last 30 seconds before midnight are the best I have so far.

    Your SMALLINT suggestion would mean more work for the developers, even if it would save 2 more bytes. It would also mean more work for us when tracking down problems, since we would have to manually convert the number of minutes since 2023 back into the recorded time.

    Thanks for your comment,

    Sean.

  • Thom A wrote:

    One of my biggest pet peeves about (small)datetime is its handling of the format yyyy-MM-dd( hh:mm:ss), as it is ambiguous. For those using American as their language, they are "fine": a value like 2023-04-11 results in the date 11 April 2023. For the rest of the world, however, this isn't the case; it results in the date 04 November 2023. As SSMS generally presents data in the results pane in the format yyyy-MM-dd, if you are working with (small)datetime, years of having to change the value to yyyyMMdd hh:mm:ss or yyyy-MM-ddThh:mm:ss when you copy the value out starts to get old fast. 😉

    The new date and time data types respect the ISO formats much better; they all treat a value in the format yyyy-MM-dd as exactly that, regardless of language or DMY settings.

    I, personally, just find datetime2/date much more consistent in their behaviour.

    Are you saying that the collation impacts the formatting of the date? I haven't really used smalldatetime; that seems like a very Microsoft Access-y problem to have.

  • CreateIndexNonclustered wrote:

    Are you saying that the collation impacts the formatting of the date? I haven't really used smalldatetime; that seems like a very Microsoft Access-y problem to have.

    The collation, no, that doesn't have any effect on it. The behaviour would be the same whether you were using Albanian_CS_AI_KS_WS, Icelandic_BIN or Latin1_General_100_CI_AS_KS_WS_SC_UTF8. The collation affects what characters are available in the code page for a varchar, and the ordering of characters for string-based data types; for example, a binary collation orders all upper-case letters before lower case (ABC..YZabc...yz), whereas a (non-binary) Latin collation has the characters in alphabetical order, starting with upper case (AaBb...YyZz).
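
    A small illustration of the ordering point (collation names picked purely as examples):

    -- A binary collation sorts by code point, so all upper-case letters come before any lower-case ones;
    -- a dictionary (non-binary) collation sorts alphabetically with the two cases grouped together.
    SELECT val
    FROM (VALUES ('a'), ('B'), ('A'), ('b')) AS t(val)
    ORDER BY val COLLATE Latin1_General_BIN;    -- A, B, a, b

    SELECT val
    FROM (VALUES ('a'), ('B'), ('A'), ('b')) AS t(val)
    ORDER BY val COLLATE Latin1_General_CS_AS;  -- a/A together, then b/B together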

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

    There are localization differences for the default code page when doing certain type conversions. I was wondering whether collation is where that comes from when a string is implicitly converted to a time or date.

    Going by the date component of the attribute, year-month-day is unambiguous while day-month-year is ambiguous, so I'm curious where the issue would arise. I've never really used that data type, and I like trivia like this to beat people over the head with when they try to use a dumb data type like this or money.

    To the OP - I think doing all that to save 2 bytes is an extremely bad idea. By your numbers it saves you 21 MB per year, which hasn't been a relevant amount of storage in 20+ years, but it opens an immense surface area for future problems. MS already advises you not to use it; even if your requirement is 10 years of data retention and it actually stays at that, MS could deprecate the data type before the 10 years are up. I wouldn't expect SQL Server to remove the type anywhere near that time frame even if they deprecated it tomorrow, but various frameworks and ORMs may move more quickly to drop support once it has been deprecated.

  • CreateIndexNonclustered wrote:

    There are localization differences for the default code page when doing certain type conversions. I was wondering whether collation is where that comes from when a string is implicitly converted to a time or date.

    It's the DMY settings that affect the implicit conversion (for date and time values). This is (often) driven by the language setting of the LOGIN (or USER, if it's a contained database); however, it can also be set explicitly for the connection using SET DATEFORMAT.
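
    For example (same string, two session settings; this only affects datetime/smalldatetime, not date/datetime2):

    SET DATEFORMAT dmy;
    SELECT CAST('2023-04-11' AS datetime) AS parsed;  -- 4 November 2023 (read as year-day-month)

    SET DATEFORMAT mdy;
    SELECT CAST('2023-04-11' AS datetime) AS parsed;  -- 11 April 2023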

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • sean redmond wrote:

    I'm not that worried about space per se. The column will be one of metadata and while seconds would be nice to have, accuracy to hours & minutes is grand.

    The question is one more of database design & efficiency.

    You'll need to write your inserts/updates into those columns so that they don't round up to the next hour, day, week, month, or year by mistake.
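
    For anyone who hasn't hit it, the rounding looks like this (a plain cast, nothing exotic):

    -- smalldatetime rounds to the nearest minute: 29.998 seconds or less rounds down,
    -- 29.999 or more rounds up, which at 23:59:30 tips over into the next day.
    SELECT CAST('2023-12-31 23:59:29' AS smalldatetime) AS rounds_down, -- 2023-12-31 23:59:00
           CAST('2023-12-31 23:59:30' AS smalldatetime) AS rounds_up;   -- 2024-01-01 00:00:00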

    That being said, I'll suggest that it's not worth the hassle, and it's not "good design" nor is it "efficient" 😉 especially since you're not actually "worried about space per se". It's also not worth pissing off your developers, nor burning extra CPU time to prevent the rounding errors, nor taking the time and effort to find and change every piece of code you write going forward (or already have) to do the rounding correctly. And adding a "catch all" trigger to do it for every table is another example of not "good design", and it certainly is not "efficient".

    I'll also suggest that you have much bigger fish to fry when it comes to "good design" and "efficiency", and that they'll burn you at the stake if some manager ever tells you that a customer insists on millisecond or even second accuracy. In case you're wondering, you can tell that I've seen that happen before.

    Save temporal data at the 100-nanosecond level and save the "correct rounding" for reporting. I put this in the same category as having a separate DATE and TIME column... it's just not worth it in the big picture nor in the long run.
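
    A minimal sketch of that approach (the table and column names are invented for illustration):

    -- Store at full precision...
    CREATE TABLE dbo.AuditDemo
    (
        Id        int IDENTITY(1,1) PRIMARY KEY,
        CreatedAt datetime2(7) NOT NULL
            CONSTRAINT DF_AuditDemo_CreatedAt DEFAULT (SYSDATETIME())
    );

    -- ...and round down to the minute only when reporting
    SELECT Id,
           DATEADD(MINUTE,
                   DATEDIFF(MINUTE, '20000101', CreatedAt),
                   CAST('20000101' AS datetime2(0))) AS CreatedAtToMinute
    FROM dbo.AuditDemo;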

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you all.

    The consensus is that it is just not worth it. I hadn't planned on doing any extra work other than changing the default datatype. It is simply a column that tells us when the row was created. The rounding up of the last 30 seconds to the next minute is not a problem. And it isn't just for one table; every table in the DB has one of these columns (83 of them at the moment). Few of them, though, grow at tens of millions of rows per year. I should not pick the low-hanging fruit for fear it might have worms. I should go to the shop and buy one there.

    And as for space, if we have 100GB of data, then we have 100GB of data. But I'd rather have 90GB of data if that simply involves changing the datatype from INT to TINYINT for foreign-key columns whose primary-key tables have only 30 rows. Or worse, we have a table defining a person's sex. It has two rows and is defined as an INT. Thankfully the person table only has a few hundred thousand rows.

    I shall mull it over. Ye may well be right that I am concentrating on things that are of little importance.

  • sean redmond wrote:

    The rounding up of the last 30 seconds to the next minute is not a problem.

    ...

    And as for space, if we have 100GB of data, then we have 100GB of data.

    If the dates/times in the columns of those tables aren't considered to be "data" and the resolution of the dates and times can be so shoddy, then why even have the columns?

    And, no... not looking for an answer from you there.  Just throwing another thought at you. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sean redmond wrote:

    Thank you all.

    And as for space, if we have 100GB of data, then we have 100GB of data. But I'd rather have 90GB of data if that simply involves changing the datatype from INT to TINYINT for foreign-key columns whose primary-key tables have only 30 rows. Or worse, we have a table defining a person's sex. It has two rows and is defined as an INT. Thankfully the person table only has a few hundred thousand rows.

    I would be cautious of using tinyint. Even if there will absolutely never be more than 256 values, a mistake can quickly fill up the whole range. Using it for sex (literal sex) is probably OK, but most organizations really mean gender when they ask for sex, and that could accumulate many more values than biological sex alone. Most of them don't understand the difference between gender and sex, and if they need to record both, the Cartesian product will approach or exceed 256 values very quickly.

  • CreateIndexNonclustered wrote:

    sean redmond wrote:

    Thank you all.

    And as for space, if we have 100GB of data, then we have 100GB of data. But I'd rather have 90GB of data if that simply involves changing the datatype from INT to TINYINT for foreign-key columns whose primary-key tables have only 30 rows. Or worse, we have a table defining a person's sex. It has two rows and is defined as an INT. Thankfully the person table only has a few hundred thousand rows.

    I would be cautious of using tinyint. Even if there will absolutely never be more than 256 values, a mistake can quickly fill up the whole range. Using it for sex (literal sex) is probably OK, but most organizations really mean gender when they ask for sex, and that could accumulate many more values than biological sex alone. Most of them don't understand the difference between gender and sex, and if they need to record both, the Cartesian product will approach or exceed 256 values very quickly.

    I'd say stick with tinyint.  There really aren't 250 genders, no matter how much some people try to pretend there are, and certainly not significant enough ones that you'd need to distinguish them.

    IF you're row compressing the table -- and for most tables that's essentially a "free" process -- then you could use smallint, since it would be compressed to only one byte when the value was small enough anyway.
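
    For reference, turning row compression on for an existing table is a one-liner (table name assumed; nonclustered indexes need their own rebuilds with the same option):

    ALTER TABLE dbo.Person REBUILD WITH (DATA_COMPRESSION = ROW);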

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    CreateIndexNonclustered wrote:

    sean redmond wrote:

    Thank you all.

    And as for space, if we have 100GB of data, then we have 100GB of data. But I'd rather have 90GB of data if that simply involves changing the datatype from INT to TINYINT for foreign-key columns whose primary-key tables have only 30 rows. Or worse, we have a table defining a person's sex. It has two rows and is defined as an INT. Thankfully the person table only has a few hundred thousand rows.

    I would be cautious of using tinyint. Even if there will absolutely never be more than 256 values, a mistake can quickly fill up the whole range. Using it for sex (literal sex) is probably OK, but most organizations really mean gender when they ask for sex, and that could accumulate many more values than biological sex alone. Most of them don't understand the difference between gender and sex, and if they need to record both, the Cartesian product will approach or exceed 256 values very quickly.

    I'd say stick with tinyint.  There really aren't 250 genders, no matter how much some people try to pretend there are, and certainly not significant enough ones that you'd need to distinguish them.

    IF you're row compressing the table -- and for most tables that's essentially a "free" process -- then you could use smallint, since it would be compressed to only one byte when the value was small enough anyway.

    Personal positions (regardless of what side they take) are irrelevant to business and compliance requirements. If compliance or operations say that the 7-8 types of hermaphroditism, plus male, female, unknown, and decline to answer need to be recorded, along with four gender expressions plus unknown and decline to answer in a multi-value property, there are already 72 lookup values without getting into some of the more esoteric gender expressions they may or may not require in the future. These requirements are becoming common in any field that does crisis intervention, even as a tertiary activity. Whatever maximum number of values someone personally thinks there should be also doesn't account for the occasional application administrator or developer error that inserts bad values which then have to be inactivated but left in place.

    Storage space has not been a valid concern in any SQL Server of the last 20+ years when dealing with the difference between 8-bit and 16-bit integers, compressed or not. Join performance is, but that isn't positively impacted by compression.

  • CreateIndexNonclustered wrote:

    ScottPletcher wrote:

    CreateIndexNonclustered wrote:

    sean redmond wrote:

    Thank you all.

    And as for space, if we have 100GB of data, then we have 100GB of data. But I'd rather have 90GB of data if that simply involves changing the datatype from INT to TINYINT for foreign-key columns whose primary-key tables have only 30 rows. Or worse, we have a table defining a person's sex. It has two rows and is defined as an INT. Thankfully the person table only has a few hundred thousand rows.

    I would be cautious of using tinyint. Even if there will absolutely never be more than 256 values, a mistake can quickly fill up the whole range. Using it for sex (literal sex) is probably OK, but most organizations really mean gender when they ask for sex, and that could accumulate many more values than biological sex alone. Most of them don't understand the difference between gender and sex, and if they need to record both, the Cartesian product will approach or exceed 256 values very quickly.

    I'd say stick with tinyint.  There really aren't 250 genders, no matter how much some people try to pretend there are, and certainly not significant enough ones that you'd need to distinguish them.

    IF you're row compressing the table -- and for most tables that's essentially a "free" process -- then you could use smallint, since it would be compressed to only one byte when the value was small enough anyway.

    Personal positions (regardless of what side they take) are irrelevant to business and compliance requirements. If compliance or operations say that the 7-8 types of hermaphroditism, plus male, female, unknown, and decline to answer need to be recorded, along with four gender expressions plus unknown and decline to answer in a multi-value property, there are already 72 lookup values without getting into some of the more esoteric gender expressions they may or may not require in the future. These requirements are becoming common in any field that does crisis intervention, even as a tertiary activity. Whatever maximum number of values someone personally thinks there should be also doesn't account for the occasional application administrator or developer error that inserts bad values which then have to be inactivated but left in place.

    Storage space has not been a valid concern in any SQL Server of the last 20+ years when dealing with the difference between 8-bit and 16-bit integers, compressed or not. Join performance is, but that isn't positively impacted by compression.

    The business I'm at certainly doesn't require more than 200 gender "types."  Neither does yours I'm sure, nor anyone else's at this point (I guess some ultra-woke colleges theoretically could, but ... likely nah).

    Storage is still a valid concern, particularly if you have 100s of millions or billions of rows. You have to remember that it also affects log writes/size. And data compression generally does not reduce log size requirements.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

