Questions About SQL Server Data Types You Were Too Shy to Ask

Although SQL Server data types seem to cause a lot of grief for database developers and can be tricky to use, we are expected to know all about them, so it is embarrassing to ask questions about them in forums. Rob Sheldon continues in his mission to answer all those questions that we hesitate to ask.

The Questions

  1. “Under what circumstances is character data truncated without triggering an error?”
  2. “SQL Server’s implicit conversions sometimes lead to unexpected query results, such as a value being returned as a data type different from what I expect. How do implicit conversions work?”
  3. “I’ve created a query that uses the ISNUMERIC function to verify whether a value is numeric and, if so, convert it to an integer. In some cases, however, the database engine generates an error because it’s trying to convert a value that’s not really a numeric. What’s going on?”
  4. “I’m told I should use the BIT data type where possible to save space. Is it true that it saves space or is this just syntactic sugar?”
  5. “When I use the DATALENGTH and LEN functions to calculate the size of a string, I often receive different results, even though I’d expect them to be the same. What’s the difference between the two functions?”
  6. “I’m troubleshooting a stored procedure that uses a COALESCE expression to return the first non-NULL value from a set of columns. However, the procedure often generates a conversion error, even if the value returned comes from the first column listed. Any idea why this might be occurring?”
  7. “I’m developing a query that contains a BETWEEN operator in the WHERE clause. The compared data includes a range of consecutive values that contain both letters and numbers, as in XYZ60 through XYZ700. Even when I specify the entire range, values are unexpectedly omitted from my query results. Any idea why this is occurring?”
  8. “I’ve seen some T-SQL code that contains ‘national character varying(max)’ listed as the data type, rather than one of the normal types, yet everything seems to work fine. Is this something new?”
  9. “When is it best to use the CHAR data type rather than VARCHAR?”
  10. “I’m building a query that joins a FLOAT column to a REAL column, but when I run the statement, it returns no results, even though I can see the matching values in the two columns. Why is this happening?”
  11. “I have a table that includes a VARCHAR column and a stored procedure that accesses data from the table, based on values in that column. The procedure includes an NVARCHAR parameter used to qualify the WHERE clause, which references the VARCHAR column. Does it matter that the data types are different between the parameter and the column?”
  12. “My query needs to calculate past and future dates based on an existing date as well as calculate the time difference between two dates. I’ve tried using arithmetic operators to perform the calculations, but the results are never right. How do I go about working with dates?”
  13. “I’m told never to use the SQL_VARIANT data type inappropriately. What is an appropriate use for this data type?”

“Under what circumstances is character data truncated without triggering an error?”

Never, one might hope. But SQL Server is a sneaky devil, preventing truncation some of the time, but not all the time. Sure, if you try to insert too large a value into a column, the database engine baulks, just like it would for the following table:
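A minimal version of such a table might look like this:

CREATE TABLE OurStuff
(
  StuffID INT NOT NULL PRIMARY KEY,
  StuffName VARCHAR(10) NOT NULL
);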

The StuffID column, the primary key, is configured with the INT data type, and the StuffName column is configured as VARCHAR(10). All very basic. And just as basic is the following INSERT statement:
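For example (the exact string is illustrative; any value longer than 10 characters will do):

INSERT INTO OurStuff (StuffID, StuffName)
VALUES (1, 'Really Big Thing');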

What we’re trying to do is here is insert a value into the StuffName column that exceeds the data type’s specified length. Not surprisingly, SQL Server kicks out an error:
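The message looks something like this (the exact wording and line number vary by version):

Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.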

Everything is just as we’d expect. The value being inserted is too big, so the database engine jumps to the rescue, preventing any values from being truncated and, consequently, ensuring the integrity of the data.

Unfortunately, SQL Server is not quite so protective with its variables and parameters. Consider the following T-SQL script, which declares the @stuff variable and sets its value to a string that exceeds the data type’s specified length:
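A sketch of such a script (the assigned string is illustrative):

DECLARE @stuff VARCHAR(10);
SET @stuff = 'Really Big Thing';  -- 16 characters into a 10-character variable
PRINT @stuff;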

The database engine doesn’t blink. It simply inserts the string’s first 10 characters (Really Big) and goes about its business, pretending that all is well, when in fact our variable must limp along with a truncated value. The PRINT statement merely confirms our suspicions and returns the really wrong Really Big.

It gets worse. Imagine you’re using an ISNULL function to compare two variables, as in the following example:
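Something along these lines:

DECLARE @stuff1 VARCHAR(5);
DECLARE @stuff2 VARCHAR(10);
SET @stuff1 = NULL;
SET @stuff2 = 'Big Thing';
PRINT ISNULL(@stuff1, @stuff2);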

The first variable is defined as VARCHAR(5) and assigned a value of NULL. The second is defined as VARCHAR(10) and assigned the string value Big Thing. So far so good. Now we get to ISNULL. The thing with this function is that the first expression (in this case, @stuff1) determines the data type of the output, even if that expression returns a NULL. That means the @stuff2 value is confined by the @stuff1 data type constraints. As a result, our PRINT statement, rather than returning Big Thing, gives us Big T, which sounds more like a mob boss than a credible data item.

Parameters too can fall victim to truncation without warning. Suppose we create the following stored procedure:
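A minimal sketch; the procedure name and the extra @id parameter are assumptions, added so the example stands on its own:

CREATE PROCEDURE InsertStuff
  @id INT,
  @stuff VARCHAR(10)
AS
INSERT INTO OurStuff (StuffID, StuffName)
VALUES (@id, @stuff);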

A simple example, of course, but it demonstrates an important concept. Notice that the @stuff parameter is defined as VARCHAR(10). Now suppose we pass in a value longer than 10 characters:
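For example:

EXEC InsertStuff @id = 2, @stuff = 'Really Big Thing';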

The database engine runs the procedure without a hitch and returns a message saying that one row is affected (the row inserted into the OurStuff table). We can verify this by running a simple SELECT statement:
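For instance:

SELECT StuffID, StuffName FROM OurStuff;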

The results, shown in the following table, indicate that the value has indeed been truncated, although not once did SQL Server raise an exception.

StuffID  StuffName
-------  ----------
2        Really Big

So be aware of what can happen when working with variables and parameters. You’ll likely want to include in your T-SQL code the logic necessary to catch values that might be faulty. Obviously, you can’t rely on the database engine to do it for you.

“SQL Server’s implicit conversions sometimes lead to unexpected query results, such as a value being returned as a data type different from what I expect. How do implicit conversions work?”

SQL Server is great at surreptitiously converting values from one type to another: CHAR to VARCHAR, FLOAT to NCHAR, BIGINT to MONEY. You get the picture. As long as the values are compatible with the target type, the database engine does all the work, and you get to kick back and reap the rewards.

Sort of.

Imagine the following scenario, in which we add a CHAR variable to an INT variable:
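A sketch of the scenario; the values '22' and 33 are inferred from the results that follow, and SQL_VARIANT_PROPERTY is one way to pull the base type:

DECLARE @stuff1 CHAR(2) = '22';
DECLARE @stuff2 INT = 33;
SELECT @stuff1 + @stuff2 AS AllStuff,
  SQL_VARIANT_PROPERTY(@stuff1 + @stuff2, 'BaseType') AS BaseType;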

We declare the variables, assign values, and add them together. We also pull the base type of the returned value, just so we know what’s going on. The following table shows us the results of our SELECT statement:

AllStuff  BaseType
--------  --------
55        int

All looks fine on the surface. We add our INT value to our CHAR value and come up with the total (55), returned as an INT value. The database engine has implicitly converted the CHAR value to an INT value. The INT data type is used because INT takes precedence over CHAR. SQL Server is quite explicit when it comes to data type precedence. INT wins out over CHAR and VARCHAR every time. If a DATETIME value is tossed into the mix, it will beat out INT and just about all other types. (You can find details about precedence rules in the MSDN topic “Data Type Precedence.”)

Returning to our example, SQL Server converts the CHAR value to INT and then proceeds to add the two integers together, giving us a result of 55. But what if we didn’t want to add the values together but concatenate them instead? In that case, we’d have to explicitly convert the INT value:
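For example, still using the same two variables:

SELECT @stuff1 + CAST(@stuff2 AS CHAR(2)) AS AllStuff,
  SQL_VARIANT_PROPERTY(@stuff1 + CAST(@stuff2 AS CHAR(2)), 'BaseType') AS BaseType;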

Now our results look quite different. Not only are the two values concatenated, but the value is also returned as a CHAR type:

AllStuff  BaseType
--------  --------
2233      char

You should also be aware of how SQL Server handles implicit conversions for numerical data. For example, suppose this time around, we create a DECIMAL variable and an INT variable, assign a value to the DECIMAL variable, and then set the INT variable to equal the DECIMAL variable:
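A minimal sketch, with values inferred from the results below:

DECLARE @stuff1 DECIMAL(8,4) = 1234.9999;
DECLARE @stuff2 INT;
SET @stuff2 = @stuff1;
SELECT @stuff1 AS Stuff1, @stuff2 AS Stuff2;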

The SELECT statement now returns these results:

Stuff1     Stuff2
---------  ------
1234.9999  1234

Notice what the database engine has done. Rather than rounding the value up, as might be expected, it simply truncates the value, giving us our integer, without too great a concern for the original value. And the problem isn’t limited to DECIMAL-to-INT conversions. Check out what happens when we go from INT to REAL and back again:
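A sketch, with the values chosen to match the results below:

DECLARE @stuff1 INT = 1080000000;
DECLARE @stuff2 INT = 1080000001;
DECLARE @stuff3 REAL;
SET @stuff3 = @stuff2;
SELECT @stuff3 AS Stuff3,
  CAST(@stuff3 AS INT) AS Stuff3int,
  CASE WHEN @stuff1 = CAST(@stuff3 AS INT)
    THEN 'yes' ELSE 'no' END AS IsEqual;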

We declare the INT variables and set their values, and then declare a REAL variable and set its value to equal the second INT variable. In our SELECT statement, we then retrieve the REAL variable as it is stored, convert it to an INT, and then check whether the first and third variables are equal. The following table shows our results:

Stuff3    Stuff3int   IsEqual
--------  ----------  -------
1.08E+09  1080000000  yes

Notice that the Stuff3 value appears as scientific notation because the @stuff3 variable had insufficient precision to hold the original integer. In addition, when the variable is converted to an integer, it returns a value different from what it was assigned. Instead, it now equals the value stored in @stuff1.

In many cases, you shouldn’t leave it up to the database engine to do your conversions, no matter how tempting that might be. In fact, some developers insist that all conversions should be explicit. At the very least, be sure that whenever the integrity of the data could be brought into question, err on the side of converting the data explicitly.

“I’ve created a query that uses the ISNUMERIC function to verify whether a value is numeric and, if so, convert it to an integer. In some cases, however, the database engine generates an error because it’s trying to convert a value that’s not really a numeric. What’s going on?”

The ISNUMERIC function can be a slippery one. On the surface, its purpose appears only to determine whether the submitted value is indeed a valid numeric type. But looks can be deceiving. Take the following T-SQL script. It creates a table with a VARCHAR column and then adds a mix of values to that column:
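A script along these lines, with the values taken from the results that follow:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffName VARCHAR(20)
);
INSERT INTO OurStuff (StuffName)
VALUES ('abc'), ('def123'), ('456'), ('$789'),
  ('1d1'), (','), ('.'), ('$.,');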

Nothing extraordinary, just your run-of-the-mill alphanumeric hodgepodge. But now let’s look at what happens when we apply the ISNUMERIC function to the StuffName column:
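For example:

SELECT StuffID, StuffName,
  ISNUMERIC(StuffName) AS IsNumber
FROM OurStuff;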

All we’re doing is pulling the table’s two columns, as well as returning a third column to test each value’s numeric status. The ISNUMERIC function returns a 1 if the tested value appears to be a numeric data type; otherwise, it returns a 0. Here’s what we get:

StuffID  StuffName  IsNumber
-------  ---------  --------
1        abc        0
2        def123     0
3        456        1
4        $789       1
5        1d1        1
6        ,          1
7        .          1
8        $.,        1

It should come as no surprise that the first two values are considered non-numeric and the third value is. However, the function also considers $789 to be numeric, despite the dollar sign (or more precisely, because of the dollar sign), and 1d1 to be numeric because of its resemblance to scientific notation, whether or not that is the intent.

The sky might not fall because of these interpretations, but they could cause problems if you want to perform other operations based on the output of the ISNUMERIC function. For example, suppose we want to explicitly convert each value to an INT based on whether we think that value is numeric:
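Something like this:

SELECT StuffID, StuffName,
  CASE WHEN ISNUMERIC(StuffName) = 1
    THEN CAST(StuffName AS INT)
    ELSE NULL
  END AS CaseResult
FROM OurStuff
WHERE StuffID <= 3;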

In this case, we’re retrieving only the first three rows from our table. If the ISNUMERIC function returns a 1, we convert the value to the INT type; otherwise, we return a NULL value. As expected, the first two rows each return a NULL and the third row returns the value 456 as an INT, as shown in the following results:

StuffID  StuffName  CaseResult
-------  ---------  ----------
1        abc        NULL
2        def123     NULL
3        456        456

Now let’s retrieve the fourth row from the table:
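For example:

SELECT StuffID, StuffName,
  CASE WHEN ISNUMERIC(StuffName) = 1
    THEN CAST(StuffName AS INT)
    ELSE NULL
  END AS CaseResult
FROM OurStuff
WHERE StuffID = 4;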

This time around, we receive the following error message, indicating that we cannot convert that value to an INT:
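The error looks something like this:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value '$789' to data type int.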

This shouldn’t come as too big of a surprise. The INT data type has never been fond of dollar signs. The same goes for scientific notation. If we were to try to run the query against the last row in our table, we would receive a similar error.

And it’s not just dollar signs and possible scientific notation that can elicit problems. The ISNUMERIC function recognizes an assortment of characters as being numeric (in addition to actual numbers), such as plus signs and minus signs and periods and tabs and commas. Take a look at the following example, in which we assign a tab to the @tab variable:
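A sketch (CHAR(9) is the tab character):

DECLARE @tab CHAR(1) = CHAR(9);
SELECT ISNUMERIC(@tab) AS IsNumber;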

As odd as this might seem, the ISNUMERIC function returns a 1, as it will in the next example, in which we pass in a plus sign:
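For example:

SELECT ISNUMERIC('+') AS IsNumber;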

If you’re working with a version of SQL Server prior to 2012, you’ll have to come up with some sort of workaround to check for those circumstances in which the apparent numeric value would generate an error. You might, for example, create a function or add a CASE expression to set up more complex test conditions.

For those of you working with SQL Server 2012 or later, you have a better alternative: the TRY_CONVERT function:
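For example, run against the same table:

SELECT StuffID, StuffName,
  TRY_CONVERT(INT, StuffName) AS CaseResult
FROM OurStuff;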

The function first tests whether the requested conversion will work. If so, it converts the data to the target data type. If not, it returns a NULL:

StuffID  StuffName  CaseResult
-------  ---------  ----------
1        abc        NULL
2        def123     NULL
3        456        456
4        $789       NULL
5        1d1        NULL
6        ,          NULL
7        .          NULL
8        $.,        NULL

As you can see, the only number that can be converted is 456. The rest are there only to complicate the lives of overworked DBAs with nothing better to do than to search for wayward numbers.

“I’m told I should use the BIT data type where possible to save space. Is it true that it saves space or is this just syntactic sugar?”

Yes, the BIT data type can save space under certain circumstances, and it undoubtedly has some advantages over an approach such as using CHAR(1), which requires a CHECK constraint to limit the values in the same way BIT already does. With the BIT data type, you can insert only the values 1 and 0, as well as NULL values, assuming the object is nullable.

That said, a BIT data type still uses a byte of data, just like CHAR(1) or TINYINT. Some might have assumed that the BIT type translates into only one bit of data, given the type’s tie-in to the 1 and 0 value limits. But that’s hardly the case. BIT is actually an integer data type and consequently requires more legroom. Case in point:
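A minimal sketch:

DECLARE @stuffed BIT = 0;
SELECT @stuffed AS Stuffed,
  DATALENGTH(@stuffed) AS BitLength;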

As you can see in the following results, the value does indeed use one byte:

Stuffed  BitLength
-------  ---------
0        1

However, if you create a table with multiple BIT columns, the database engine is clever enough to store the column data in a single byte, up to eight columns. If your table contains between 9 and 16 BIT columns, SQL Server stores the data as two bytes. Between 17 and 24 BIT columns? Three bytes, and so on. You get the picture.

That said, you should still be wary about implementing BIT. It can have unexpected results when used in an expression, and it’s not necessarily well suited to masquerading as a bit flag or bit mask. For an interesting take on the whole issue, check out Joe Celko’s excellent article “BIT of a Problem.”

“When I use the DATALENGTH and LEN functions to calculate the size of a string, I often receive different results, even though I’d expect them to be the same. What’s the difference between the two functions?”

The DATALENGTH function returns the number of bytes used by an expression, and the LEN function returns the number of characters contained in an expression. The LEN function is also limited to string expressions, whereas the DATALENGTH function supports expressions of any type.

To better understand how they both work, let’s take a look at them in action. The following T-SQL script creates a table with four columns, each configured with a character data type, and then inserts data into those columns:
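A script along these lines (the three trailing spaces in the final INSERT are inferred from the results below):

CREATE TABLE TestLength
(
  char1 CHAR(10),
  nchar2 NCHAR(10),
  varchar3 VARCHAR(10),
  nvarchar4 NVARCHAR(10)
);
INSERT INTO TestLength
VALUES ('stuff', 'stuff', 'stuff', 'stuff');
INSERT INTO TestLength
VALUES ('stuff   ', 'stuff   ', 'stuff   ', 'stuff   ');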

Notice that, in the final INSERT statement, I’ve added several spaces after each value. You’ll see why in a moment. But first, let’s retrieve data from each of the columns, applying both the LEN and DATALENGTH functions:
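For example:

SELECT LEN(char1) AS len1, DATALENGTH(char1) AS datalength1,
  LEN(nchar2) AS len2, DATALENGTH(nchar2) AS datalength2,
  LEN(varchar3) AS len3, DATALENGTH(varchar3) AS datalength3,
  LEN(nvarchar4) AS len4, DATALENGTH(nvarchar4) AS datalength4
FROM TestLength;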

The following table shows the statement’s output:

len1  datalength1  len2  datalength2  len3  datalength3  len4  datalength4
----  -----------  ----  -----------  ----  -----------  ----  -----------
5     10           5     20           5     5            5     10
5     10           5     20           5     8            5     16

The len1 and datalength1 columns in the resultset refer to the char1 column in the TestLength table. The column is configured as CHAR(10). The LEN function provides the actual number of characters being used (5), not including the trailing spaces. The DATALENGTH function indicates that 10 bytes are being used, despite the actual number of characters. This is because the column is configured at a fixed width of 10. As a result, the DATALENGTH function can provide us with a more accurate view of our storage requirements.

The len2 and datalength2 columns in the resultset refer to the nchar2 column and work much the same way, except that it is a Unicode column configured as NCHAR(10). The LEN function still shows that five characters are being used (not including spaces), but the DATALENGTH function indicates that the value actually requires 20 bytes, just like we’d expect.

When we get to the len3 and datalength3 columns, the results get even more interesting. This data is based on the varchar3 column, which is configured as VARCHAR(10). Even so, the LEN function still indicates that only five characters are being used. However, because we’re working with a VARCHAR column, the DATALENGTH function indicates that five bytes are being used by the value in the first row and eight bytes by the value in the second row (to accommodate the trailing spaces). The len4 and datalength4 columns work in the same way, only the number of bytes is doubled in each case to account for the Unicode type.

As mentioned earlier, although the LEN function is limited to string data, you can use the DATALENGTH function on an expression of any data type. For example, the following T-SQL declares an INT variable, assigns a value, and then checks the number of bytes the variable uses:
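A sketch:

DECLARE @IntNew INT = 1234567;
SELECT @IntNew AS IntNew,
  DATALENGTH(@IntNew) AS IntLength;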

The SELECT statement returns the results shown in the following table, which indicate that the value uses four bytes, the expected length for the INT data type:

IntNew   IntLength
-------  ---------
1234567  4

Even if our integer were made up of only two digits, the results would be the same. Same goes for a DATETIME value, in terms of consistency:
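For example:

DECLARE @DateNow DATETIME = GETDATE();
SELECT @DateNow AS DateNow,
  DATALENGTH(@DateNow) AS DateLength;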

In this case, the DATALENGTH function returns eight bytes:

DateNow                  DateLength
-----------------------  ----------
2014-07-05 21:32:23.507  8

We would expect eight bytes because DATETIME values are stored as two integers, one for the date and one for the time.

“I’m troubleshooting a stored procedure that uses a COALESCE expression to return the first non-NULL value from a set of columns. However, the procedure often generates a conversion error, even if the value returned comes from the first column listed. Any idea why this might be occurring?”

A COALESCE expression can be a bit tricky to get right at first. It is essentially shorthand for a type of CASE expression that evaluates a series of expressions, often columns or variables, and returns the first one that does not evaluate to NULL. Let’s look at how it works. The following T-SQL declares several variables and assigns values to them, based on their respective types:
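A sketch, with the values inferred from the results below:

DECLARE @StuffNull VARCHAR(10) = NULL;
DECLARE @StuffID INT = 1001;
DECLARE @StuffName VARCHAR(10) = 'Big Thing';
SELECT COALESCE(@StuffNull, @StuffID, @StuffName) AS NotNullStuff,
  SQL_VARIANT_PROPERTY(
    COALESCE(@StuffNull, @StuffID, @StuffName), 'BaseType') AS BaseType;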

The COALESCE expression takes the three variables as an argument. Because the @StuffNull variable contains a NULL value, it will be skipped and we’ll move to the next variable, @StuffID. This one contains a value, so the SELECT statement returns that value in its results:

NotNullStuff  BaseType
------------  --------
1001          int

Everything seems in order here. The @StuffID variable does not contain a NULL, so that value is returned, and it is indeed an integer, just as we would expect. But what if we change the order of our variables:
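Using the same variables:

SELECT COALESCE(@StuffNull, @StuffName, @StuffID) AS NotNullStuff;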

The @StuffName variable does not contain a NULL value, so that variable’s value should be returned. However, the statement instead generates the following error:
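The error looks something like this:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'Big Thing' to data type int.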

The problem is that a COALESCE expression uses the data type with the highest precedence, no matter in what order the arguments are presented. Because INT has a higher precedence than VARCHAR, the returned type is an INT, which means the database engine is trying to convert the Big Thing string to an integer. It just doesn’t work.

Data type precedence also applies to a type’s length:
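A sketch; the second string here is an invented value, chosen only to be non-NULL:

DECLARE @StuffNull VARCHAR(10) = NULL;
DECLARE @StuffName1 VARCHAR(10) = 'Big Thing';
DECLARE @StuffName2 VARCHAR(15) = 'Even Bigger';
SELECT COALESCE(@StuffNull, @StuffName1, @StuffName2) AS NotNullStuff,
  SQL_VARIANT_PROPERTY(
    COALESCE(@StuffNull, @StuffName1, @StuffName2), 'BaseType') AS BaseType,
  SQL_VARIANT_PROPERTY(
    COALESCE(@StuffNull, @StuffName1, @StuffName2), 'MaxLength') AS BaseLength;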

Because @StuffName1 is the first variable to contain a non-null value, that value is returned. However, because @StuffName2 has a greater length (15 as opposed to 10), that is the data type of the returned value, as the following table shows:

NotNullStuff  BaseType  BaseLength
------------  --------  ----------
Big Thing     varchar   15

Although both variables are VARCHAR, the 15 length has precedence over the 10 length, just like INT has precedence over a VARCHAR.

“I’m developing a query that contains a BETWEEN operator in the WHERE clause. The compared data includes a range of consecutive values that contain both letters and numbers, as in XYZ60 through XYZ700. Even when I specify the entire range, values are unexpectedly omitted from my query results. Any idea why this is occurring?”

When you use the BETWEEN operator for non-numerical data, you have to keep in mind how SQL Server sorts and searches for data. The example data you provide is considered character data (non-numerical), so the returned range of values follows the rules of the database’s configured collation, which specifies how character data is sorted and compared, based on language and usage norms.

The best way to understand this is to see it in action. The following T-SQL creates the OurStuff table, which includes a VARCHAR column and a DATETIME column, and inserts a number of rows into those columns:
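A script along these lines, with the values drawn from the result sets that follow:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffProdID VARCHAR(10),
  StuffDate DATETIME
);
INSERT INTO OurStuff (StuffProdID, StuffDate)
VALUES
  ('ab38',  '2014-07-06 12:46:48.240'),
  ('ab39',  '2014-07-08 10:46:48.240'),
  ('ab48',  '2014-07-09 08:46:48.240'),
  ('ab49',  '2014-07-10 06:46:48.240'),
  ('ab58',  '2014-07-11 04:46:48.240'),
  ('ab59',  '2014-07-12 02:46:48.240'),
  ('ab398', '2014-07-13 22:46:48.240'),
  ('ab399', '2014-07-14 20:46:48.240'),
  ('ab400', '2014-07-15 18:46:48.240'),
  ('cd38',  '2014-07-16 16:46:48.240'),
  ('cd39',  '2014-07-17 14:46:48.240'),
  ('cd48',  '2014-07-18 12:46:48.240'),
  ('cd49',  '2014-07-19 10:46:48.240'),
  ('cd58',  '2014-07-20 11:46:48.240'),
  ('cd59',  '2014-07-21 12:46:48.240'),
  ('cd398', '2014-07-22 13:46:48.240'),
  ('cd399', '2014-07-23 14:46:48.240'),
  ('cd400', '2014-07-24 15:46:48.240');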

Now suppose we issue a simple query that retrieves the rows of data based on StuffID values 6 through 16:
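For example:

SELECT * FROM OurStuff
WHERE StuffID BETWEEN 6 AND 16;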

Not surprisingly, the statement returns the results shown in the following table:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240
10       cd38         2014-07-16 16:46:48.240
11       cd39         2014-07-17 14:46:48.240
12       cd48         2014-07-18 12:46:48.240
13       cd49         2014-07-19 10:46:48.240
14       cd58         2014-07-20 11:46:48.240
15       cd59         2014-07-21 12:46:48.240
16       cd398        2014-07-22 13:46:48.240

The BETWEEN operator does exactly what we expect. It includes rows 6 through 16 and omits the rest. However, now suppose we modify the BETWEEN clause to retrieve data based on the StuffProdID column:
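For example:

SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'a' AND 'c';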

Now our results include only the first nine rows:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
3        ab48         2014-07-09 08:46:48.240
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240

Although we’ve specified the letter c as the end of the range, no rows whose StuffProdID value starts with that letter are included. They are not part of the acceptable range of values, based on the sort order defined by the collation. In other words, all values that start with c include other characters, so they’re considered out of range.

To help clarify this, let’s sort our table based on the StuffProdID column:
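For example:

SELECT * FROM OurStuff
ORDER BY StuffProdID;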

The data in the StuffProdID column is sorted based on the rules of the database collation:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240
3        ab48         2014-07-09 08:46:48.240
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
10       cd38         2014-07-16 16:46:48.240
11       cd39         2014-07-17 14:46:48.240
16       cd398        2014-07-22 13:46:48.240
17       cd399        2014-07-23 14:46:48.240
18       cd400        2014-07-24 15:46:48.240
12       cd48         2014-07-18 12:46:48.240
13       cd49         2014-07-19 10:46:48.240
14       cd58         2014-07-20 11:46:48.240
15       cd59         2014-07-21 12:46:48.240

The data is sorted first by the first character; then by the second character, within the first-character grouping; next by the third character, within the second-character grouping; and so on. If we reconsider our BETWEEN clause (BETWEEN 'a' AND 'c'), we’ll see that the range includes all the values that start with a, but none that start with c. If we had a c-only value, that would be included, but in this case, all of our values that start with c include other characters as well and consequently are outside the range specified in the BETWEEN clause. Now let’s specify a more inclusive range:
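For example:

SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'a' AND 'cd4';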

Because our range includes all rows up to cd4, our resultset includes several more rows than it did when we specified only c as the upper end of the range:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
3        ab48         2014-07-09 08:46:48.240
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240
10       cd38         2014-07-16 16:46:48.240
11       cd39         2014-07-17 14:46:48.240
16       cd398        2014-07-22 13:46:48.240
17       cd399        2014-07-23 14:46:48.240

Note, however, we’re still missing any rows that start with cd4 or cd5 because they fall outside the specified range. If you refer back to the fully sorted resultset, you’ll see that the values cd400 through cd59 all sort after the string cd4 and so come at the end of the result set.

The following example demonstrates this concept further:
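For example:

SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'ab38' AND 'ab400';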

Now our results include only five rows, those that fall within the specified range. As you can see in the following results, ab400 is now included because we specified the entire value, not just the first part of it:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240

If you’re working with DATETIME values, you also need to be aware of how the data is sorted and compared. The following example specifies a date range three days apart:
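For example:

SELECT * FROM OurStuff
WHERE StuffDate BETWEEN '2014-07-10' AND '2014-07-13';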

Because the range specified in a BETWEEN clause is inclusive, we might expect four rows to be returned, but instead we get only three:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240

When working with DATETIME values, keep in mind that the value is always made up of the date and time, down to a thousandth of a second. The time part is factored into the range calculations, so you must take that part into account when you’re specifying your range. If you don’t specify a time, as is the case in our SELECT statement, SQL Server assumes the time is midnight (all zeroes).

One approach to dealing with this issue is to specify the entire DATETIME values:
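For example:

SELECT * FROM OurStuff
WHERE StuffDate BETWEEN '2014-07-10 06:46:48.240'
  AND '2014-07-13 22:46:48.240';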

This way, our results will include all rows whose DATETIME values fall within this very specific range:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240

The trouble with this approach, however, is that it can get fairly cumbersome. Instead of using the BETWEEN operator, you might consider the greater than or equal to operator (>=) along with the less than operator (<):
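For example:

SELECT * FROM OurStuff
WHERE StuffDate >= '2014-07-10'
  AND StuffDate < '2014-07-14';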

For the beginning of the range, we specify that the date be greater than or equal to the specified date, which puts us right at midnight. We then use the less than operator to specify one day past our intended date. Because that date is also set to midnight, the query picks up everything through the very end of the intended final day. The following table shows the results now returned by the SELECT statement:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240

“I’ve seen some T-SQL code that contains ‘national character varying(max)’ listed as the data type, rather than one of the normal types, yet everything seems to work fine. Is this something new?”

There’s nothing new about national character data types such as NATIONAL CHARACTER VARYING. SQL Server includes them as synonyms to appease the ISO gods in order to conform to the international standards set by that organization. In fact, many of the SQL Server data types include ISO synonyms. (The TechNet topic “Data Type Synonyms” lists all the ISO synonyms.)

You can use the synonyms in place of the names of the base data types in your data definition language (DDL) statements. However, once you create the object, the synonyms are no longer used. For example, the following T-SQL script creates a table that includes columns configured with the NVARCHAR and NATIONAL CHARACTER VARYING data types:
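A minimal sketch (the length of 25 is illustrative):

CREATE TABLE OurStuff
(
  StuffID INT PRIMARY KEY,
  StuffName1 NVARCHAR(25),
  StuffName2 NATIONAL CHARACTER VARYING(25)
);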

We can specify the national character type just like we do any other type, and we can insert and manipulate the data as usual. But now let’s look at the column metadata:
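One way to do that is to query the catalog views:

SELECT c.name AS ColumnName,
  t.name AS TypeName
FROM sys.columns c
  INNER JOIN sys.types t
    ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('OurStuff');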

All we’re doing is pulling the column names and their data types, after we created our table. The following results tell all:

ColumnName  TypeName
----------  --------
StuffID     int
StuffName1  nvarchar
Stuffname2  nvarchar

As you can see, both character columns have been created with the NVARCHAR data type. The NATIONAL CHARACTER VARYING alias has disappeared altogether.

“When is it best to use the CHAR data type rather than VARCHAR?”

The key to knowing when to use one over the other is to first understand the differences between them. The CHAR and NCHAR data types store data at a fixed length. If an inserted value is shorter than that length, it is padded with trailing spaces to ensure that every value is of equal length. The VARCHAR and NVARCHAR data types are variable length, which means the size depends on the inserted value, plus an additional couple of bytes for overhead.

The following example helps illustrate the differences between the data types:
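A sketch, with lengths chosen to match the results below:

DECLARE @char1 CHAR(7) = 'stuff';
DECLARE @nchar2 NCHAR(7) = 'stuff';
DECLARE @varchar3 VARCHAR(7) = 'stuff';
DECLARE @nvarchar4 NVARCHAR(7) = 'stuff';
SELECT DATALENGTH(@char1) AS char1,
  DATALENGTH(@nchar2) AS nchar2,
  DATALENGTH(@varchar3) AS varchar3,
  DATALENGTH(@nvarchar4) AS nvarchar4;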

All we’re doing here is using the DATALENGTH function to determine the size of each variable, as shown in the following results:

char1  nchar2  varchar3  nvarchar4
-----  ------  --------  ---------
7      14      5         10

Although the value is only five characters long, the length of the CHAR value is seven bytes, because that was how the variable was defined. The NCHAR value works the same way, except that it’s doubled to account for the Unicode characters, so the DATALENGTH function returns 14. On the other hand, the VARCHAR value requires only the bytes needed for the value (not including the overhead bytes). The NVARCHAR value merely doubles that.

The choice, then, of whether to use CHAR or VARCHAR depends on how consistent the sizes of the values are and how small the values are. Microsoft recommends that you use CHAR only when value lengths are consistent, and use VARCHAR when the lengths vary a lot. Microsoft is noncommittal about the gray areas in between.

Small columns (fewer than five or six characters) often work well as CHAR columns when you take into account the overhead added to a VARCHAR column. Even so, some developers would suggest that you use CHAR only if all values are exactly the same length. If values are shorter than the designated length, they get padded with spaces, which can be annoying to deal with and could potentially waste a lot of space if your values are mostly one-to-two characters but you’ve created a CHAR(5) column to accommodate a relatively few five-character values.

That said, if all your values are a comparable length, you don’t incur the storage penalty for overhead that comes with VARCHAR, and storage has grown so cheap that some of the concerns about the extra space have been mitigated. You might see some performance gains if you go with CHAR, but the significance of those gains can vary and is perhaps not worth the trade-off of having to deal with the trailing spaces. Even so, CHAR is a useful option when your data is consistent and small.

“I’m building a query that joins a FLOAT column to a REAL column, but when I run the statement, it returns no results, even though I can see the matching values in the two columns. Why is this happening?”

Welcome to the shifty world of approximate-number data types, in which floating-point data is merely an approximation and the numbers you see might not be the real numbers at all. Let me demonstrate. The following T-SQL declares a FLOAT variable, sets its value, and retrieves the value as it’s represented in SQL Server Management Studio (SSMS) as well as how it actually exists in the database:
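A sketch; STR’s length and decimal arguments are chosen here just to expose the stored digits:

DECLARE @StuffAvg FLOAT = 33.33;
SELECT @StuffAvg AS StuffAvg,
  STR(@StuffAvg, 25, 15) AS RealStuff;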

The STR function converts the numerical data to character data and, in the process, gives us a glimpse of what’s really out there:

StuffAvg  RealStuff
--------  ------------------
33.33     33.329999999999998

So here’s what’s happened. We assigned the value 33.33 to the variable, but the database engine actually stored it as 33.329999999999998, thus the approximation component. When SSMS retrieves that value, however, it sees fit to once again make it look nice and neat (a fact that displeases more than a few in the SQL Server user community).

On the surface, all this might not seem too big a deal (or perhaps it does). In any case, let’s take a closer look. The following T-SQL creates a table that contains a FLOAT column and a REAL column and then populates those columns with a few choice values:
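Something along these lines:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffAvg1 FLOAT,
  StuffAvg2 REAL
);
INSERT INTO OurStuff (StuffAvg1, StuffAvg2)
VALUES (33.33, 33.33), (66.66, 66.66), (99.99, 99.99);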

Now let’s do what we did with our variable and retrieve the SSMS-enhanced values and the actual values:
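For example:

SELECT StuffID,
  StuffAvg1, STR(StuffAvg1, 25, 15) AS RealStuff1,
  StuffAvg2, STR(StuffAvg2, 25, 15) AS RealStuff2
FROM OurStuff;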

Once again, our trusty STR function comes to the rescue and shows us what’s what:

StuffID  StuffAvg1  RealStuff1          StuffAvg2  RealStuff2
-------  ---------  ------------------  ---------  ------------------
1        33.33      33.329999999999998  33.33      33.330001831054688
2        66.66      66.659999999999997  66.66      66.660003662109375
3        99.99      99.989999999999995  99.99      99.989997863769531

As you can see, not only do the FLOAT and REAL data types like to approximate their values, they also like to do it differently. Yet according to the SSMS perspective, each pair of values is exactly the same. What makes matters worse is that our references to the values also get somewhat fuzzy. Check it out:
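For example:

SELECT StuffID, StuffAvg1, StuffAvg2
FROM OurStuff
WHERE StuffAvg1 = 33.33;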

When we specify 33.33 in our WHERE clause, the applicable row is returned:

StuffID  StuffAvg1  StuffAvg2
-------  ---------  ---------
1        33.33      33.33

Now let’s specify 33.329999999999998 in our WHERE clause:
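For example:

SELECT StuffID, StuffAvg1, StuffAvg2
FROM OurStuff
WHERE StuffAvg1 = 33.329999999999998;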

The statement returns the exact same results as the previous one. And it works the same way for our REAL column. Either value will return the same row.

By all appearances, then, it would seem that we could specify either value or compare values and everything would work out fine. Guess again. This time around, we’ll compare the values as they appear in SSMS, without the benefit of STR:
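A sketch of such a comparison:

DECLARE @stuff1 FLOAT = 33.33;
DECLARE @stuff2 REAL = 33.33;
SELECT CASE WHEN @stuff1 = @stuff2
  THEN 'equal' ELSE 'not equal' END AS IsEqual;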

It turns out that the statement returns a value of not equal, suggesting that, despite appearances, not all instances of FLOAT and REAL are anything close to equal. So what happens if we try to join a FLOAT column to a REAL column? To find out, let’s create a couple of tables and add some data:
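A minimal sketch:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffAvg1 FLOAT
);
CREATE TABLE OurOtherStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffAvg2 REAL
);
INSERT INTO OurStuff (StuffAvg1) VALUES (33.33), (66.66), (99.99);
INSERT INTO OurOtherStuff (StuffAvg2) VALUES (33.33), (66.66), (99.99);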

Now let’s retrieve the data in the OurStuff table:
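For example:

SELECT StuffID, StuffAvg1 FROM OurStuff;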

Not surprisingly, our results look like this:

StuffID  StuffAvg1
-------  ---------
1        33.33
2        66.66
3        99.99

Next, let’s retrieve the data in the OurOtherStuff table:
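For example:

SELECT StuffID, StuffAvg2 FROM OurOtherStuff;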

Once again, no surprises:

StuffID  StuffAvg2
-------  ---------
1        33.33
2        66.66
3        99.99

So with everything in place, let’s join these critters:
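For example:

SELECT s.StuffID, s.StuffAvg1, o.StuffAvg2
FROM OurStuff s
  INNER JOIN OurOtherStuff o
    ON s.StuffAvg1 = o.StuffAvg2;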

As it turns out, the statement returns no rows, which is why Microsoft generally recommends that you avoid using FLOAT or REAL in equality constructions in places such as joins or WHERE conditions. To make this work, you would need to do some rounding or converting, create calculated columns, or come up with something equally clever. Just don’t rely on FLOAT and REAL for any sort of precision. That’s not what they were designed to do. They offer a great deal of flexibility, but at a price.

“I have a table that includes a VARCHAR column and a stored procedure that accesses data from the table, based on values in that column. The procedure includes an NVARCHAR parameter used to qualify the WHERE clause, which references the VARCHAR column. Does it matter that the data types are different between the parameter and the column?”

Yes, it matters, but let’s make sure we’re talking about the same thing. The following T-SQL script creates a table, inserts data into that table, creates a procedure that retrieves data from that table, and then executes the procedure, passing in the parameter value Really Big Thing:
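A sketch; the procedure name and the first two row values are assumptions, added to keep the example self-contained:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffName VARCHAR(25),
  StuffDate DATETIME DEFAULT GETDATE()
);
INSERT INTO OurStuff (StuffName)
VALUES ('Thing'), ('Big Thing'), ('Really Big Thing');
GO
CREATE PROCEDURE GetStuff
  @stuff NVARCHAR(25)
AS
SELECT StuffID, StuffDate
FROM OurStuff
WHERE StuffName = @stuff;
GO
EXEC GetStuff @stuff = N'Really Big Thing';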

When you run the procedure, it returns the StuffID and StuffDate values, as the following results show:

StuffID  StuffDate
-------  -----------------------
3        2014-07-06 17:16:29.457

You might have noticed that the table’s StuffName column is configured as VARCHAR and the procedure’s @stuff parameter is configured as NVARCHAR. The database engine implicitly converts the value with no apparent problem. In the background, however, we might find a different story. Although the database engine can perform an implicit conversion without a hiccup, the operation can result in a performance penalty, in part because it can prevent index seeks. As a result, when you start adding millions (or billions) of rows to the equation, a minor data type mismatch can have a significant impact. For more information about all this, check out my Simple Talk article “Data Conversion in SQL Server.”

“My query needs to calculate past and future dates based on an existing date as well as calculate the time difference between two dates. I’ve tried using arithmetic operators to perform the calculations, but the results are never right. How do I go about working with dates?”

The best two tools for achieving what you’re trying to do are the DATEADD and DATEDIFF functions. DATEADD adds or subtracts a specified interval of time to or from a date value, which can be any date or time data type. DATEDIFF returns the interval of time between two specified date values. Again, the values can be any of the date or time data types.

Let’s first look at how the DATEADD function works. The following T-SQL declares a DATETIME variable, uses the GETDATE function to assign the current date and time values to the function, and then retrieves the dates 10 years forward and 10 years back:
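A sketch:

DECLARE @StuffDate DATETIME = GETDATE();
SELECT @StuffDate AS StuffDate,
  DATEADD(yy, 10, @StuffDate) AS Plus10,
  DATEADD(yy, -10, @StuffDate) AS Less10;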

The DATEADD function takes three arguments. The first argument is the date part, such as day, week, month, or year. In this case, we’ve specified yy for year. The second argument is the number of years forward or backward we want to go. Finally, we specify our base date. The SELECT statement returns the following results:

StuffDate                Plus10                   Less10
-----------------------  -----------------------  -----------------------
2014-07-07 19:03:50.183  2024-07-07 19:03:50.183  2004-07-07 19:03:50.183

If we want to extract a specific date or time element, we can use the DATEPART function, as in the following example:
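For example, wrapping the same expressions in DATEPART:

SELECT DATEPART(yy, @StuffDate) AS StuffDate,
  DATEPART(yy, DATEADD(yy, 10, @StuffDate)) AS Plus10,
  DATEPART(yy, DATEADD(yy, -10, @StuffDate)) AS Less10;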

All we’ve done is specify that only the year should be extracted from the date, as shown in the following results:

StuffDate  Plus10  Less10
---------  ------  ------
2014       2024    2004

Now let’s use the DATEDIFF function to calculate the number of months between two dates:
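A sketch, with dates chosen to match the results below:

DECLARE @StuffDate1 DATETIME = '2014-07-07';
DECLARE @StuffDate2 DATETIME = '2024-07-07';
SELECT DATEPART(yy, @StuffDate1) AS StuffDate1,
  DATEPART(yy, @StuffDate2) AS StuffDate2,
  DATEDIFF(mm, @StuffDate1, @StuffDate2) AS MonthDiff;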

The DATEDIFF function also requires three arguments. As with DATEADD, the first argument is the date part. This time around we’re using mm for month. The next two arguments are the dates we want to compare, in this case, our two DATETIME variables. As the following results show, there are 120 months between the two dates:

StuffDate1  StuffDate2  MonthDiff
----------  ----------  ---------
2014        2024        120

The functions available in SQL Server for working with date and time values are definitely ones worth getting to know. Date and time values are treated differently from other types of data and require special consideration. You can find more about these functions in the MSDN topic “Date and Time Data Types and Functions.”

“I’m told never to use the SQL_VARIANT data type inappropriately. What is an appropriate use for this data type?”

SQL_VARIANT is included in SQL Server mainly for system use, and it shows up in a number of system procedures and views. It comes with several important restrictions, it should be used only with great caution, and there is usually a better way. However, some appropriate, specialised uses do exist.

An appropriate use is one in which you have no other reasonable option or a reasonable option requires a workaround that causes more trouble than using SQL_VARIANT. In fact, under the right circumstances, the data type could prove a valuable tool. It supports most T-SQL base types within a single structure, such as a column or parameter. For example, if you define a column with SQL_VARIANT, you can insert character data, numerical data, and date/time data. Here’s proof. The following T-SQL script creates a table that includes a SQL_VARIANT column and then inserts into it an assortment of values:
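A sketch; the explicit CASTs are assumptions, added so each value carries the base type shown in the results below:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffName SQL_VARIANT
);
INSERT INTO OurStuff (StuffName)
VALUES (CAST('Same Old Thing' AS VARCHAR(8000))),
  (479),
  (CAST('2014-07-08 09:06:34.440' AS DATETIME)),
  (1e9),            -- scientific notation; stored as FLOAT
  (453.66903438);   -- stored as NUMERIC(11,8)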

Sure, we can insert the same data into a VARCHAR column, but the values would all come out VARCHAR data. The SQL_VARIANT data type, on the other hand, preserves the original type, which we can verify by using the SQL_VARIANT_PROPERTY function:
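For example:

SELECT StuffName,
  SQL_VARIANT_PROPERTY(StuffName, 'BaseType') AS BaseType,
  SQL_VARIANT_PROPERTY(StuffName, 'MaxLength') AS BaseLength,
  SQL_VARIANT_PROPERTY(StuffName, 'Precision') AS BasePrecision
FROM OurStuff;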

And here are the results:

StuffName                BaseType  BaseLength  BasePrecision
-----------------------  --------  ----------  -------------
Same Old Thing           varchar   8000        0
479                      int       4           10
2014-07-08 09:06:34.440  datetime  8           23
1000000000               float     8           53
453.66903438             numeric   9           11

Notice that each type has been preserved. SQL_VARIANT was even able to pick up on the scientific notation. So you can see why, in certain circumstances, such a data type could be useful. For example, you might consider SQL_VARIANT when defining a parameter in a stored procedure or function. This approach helps to maintain some of the precision you might otherwise lose when passing in numeric or date/time values through a VARCHAR parameter. Another example of how SQL_VARIANT might be useful is if you’re supporting an application that allows users to define their own tables and fields. SQL_VARIANT can help you avoid having to set up extra structures to support each data type.

You’ll have to decide on a case-by-case basis whether to use SQL_VARIANT, but know that the data type comes with a number of limitations. For instance, SQL Server converts SQL_VARIANT values to NVARCHAR(4000) to support older versions of the OLE DB and ODBC providers. The same goes for many other external tools that don’t recognize SQL_VARIANT. In addition, using linked servers in conjunction with SQL_VARIANT can lead to potential memory leaks. Plus, you can’t use the data type in statement elements such as LIKE expressions or computed columns. So before you implement the SQL_VARIANT data type, do your homework. A good place to start is the TechNet topic “Using sql_variant Data.”