Questions About SQL Server Data Types You Were Too Shy to Ask

Although SQL Server data types seem to cause a lot of grief for database developers and can be tricky to use, we are expected to know all about them, so it is embarrassing to ask questions about them in forums. Rob Sheldon continues in his mission to answer all those questions that we hesitate to ask.

The Questions

  1. “Under what circumstances is character data truncated without triggering an error?”
  2. “SQL Server’s implicit conversions sometimes lead to unexpected query results, such as a value being returned as a data type different from what I expect. How do implicit conversions work?”
  3. “I’ve created a query that uses the ISNUMERIC function to verify whether a value is numeric and, if so, convert it to an integer. In some cases, however, the database engine generates an error because it’s trying to convert a value that’s not really a numeric. What’s going on?”
  4. “I’m told I should use the BIT data type where possible to save space. Is it true that it saves space or is this just syntactic sugar?”
  5. “When I use the DATALENGTH and LEN functions to calculate the size of a string, I often receive different results, even though I’d expect them to be the same. What’s the difference between the two functions?”
  6. “I’m troubleshooting a stored procedure that uses a COALESCE expression to return the first non-NULL value from a set of columns. However, the procedure often generates a conversion error, even if the value returned comes from the first column listed. Any idea why this might be occurring?”
  7. “I’m developing a query that contains a BETWEEN operator in the WHERE clause. The compared data includes a range of consecutive values that contain both letters and numbers, as in XYZ60 through XYZ700. Even when I specify the entire range, values are unexpectedly omitted from my query results. Any idea why this is occurring?”
  8. “I’ve seen some T-SQL code that contains ‘national character varying(max)’ listed as the data type, rather than one of the normal types, yet everything seems to work fine. Is this something new?”
  9. “When is it best to use the CHAR data type rather than VARCHAR?”
  10. “I’m building a query that joins a FLOAT column to a REAL column, but when I run the statement, it returns no results, even though I can see the matching values in the two columns. Why is this happening?”
  11. “I have a table that includes a VARCHAR column and a stored procedure that accesses data from the table, based on values in that column. The procedure includes an NVARCHAR parameter used to qualify the WHERE clause, which references the VARCHAR column. Does it matter that the data types are different between the parameter and the column?”
  12. “My query needs to calculate past and future dates based on an existing date as well as calculate the time difference between two dates. I’ve tried using arithmetic operators to perform the calculations, but the results are never right. How do I go about working with dates?”
  13. “I’m told never to use the SQL_VARIANT data type inappropriately. What is an appropriate use for this data type?”

“Under what circumstances is character data truncated without triggering an error?”

Never, one might hope. But SQL Server is a sneaky devil, preventing truncation some of the time, but not all the time. Sure, if you try to insert too large a value into a column, the database engine baulks, just like it would for the following table:
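A minimal version of such a table might look like this:

CREATE TABLE OurStuff
(
  StuffID INT NOT NULL PRIMARY KEY,
  StuffName VARCHAR(10) NOT NULL
);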

The StuffID column, the primary key, is configured with the INT data type, and the StuffName column is configured as VARCHAR(10). All very basic. And just as basic is the following INSERT statement:
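For example (the exact string is illustrative; any value longer than 10 characters will do):

INSERT INTO OurStuff (StuffID, StuffName)
VALUES (1, 'Really Big Thing');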

What we’re trying to do is here is insert a value into the StuffName column that exceeds the data type’s specified length. Not surprisingly, SQL Server kicks out an error:
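The message looks something like this (the exact wording and line number vary by version):

Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.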

Everything is just as we’d expect. The value being inserted is too big, so the database engine jumps to the rescue, preventing any values from being truncated and, consequently, ensuring the integrity of the data.

Unfortunately, SQL Server is not quite so protective with its variables and parameters. Consider the following T-SQL script, which declares the @stuff variable and sets its value to a string that exceeds the data type’s specified length:
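A sketch of such a script (the assigned string is illustrative):

DECLARE @stuff VARCHAR(10);
SET @stuff = 'Really Big Thing';  -- 16 characters into a 10-character variable
PRINT @stuff;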

The database engine doesn’t blink. It simply inserts the string’s first 10 characters (Really Big) and goes about its business, pretending that all is well, when in fact our variable must limp along with a truncated value. The PRINT statement merely confirms our suspicions and returns the really wrong Really Big.

It gets worse. Imagine you’re using an ISNULL function to compare two variables, as in the following example:
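Something along these lines:

DECLARE @stuff1 VARCHAR(5);
DECLARE @stuff2 VARCHAR(10);
SET @stuff1 = NULL;
SET @stuff2 = 'Big Thing';
PRINT ISNULL(@stuff1, @stuff2);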

The first variable is defined as VARCHAR(5) and assigned a value of NULL. The second is defined as VARCHAR(10) and assigned the string value Big Thing. So far so good. Now we get to ISNULL. The thing with this function is that the first expression (in this case, @stuff1) determines the data type of the output, even if that expression returns a NULL. That means the @stuff2 value is confined by the @stuff1 data type constraints. As a result, our PRINT statement, rather than returning Big Thing, gives us Big T, which sounds more like a mob boss than a credible data item.

Parameters too can fall victim to truncation without warning. Suppose we create the following stored procedure:
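A minimal sketch; the procedure name and the extra @id parameter are assumptions, added so the example stands on its own:

CREATE PROCEDURE InsertStuff
  @id INT,
  @stuff VARCHAR(10)
AS
INSERT INTO OurStuff (StuffID, StuffName)
VALUES (@id, @stuff);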

A simple example, of course, but it demonstrates an important concept. Notice that the @stuff parameter is defined as VARCHAR(10). Now suppose we pass in a value longer than 10 characters:
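For example:

EXEC InsertStuff @id = 2, @stuff = 'Really Big Thing';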

The database engine runs the procedure without a hitch and returns a message saying that one row is affected (the row inserted into the OurStuff table). We can verify this by running a simple SELECT statement:
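For instance:

SELECT StuffID, StuffName FROM OurStuff;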

The results, shown in the following table, indicate that the value has indeed been truncated, although not once did SQL Server raise an exception.

StuffID  StuffName
-------  ----------
2        Really Big

So be aware of what can happen when working with variables and parameters. You’ll likely want to include in your T-SQL code the logic necessary to catch values that might be faulty. Obviously, you can’t rely on the database engine to do it for you.

“SQL Server’s implicit conversions sometimes lead to unexpected query results, such as a value being returned as a data type different from what I expect. How do implicit conversions work?”

SQL Server is great at surreptitiously converting values from one type to another: CHAR to VARCHAR, FLOAT to NCHAR, BIGINT to MONEY. You get the picture. As long as the values are compatible with the target type, the database engine does all the work, and you get to kick back and reap the rewards.

Sort of.

Imagine the following scenario, in which we add a CHAR variable to an INT variable:
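A sketch of the scenario; the values '22' and 33 are inferred from the results that follow, and SQL_VARIANT_PROPERTY is one way to pull the base type:

DECLARE @stuff1 CHAR(2) = '22';
DECLARE @stuff2 INT = 33;
SELECT @stuff1 + @stuff2 AS AllStuff,
  SQL_VARIANT_PROPERTY(@stuff1 + @stuff2, 'BaseType') AS BaseType;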

We declare the variables, assign values, and add them together. We also pull the base type of the returned value, just so we know what’s going on. The following table shows us the results of our SELECT statement:

AllStuff  BaseType
--------  --------
55        int

All looks fine on the surface. We add our INT value to our CHAR value and come up with the total (55), returned as an INT value. The database engine has implicitly converted the CHAR value to an INT value. The INT data type is used because INT takes precedence over CHAR. SQL Server is quite explicit when it comes to data type precedence. INT wins out over CHAR and VARCHAR every time. If a DATETIME value is tossed into the mix, it will beat out INT and just about all other types. (You can find details about precedence rules in the MSDN topic “Data Type Precedence.”)

Returning to our example, SQL Server converts the CHAR value to INT and then proceeds to add the two integers together, giving us a result of 55. But what if we didn’t want to add the values together but concatenate them instead? In that case, we’d have to explicitly convert the INT value:
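For example, still using the same two variables:

SELECT @stuff1 + CAST(@stuff2 AS CHAR(2)) AS AllStuff,
  SQL_VARIANT_PROPERTY(@stuff1 + CAST(@stuff2 AS CHAR(2)), 'BaseType') AS BaseType;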

Now our results look quite different. Not only are the two values concatenated, but the value is also returned as a CHAR type:

AllStuff  BaseType
--------  --------
2233      char

You should also be aware of how SQL Server handles implicit conversions for numerical data. For example, suppose this time around, we create a DECIMAL variable and an INT variable, assign a value to the DECIMAL variable, and then set the INT variable to equal the DECIMAL variable:
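A minimal sketch, with values inferred from the results below:

DECLARE @stuff1 DECIMAL(8,4) = 1234.9999;
DECLARE @stuff2 INT;
SET @stuff2 = @stuff1;
SELECT @stuff1 AS Stuff1, @stuff2 AS Stuff2;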

The SELECT statement now returns these results:

Stuff1     Stuff2
---------  ------
1234.9999  1234

Notice what the database engine has done. Rather than rounding the value up, as might be expected, it simply truncates the value, giving us our integer, without too great a concern for the original value. And the problem isn’t limited to DECIMAL-to-INT conversions. Check out what happens when we go from INT to REAL and back again:
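A sketch, with the values chosen to match the results below:

DECLARE @stuff1 INT = 1080000000;
DECLARE @stuff2 INT = 1080000001;
DECLARE @stuff3 REAL;
SET @stuff3 = @stuff2;
SELECT @stuff3 AS Stuff3,
  CAST(@stuff3 AS INT) AS Stuff3int,
  CASE WHEN @stuff1 = CAST(@stuff3 AS INT)
    THEN 'yes' ELSE 'no' END AS IsEqual;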

We declare the INT variables and set their values, and then declare a REAL variable and set its value to equal the second INT variable. In our SELECT statement, we then retrieve the REAL variable as it is stored, convert it to an INT, and then check whether the first and third variables are equal. The following table shows our results:

Stuff3    Stuff3int   IsEqual
--------  ----------  -------
1.08E+09  1080000000  yes

Notice that the Stuff3 value appears as scientific notation because the @stuff3 variable had insufficient precision to hold the original integer. In addition, when the variable is converted to an integer, it returns a value different from what it was assigned. Instead, it now equals the value stored in @stuff1.

In many cases, you shouldn’t leave it up to the database engine to do your conversions, no matter how tempting that might be. In fact, some developers insist that all conversions should be explicit. At the very least, be sure that whenever the integrity of the data could be brought into question, err on the side of converting the data explicitly.

“I’ve created a query that uses the ISNUMERIC function to verify whether a value is numeric and, if so, convert it to an integer. In some cases, however, the database engine generates an error because it’s trying to convert a value that’s not really a numeric. What’s going on?”

The ISNUMERIC function can be a slippery one. On the surface, its purpose appears only to determine whether the submitted value is indeed a valid numeric type. But looks can be deceiving. Take the following T-SQL script. It creates a table with a VARCHAR column and then adds a mix of values to that column:
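A script along these lines, with the values taken from the results that follow:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffName VARCHAR(20)
);
INSERT INTO OurStuff (StuffName)
VALUES ('abc'), ('def123'), ('456'), ('$789'),
  ('1d1'), (','), ('.'), ('$.,');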

Nothing extraordinary, just your run-of-the-mill alphanumeric hodgepodge. But now let’s look at what happens when we apply the ISNUMERIC function to the StuffName column:
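For example:

SELECT StuffID, StuffName,
  ISNUMERIC(StuffName) AS IsNumber
FROM OurStuff;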

All we’re doing is pulling the table’s two columns, as well as returning a third column to test each value’s numeric status. The ISNUMERIC function returns a 1 if the tested value appears to be a numeric data type; otherwise, it returns a 0. Here’s what we get:

StuffID  StuffName  IsNumber
-------  ---------  --------
1        abc        0
2        def123     0
3        456        1
4        $789       1
5        1d1        1
6        ,          1
7        .          1
8        $.,        1

It should come as no surprise that the first two values are considered non-numeric and the third value is. However, the function also considers $789 to be numeric, despite the dollar sign (or more precisely, because of the dollar sign), and 1d1 to be numeric because of its resemblance to scientific notation, whether or not that is the intent.

The sky might not fall because of these interpretations, but they could cause problems if you want to perform other operations based on the output of the ISNUMERIC function. For example, suppose we want to explicitly convert each value to an INT based on whether we think that value is numeric:
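Something like this:

SELECT StuffID, StuffName,
  CASE WHEN ISNUMERIC(StuffName) = 1
    THEN CAST(StuffName AS INT)
    ELSE NULL
  END AS CaseResult
FROM OurStuff
WHERE StuffID <= 3;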

In this case, we’re retrieving only the first three rows from our table. If the ISNUMERIC function returns a 1, we convert the value to the INT type; otherwise, we return a NULL value. As expected, the first two rows each return a NULL and the third row returns the value 456 as an INT, as shown in the following results:

StuffID  StuffName  CaseResult
-------  ---------  ----------
1        abc        NULL
2        def123     NULL
3        456        456

Now let’s retrieve the fourth row from the table:
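For example:

SELECT StuffID, StuffName,
  CASE WHEN ISNUMERIC(StuffName) = 1
    THEN CAST(StuffName AS INT)
    ELSE NULL
  END AS CaseResult
FROM OurStuff
WHERE StuffID = 4;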

This time around, we receive the following error message, indicating that we cannot convert that value to an INT:
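The error looks something like this:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value '$789' to data type int.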

This shouldn’t come as too big of a surprise. The INT data type has never been fond of dollar signs. The same goes for scientific notation. If we were to try to run the query against the last row in our table, we would receive a similar error.

And it’s not just dollar signs and possible scientific notation that can elicit problems. The ISNUMERIC function recognizes an assortment of characters as being numeric (in addition to actual numbers), such as plus signs and minus signs and periods and tabs and commas. Take a look at the following example, in which we assign a tab to the @tab variable:
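A sketch (CHAR(9) is the tab character):

DECLARE @tab CHAR(1) = CHAR(9);
SELECT ISNUMERIC(@tab) AS IsNumber;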

As odd as this might seem, the ISNUMERIC function returns a 1, as it will in the next example, in which we pass in a plus sign:
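For example:

SELECT ISNUMERIC('+') AS IsNumber;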

If you’re working with a version of SQL Server prior to 2012, you’ll have to come up with some sort of workaround to check for those circumstances in which the apparent numeric value would generate an error. You might, for example, create a function or add a CASE expression to set up more complex test conditions.

For those of you working with SQL Server 2012 or later, you have a better alternative: the TRY_CONVERT function:
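For example, run against the same table:

SELECT StuffID, StuffName,
  TRY_CONVERT(INT, StuffName) AS CaseResult
FROM OurStuff;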

The function first tests whether the requested conversion will work. If so, it converts the data to the target data type. If not, it returns a NULL:

StuffID  StuffName  CaseResult
-------  ---------  ----------
1        abc        NULL
2        def123     NULL
3        456        456
4        $789       NULL
5        1d1        NULL
6        ,          NULL
7        .          NULL
8        $.,        NULL

As you can see, the only number that can be converted is 456. The rest are there only to complicate the lives of overworked DBAs with nothing better to do than to search for wayward numbers.

“I’m told I should use the BIT data type where possible to save space. Is it true that it saves space or is this just syntactic sugar?”

Yes, the BIT data type can save space under certain circumstances, and it undoubtedly has some advantages over an approach such as using CHAR(1), which requires a CHECK constraint to limit the values in the same way BIT already does. With the BIT data type, you can insert only the values 1 and 0, as well as NULL values, assuming the object is nullable.

That said, a BIT data type still uses a byte of data, just like CHAR(1) or TINYINT. Some might have assumed that the BIT type translates into only one bit of data, given the type’s tie-in to the 1 and 0 value limits. But that’s hardly the case. BIT is actually an integer data type and consequently requires more legroom. Case in point:
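A minimal sketch:

DECLARE @stuffed BIT = 0;
SELECT @stuffed AS Stuffed,
  DATALENGTH(@stuffed) AS BitLength;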

As you can see in the following results, the value does indeed use one byte:

Stuffed  BitLength
-------  ---------
0        1

However, if you create a table with multiple BIT columns, the database engine is clever enough to store the column data in a single byte, up to eight columns. If your table contains between 9 and 16 BIT columns, SQL Server stores the data as two bytes. Between 17 and 24 BIT columns? Three bytes, and so on. You get the picture.

That said, you should still be wary about implementing BIT. It can have unexpected results when used in an expression, and it’s not necessarily well suited to masquerading as a bit flag or bit mask. For an interesting take on the whole issue, check out Joe Celko’s excellent article “BIT of a Problem.”

“When I use the DATALENGTH and LEN functions to calculate the size of a string, I often receive different results, even though I’d expect them to be the same. What’s the difference between the two functions?”

The DATALENGTH function returns the number of bytes used by an expression, and the LEN function returns the number of characters contained in an expression. The LEN function is also limited to string expressions, whereas the DATALENGTH function supports expressions of any type.

To better understand how they both work, let’s take a look at them in action. The following T-SQL script creates a table with four columns, each configured with a character data type, and then inserts data into those columns:
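A script along these lines (the three trailing spaces in the final INSERT are inferred from the results below):

CREATE TABLE TestLength
(
  char1 CHAR(10),
  nchar2 NCHAR(10),
  varchar3 VARCHAR(10),
  nvarchar4 NVARCHAR(10)
);
INSERT INTO TestLength
VALUES ('stuff', 'stuff', 'stuff', 'stuff');
INSERT INTO TestLength
VALUES ('stuff   ', 'stuff   ', 'stuff   ', 'stuff   ');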

Notice that, in the final INSERT statement, I’ve added several spaces after each value. You’ll see why in a moment. But first, let’s retrieve data from each of the columns, applying both the LEN and DATALENGTH functions:
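For example:

SELECT LEN(char1) AS len1, DATALENGTH(char1) AS datalength1,
  LEN(nchar2) AS len2, DATALENGTH(nchar2) AS datalength2,
  LEN(varchar3) AS len3, DATALENGTH(varchar3) AS datalength3,
  LEN(nvarchar4) AS len4, DATALENGTH(nvarchar4) AS datalength4
FROM TestLength;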

The following table shows the statement’s output:

len1  datalength1  len2  datalength2  len3  datalength3  len4  datalength4
----  -----------  ----  -----------  ----  -----------  ----  -----------
5     10           5     20           5     5            5     10
5     10           5     20           5     8            5     16

The len1 and datalength1 columns in the resultset refer to the char1 column in the TestLength table. The column is configured as CHAR(10). The LEN function provides the actual number of characters being used (5), not including the trailing spaces. The DATALENGTH function indicates that 10 bytes are being used, despite the actual number of characters. This is because the column is configured at a fixed width of 10. As a result, the DATALENGTH function can provide us with a more accurate view of our storage requirements.

The len2 and datalength2 columns in the resultset refer to the nchar2 column and work much the same way, except that it is a Unicode column configured as NCHAR(10). The LEN function still shows that five characters are being used (not including spaces), but the DATALENGTH function indicates that the value actually requires 20 bytes, just like we’d expect.

When we get to the len3 and datalength3 columns, the results get even more interesting. This data is based on the varchar3 column, which is configured as VARCHAR(10). Even so, the LEN function still indicates that only five characters are being used. However, because we’re working with a VARCHAR column, the DATALENGTH function indicates that five bytes are being used by the value in the first row and eight bytes by the value in the second row (to accommodate the trailing spaces). The len4 and datalength4 columns work in the same way, only the number of bytes is doubled in each case to account for the Unicode type.

As mentioned earlier, although the LEN function is limited to string data, you can use the DATALENGTH function on an expression of any data type. For example, the following T-SQL declares an INT variable, assigns a value, and then checks the number of bytes the variable uses:
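A sketch:

DECLARE @IntNew INT = 1234567;
SELECT @IntNew AS IntNew,
  DATALENGTH(@IntNew) AS IntLength;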

The SELECT statement returns the results shown in the following table, which indicate that the value uses four bytes, the expected length for the INT data type:

IntNew   IntLength
-------  ---------
1234567  4

Even if our integer were made up of only two digits, the results would be the same. Same goes for a DATETIME value, in terms of consistency:
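For example:

DECLARE @DateNow DATETIME = GETDATE();
SELECT @DateNow AS DateNow,
  DATALENGTH(@DateNow) AS DateLength;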

In this case, the DATALENGTH function returns eight bytes:

DateNow                  DateLength
-----------------------  ----------
2014-07-05 21:32:23.507  8

We would expect eight bytes because DATETIME values are stored as two integers, one for the date and one for the time.

“I’m troubleshooting a stored procedure that uses a COALESCE expression to return the first non-NULL value from a set of columns. However, the procedure often generates a conversion error, even if the value returned comes from the first column listed. Any idea why this might be occurring?”

A COALESCE expression can be a bit tricky to get right at first. It is essentially shorthand for a type of CASE expression that evaluates a series of expressions, often columns or variables, and returns the first one that does not evaluate to NULL. Let’s look at how it works. The following T-SQL declares several variables and assigns values to them, based on their respective types:
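A sketch, with the values inferred from the results below:

DECLARE @StuffNull VARCHAR(10) = NULL;
DECLARE @StuffID INT = 1001;
DECLARE @StuffName VARCHAR(10) = 'Big Thing';
SELECT COALESCE(@StuffNull, @StuffID, @StuffName) AS NotNullStuff,
  SQL_VARIANT_PROPERTY(
    COALESCE(@StuffNull, @StuffID, @StuffName), 'BaseType') AS BaseType;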

The COALESCE expression takes the three variables as an argument. Because the @StuffNull variable contains a NULL value, it will be skipped and we’ll move to the next variable, @StuffID. This one contains a value, so the SELECT statement returns that value in its results:

NotNullStuff  BaseType
------------  --------
1001          int

Everything seems in order here. The @StuffID variable does not contain a NULL, so that value is returned, and it is indeed an integer, just as we would expect. But what if we change the order of our variables:
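Using the same variables:

SELECT COALESCE(@StuffNull, @StuffName, @StuffID) AS NotNullStuff;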

The @StuffName variable does not contain a NULL value, so that variable’s value should be returned. However, the statement instead generates the following error:
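The error looks something like this:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value 'Big Thing' to data type int.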

The problem is that a COALESCE expression uses the data type with the highest precedence, no matter in what order the arguments are presented. Because INT has a higher precedence than VARCHAR, the returned type is an INT, which means the database engine is trying to convert the Big Thing string to an integer. It just doesn’t work.

Data type precedence also applies to a type’s length:
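A sketch; the second string here is an invented value, chosen only to be non-NULL:

DECLARE @StuffNull VARCHAR(10) = NULL;
DECLARE @StuffName1 VARCHAR(10) = 'Big Thing';
DECLARE @StuffName2 VARCHAR(15) = 'Even Bigger';
SELECT COALESCE(@StuffNull, @StuffName1, @StuffName2) AS NotNullStuff,
  SQL_VARIANT_PROPERTY(
    COALESCE(@StuffNull, @StuffName1, @StuffName2), 'BaseType') AS BaseType,
  SQL_VARIANT_PROPERTY(
    COALESCE(@StuffNull, @StuffName1, @StuffName2), 'MaxLength') AS BaseLength;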

Because @StuffName1 is the first variable to contain a non-null value, that value is returned. However, because @StuffName2 has a greater length (15 as opposed to 10), that is the data type of the returned value, as the following table shows:

NotNullStuff  BaseType  BaseLength
------------  --------  ----------
Big Thing     varchar   15

Although both variables are VARCHAR, the 15 length has precedence over the 10 length, just like INT has precedence over a VARCHAR.

“I’m developing a query that contains a BETWEEN operator in the WHERE clause. The compared data includes a range of consecutive values that contain both letters and numbers, as in XYZ60 through XYZ700. Even when I specify the entire range, values are unexpectedly omitted from my query results. Any idea why this is occurring?”

When you use the BETWEEN operator for non-numerical data, you have to keep in mind how SQL Server sorts and searches for data. The example data you provide is considered character data (non-numerical), so the returned range of values follows the rules of the database’s configured collation, which specifies how character data is sorted and compared, based on language and usage norms.

The best way to understand this is to see it in action. The following T-SQL creates the OurStuff table, which includes a VARCHAR column and a DATETIME column, and inserts a number of rows into those columns:
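A script along these lines, with the values drawn from the result sets that follow:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffProdID VARCHAR(10),
  StuffDate DATETIME
);
INSERT INTO OurStuff (StuffProdID, StuffDate)
VALUES
  ('ab38',  '2014-07-06 12:46:48.240'),
  ('ab39',  '2014-07-08 10:46:48.240'),
  ('ab48',  '2014-07-09 08:46:48.240'),
  ('ab49',  '2014-07-10 06:46:48.240'),
  ('ab58',  '2014-07-11 04:46:48.240'),
  ('ab59',  '2014-07-12 02:46:48.240'),
  ('ab398', '2014-07-13 22:46:48.240'),
  ('ab399', '2014-07-14 20:46:48.240'),
  ('ab400', '2014-07-15 18:46:48.240'),
  ('cd38',  '2014-07-16 16:46:48.240'),
  ('cd39',  '2014-07-17 14:46:48.240'),
  ('cd48',  '2014-07-18 12:46:48.240'),
  ('cd49',  '2014-07-19 10:46:48.240'),
  ('cd58',  '2014-07-20 11:46:48.240'),
  ('cd59',  '2014-07-21 12:46:48.240'),
  ('cd398', '2014-07-22 13:46:48.240'),
  ('cd399', '2014-07-23 14:46:48.240'),
  ('cd400', '2014-07-24 15:46:48.240');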

Now suppose we issue a simple query that retrieves the rows of data based on StuffID values 6 through 16:
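For example:

SELECT * FROM OurStuff
WHERE StuffID BETWEEN 6 AND 16;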

Not surprisingly, the statement returns the results shown in the following table:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240
10       cd38         2014-07-16 16:46:48.240
11       cd39         2014-07-17 14:46:48.240
12       cd48         2014-07-18 12:46:48.240
13       cd49         2014-07-19 10:46:48.240
14       cd58         2014-07-20 11:46:48.240
15       cd59         2014-07-21 12:46:48.240
16       cd398        2014-07-22 13:46:48.240

The BETWEEN operator does exactly what we expect. It includes rows 6 through 16 and omits the rest. However, now suppose we modify the BETWEEN clause to retrieve data based on the StuffProdID column:
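For example:

SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'a' AND 'c';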

Now our results include only the first nine rows:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
3        ab48         2014-07-09 08:46:48.240
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240

Although we’ve specified the letter c as the end of the range, no rows whose StuffProdID value starts with that letter are included. They are not part of the acceptable range of values, based on the sort order defined by the collation. In other words, all values that start with c include other characters, so they’re considered out of range.

To help clarify this, let’s sort our table based on the StuffProdID column:
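For example:

SELECT * FROM OurStuff
ORDER BY StuffProdID;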

The data in the StuffProdID column is sorted based on the rules of the database collation:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240
3        ab48         2014-07-09 08:46:48.240
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
10       cd38         2014-07-16 16:46:48.240
11       cd39         2014-07-17 14:46:48.240
16       cd398        2014-07-22 13:46:48.240
17       cd399        2014-07-23 14:46:48.240
18       cd400        2014-07-24 15:46:48.240
12       cd48         2014-07-18 12:46:48.240
13       cd49         2014-07-19 10:46:48.240
14       cd58         2014-07-20 11:46:48.240
15       cd59         2014-07-21 12:46:48.240

The data is sorted first by the first character; then by the second character, within the first-character grouping; next by the third character, within the second-character grouping; and so on. If we reconsider our BETWEEN clause (BETWEEN 'a' AND 'c'), we’ll see that the range includes all the values that start with a, but none that start with c. If we had a c-only value, that would be included, but in this case, all of our values that start with c include other characters as well and consequently are outside the range specified in the BETWEEN clause. Now let’s specify a more inclusive range:
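For example:

SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'a' AND 'cd4';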

Because our range includes all rows up to cd4, our resultset includes several more rows than it did when we specified only c as the upper end of the range:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
3        ab48         2014-07-09 08:46:48.240
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240
10       cd38         2014-07-16 16:46:48.240
11       cd39         2014-07-17 14:46:48.240
16       cd398        2014-07-22 13:46:48.240
17       cd399        2014-07-23 14:46:48.240

Note, however, we’re still missing any rows that start with cd4 or cd5 because they fall outside the specified range. If you refer back to the fully sorted resultset, you’ll see that the values cd400 through cd59 all sort after the string cd4 and so come at the end of the result set.

The following example demonstrates this concept further:
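For example:

SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'ab38' AND 'ab400';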

Now our results include only five rows, those that fall within the specified range. As you can see in the following results, ab400 is now included because we specified the entire value, not just the first part of it:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
1        ab38         2014-07-06 12:46:48.240
2        ab39         2014-07-08 10:46:48.240
7        ab398        2014-07-13 22:46:48.240
8        ab399        2014-07-14 20:46:48.240
9        ab400        2014-07-15 18:46:48.240

If you’re working with DATETIME values, you also need to be aware of how the data is sorted and compared. The following example specifies a date range three days apart:
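For example:

SELECT * FROM OurStuff
WHERE StuffDate BETWEEN '2014-07-10' AND '2014-07-13';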

Because the range specified in a BETWEEN clause is inclusive, we might expect four rows to be returned, but instead we get only three:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240

When working with DATETIME values, keep in mind that the value is always made up of the date and time, down to a thousandth of a second. The time part is factored into the range calculations, so you must take that part into account when you’re specifying your range. If you don’t specify a time, as is the case in our SELECT statement, SQL Server assumes the time is midnight (all zeroes).

One approach to dealing with this issue is to specify the entire DATETIME values:
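For example:

SELECT * FROM OurStuff
WHERE StuffDate BETWEEN '2014-07-10 06:46:48.240'
  AND '2014-07-13 22:46:48.240';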

This way, our results will include all rows whose DATETIME values fall within this very specific range:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240

The trouble with this approach, however, is that it can get fairly cumbersome. Instead of using the BETWEEN operator, you might consider the greater than or equal to operator (>=) along with the less than operator (<):
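For example:

SELECT * FROM OurStuff
WHERE StuffDate >= '2014-07-10'
  AND StuffDate < '2014-07-14';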

For the beginning of the range, we specify that the date be greater than or equal to the specified date, which puts us right at midnight. We then use the less than operator to specify one day past our intended date. Because that date is also set to midnight, the query picks up everything through the very end of the intended final day. The following table shows the results now returned by the SELECT statement:

StuffID  StuffProdID  StuffDate
-------  -----------  -----------------------
4        ab49         2014-07-10 06:46:48.240
5        ab58         2014-07-11 04:46:48.240
6        ab59         2014-07-12 02:46:48.240
7        ab398        2014-07-13 22:46:48.240

“I’ve seen some T-SQL code that contains ‘national character varying(max)’ listed as the data type, rather than one of the normal types, yet everything seems to work fine. Is this something new?”

There’s nothing new about national character data types such as NATIONAL CHARACTER VARYING. SQL Server includes them as synonyms to appease the ISO gods in order to conform to the international standards set by that organization. In fact, many of the SQL Server data types include ISO synonyms. (The TechNet topic “Data Type Synonyms” lists all the ISO synonyms.)

You can use the synonyms in place of the names of the base data types in your data definition language (DDL) statements. However, once you create the object, the synonyms are no longer used. For example, the following T-SQL script creates a table that includes columns configured with the NVARCHAR and NATIONAL CHARACTER VARYING data types:
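A minimal sketch (the length of 25 is illustrative):

CREATE TABLE OurStuff
(
  StuffID INT PRIMARY KEY,
  StuffName1 NVARCHAR(25),
  StuffName2 NATIONAL CHARACTER VARYING(25)
);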

We can specify the national character type just like we do any other type, and we can insert and manipulate the data as usual. But now let’s look at the column metadata:
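One way to do that is to query the catalog views:

SELECT c.name AS ColumnName,
  t.name AS TypeName
FROM sys.columns c
  INNER JOIN sys.types t
    ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('OurStuff');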

All we’re doing is pulling the column names and their data types, after we created our table. The following results tell all:

ColumnName  TypeName
----------  --------
StuffID     int
StuffName1  nvarchar
Stuffname2  nvarchar

As you can see, both character columns have been created with the NVARCHAR data type. The NATIONAL CHARACTER VARYING alias has disappeared altogether.

“When is it best to use the CHAR data type rather than VARCHAR?”

The key to knowing when to use one over the other is to first understand the differences between them. The CHAR and NCHAR data types store data at a fixed length. If an inserted value is shorter than that length, it is padded with trailing spaces to ensure that every value is of equal length. The VARCHAR and NVARCHAR data types are variable length, which means the size depends on the inserted value, plus an additional couple of bytes for overhead.

The following example helps illustrate the differences between the data types:
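A sketch, with lengths chosen to match the results below:

DECLARE @char1 CHAR(7) = 'stuff';
DECLARE @nchar2 NCHAR(7) = 'stuff';
DECLARE @varchar3 VARCHAR(7) = 'stuff';
DECLARE @nvarchar4 NVARCHAR(7) = 'stuff';
SELECT DATALENGTH(@char1) AS char1,
  DATALENGTH(@nchar2) AS nchar2,
  DATALENGTH(@varchar3) AS varchar3,
  DATALENGTH(@nvarchar4) AS nvarchar4;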

All we’re doing here is using the DATALENGTH function to determine the size of each variable, as shown in the following results:

char1  nchar2  varchar3  nvarchar4
-----  ------  --------  ---------
7      14      5         10

Although the value is only five characters long, the length of the CHAR value is seven bytes, because that was how the variable was defined. The NCHAR value works the same way, except that it’s doubled to account for the Unicode characters, so the DATALENGTH function returns 14. On the other hand, the VARCHAR value requires only the bytes needed for the value (not including the overhead bytes). The NVARCHAR value merely doubles that.

The choice, then, of whether to use CHAR or VARCHAR depends on how consistent the sizes of the values are and how small the values are. Microsoft recommends that you use CHAR only when value lengths are consistent, and use VARCHAR when the lengths vary a lot. Microsoft is noncommittal about the gray areas in between.

Small columns (fewer than five or six characters) often work well as CHAR columns when you take into account the overhead added to a VARCHAR column. Even so, some developers would suggest that you use CHAR only if all values are exactly the same length. If values are shorter than the designated length, they get padded with spaces, which can be annoying to deal with and could potentially waste a lot of space if your values are mostly one-to-two characters but you’ve created a CHAR(5) column to accommodate a relatively few five-character values.

That said, if all your values are a comparable length, you don’t incur the storage penalty for overhead that comes with VARCHAR, and storage has grown so cheap that some of the concerns about the extra space have been mitigated. You might see some performance gains if you go with CHAR, but the significance of those gains can vary and is perhaps not worth the trade-off of having to deal with the trailing spaces. Even so, CHAR is a useful option when your data is consistent and small.

“I’m building a query that joins a FLOAT column to a REAL column, but when I run the statement, it returns no results, even though I can see the matching values in the two columns. Why is this happening?”

Welcome to the shifty world of approximate-number data types, in which floating-point data is merely an approximation and the numbers you see might not be the real numbers at all. Let me demonstrate. The following T-SQL declares a FLOAT variable, sets its value, and retrieves the value as it’s represented in SQL Server Management Studio (SSMS) as well as how it actually exists in the database:
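A sketch; STR’s length and decimal arguments are chosen here just to expose the stored digits:

DECLARE @StuffAvg FLOAT = 33.33;
SELECT @StuffAvg AS StuffAvg,
  STR(@StuffAvg, 25, 15) AS RealStuff;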

The STR function converts the numerical data to character data and, in the process, gives us a glimpse of what’s really out there:

StuffAvg  RealStuff
--------  ------------------
33.33     33.329999999999998

So here’s what’s happened. We assigned the value 33.33 to the variable, but the database engine actually stored it as 33.329999999999998, thus the approximation component. When SSMS retrieves that value, however, it sees fit to once again make it look nice and neat (a fact that displeases more than a few in the SQL Server user community).

On the surface, all this might not seem too big a deal (or perhaps it does). In any case, let’s take a closer look. The following T-SQL creates a table that contains a FLOAT column and a REAL column and then populates those columns with a few choice values:
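Something along these lines:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffAvg1 FLOAT,
  StuffAvg2 REAL
);
INSERT INTO OurStuff (StuffAvg1, StuffAvg2)
VALUES (33.33, 33.33), (66.66, 66.66), (99.99, 99.99);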

Now let’s do what we did with our variable and retrieve the SSMS-enhanced values and the actual values:
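For example:

SELECT StuffID,
  StuffAvg1, STR(StuffAvg1, 25, 15) AS RealStuff1,
  StuffAvg2, STR(StuffAvg2, 25, 15) AS RealStuff2
FROM OurStuff;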

Once again, our trusty STR function comes to the rescue and shows us what’s what:

StuffID  StuffAvg1  RealStuff1          StuffAvg2  RealStuff2
-------  ---------  ------------------  ---------  ------------------
1        33.33      33.329999999999998  33.33      33.330001831054688
2        66.66      66.659999999999997  66.66      66.660003662109375
3        99.99      99.989999999999995  99.99      99.989997863769531

As you can see, not only do the FLOAT and REAL data types like to approximate their values, they also like to do it differently. Yet according to the SSMS perspective, each pair of values is exactly the same. What makes matters worse is that our references to the values also get somewhat fuzzy. Check it out:
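For example:

SELECT StuffID, StuffAvg1, StuffAvg2
FROM OurStuff
WHERE StuffAvg1 = 33.33;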

When we specify 33.33 in our WHERE clause, the applicable row is returned:

StuffID  StuffAvg1  StuffAvg2
-------  ---------  ---------
1        33.33      33.33

Now let’s specify 33.329999999999998 in our WHERE clause:
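For example:

SELECT StuffID, StuffAvg1, StuffAvg2
FROM OurStuff
WHERE StuffAvg1 = 33.329999999999998;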

The statement returns the exact same results as the previous one. And it works the same way for our REAL column. Either value will return the same row.

By all appearances, then, it would seem that we could specify either value or compare values and everything would work out fine. Guess again. This time around, we’ll compare the values as they appear in SSMS, without the benefit of STR:
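A sketch of such a comparison:

DECLARE @stuff1 FLOAT = 33.33;
DECLARE @stuff2 REAL = 33.33;
SELECT CASE WHEN @stuff1 = @stuff2
  THEN 'equal' ELSE 'not equal' END AS IsEqual;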

It turns out that the statement returns a value of not equal, suggesting that, despite appearances, not all instances of FLOAT and REAL are anything close to equal. So what happens if we try to join a FLOAT column to a REAL column? To find out, let’s create a couple of tables and add some data:
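A minimal sketch:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffAvg1 FLOAT
);
CREATE TABLE OurOtherStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffAvg2 REAL
);
INSERT INTO OurStuff (StuffAvg1) VALUES (33.33), (66.66), (99.99);
INSERT INTO OurOtherStuff (StuffAvg2) VALUES (33.33), (66.66), (99.99);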

Now let’s retrieve the data in the OurStuff table:
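For example:

SELECT StuffID, StuffAvg1 FROM OurStuff;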

Not surprisingly, our results look like this:

StuffID  StuffAvg1
-------  ---------
1        33.33
2        66.66
3        99.99

Next, let’s retrieve the data in the OurOtherStuff table:
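For example:

SELECT StuffID, StuffAvg2 FROM OurOtherStuff;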

Once again, no surprises:

StuffID  StuffAvg2
-------  ---------
1        33.33
2        66.66
3        99.99

So with everything in place, let’s join these critters:
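For example:

SELECT s.StuffID, s.StuffAvg1, o.StuffAvg2
FROM OurStuff s
  INNER JOIN OurOtherStuff o
    ON s.StuffAvg1 = o.StuffAvg2;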

As it turns out, the statement returns no rows, which is why Microsoft generally recommends that you avoid using FLOAT or REAL in equality constructions in places such as joins or WHERE conditions. To make this work, you would need to do some rounding or converting, create calculated columns, or come up with something equally clever. Just don’t rely on FLOAT and REAL for any sort of precision. That’s not what they were designed to do. They offer a great deal of flexibility, but at a price.

“I have a table that includes a VARCHAR column and a stored procedure that accesses data from the table, based on values in that column. The procedure includes an NVARCHAR parameter used to qualify the WHERE clause, which references the VARCHAR column. Does it matter that the data types are different between the parameter and the column?”

Yes, it matters, but let’s make sure we’re talking about the same thing. The following T-SQL script creates a table, inserts data into that table, creates a procedure that retrieves data from that table, and then executes the procedure, passing in the parameter value Really Big Thing:
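A sketch; the procedure name and the first two row values are assumptions, added to keep the example self-contained:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffName VARCHAR(25),
  StuffDate DATETIME DEFAULT GETDATE()
);
INSERT INTO OurStuff (StuffName)
VALUES ('Thing'), ('Big Thing'), ('Really Big Thing');
GO
CREATE PROCEDURE GetStuff
  @stuff NVARCHAR(25)
AS
SELECT StuffID, StuffDate
FROM OurStuff
WHERE StuffName = @stuff;
GO
EXEC GetStuff @stuff = N'Really Big Thing';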

When you run the procedure, it returns the StuffID and StuffDate values, as the following results show:

StuffID  StuffDate
-------  -----------------------
3        2014-07-06 17:16:29.457

You might have noticed that the table’s StuffName column is configured as VARCHAR and the procedure’s @stuff parameter is configured as NVARCHAR. The database engine implicitly converts the value with no apparent problem. In the background, however, we might find a different story. Although the database engine can perform an implicit conversion without a hiccup, the operation can result in a performance penalty, in part because it can prevent index seeks. As a result, when you start adding millions (or billions) of rows to the equation, a minor data type mismatch can have a significant impact. For more information about all this, check out my Simple Talk article “Data Conversion in SQL Server.”

“My query needs to calculate past and future dates based on an existing date as well as calculate the time difference between two dates. I’ve tried using arithmetic operators to perform the calculations, but the results are never right. How do I go about working with dates?”

The best two tools for achieving what you’re trying to do are the DATEADD and DATEDIFF functions. DATEADD adds or subtracts a specified interval of time to or from a date value, which can be any date or time data type. DATEDIFF returns the interval of time between two specified date values. Again, the values can be any of the date or time data types.

Let’s first look at how the DATEADD function works. The following T-SQL declares a DATETIME variable, uses the GETDATE function to assign the current date and time values to the function, and then retrieves the dates 10 years forward and 10 years back:
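A sketch:

DECLARE @StuffDate DATETIME = GETDATE();
SELECT @StuffDate AS StuffDate,
  DATEADD(yy, 10, @StuffDate) AS Plus10,
  DATEADD(yy, -10, @StuffDate) AS Less10;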

The DATEADD function takes three arguments. The first argument is the date part, such as day, week, month, or year. In this case, we’ve specified yy for year. The second argument is the number of years forward or backward we want to go. Finally, we specify our base date. The SELECT statement returns the following results:

StuffDate                Plus10                   Less10
-----------------------  -----------------------  -----------------------
2014-07-07 19:03:50.183  2024-07-07 19:03:50.183  2004-07-07 19:03:50.183

If we want to extract a specific date or time element, we can use the DATEPART function, as in the following example:
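For example, wrapping the same expressions in DATEPART:

SELECT DATEPART(yy, @StuffDate) AS StuffDate,
  DATEPART(yy, DATEADD(yy, 10, @StuffDate)) AS Plus10,
  DATEPART(yy, DATEADD(yy, -10, @StuffDate)) AS Less10;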

All we’ve done is specify that only the year should be extracted from the date, as shown in the following results:

StuffDate  Plus10  Less10
---------  ------  ------
2014       2024    2004

Now let’s use the DATEDIFF function to calculate the number of months between two dates:
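A sketch, with dates chosen to match the results below:

DECLARE @StuffDate1 DATETIME = '2014-07-07';
DECLARE @StuffDate2 DATETIME = '2024-07-07';
SELECT DATEPART(yy, @StuffDate1) AS StuffDate1,
  DATEPART(yy, @StuffDate2) AS StuffDate2,
  DATEDIFF(mm, @StuffDate1, @StuffDate2) AS MonthDiff;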

The DATEDIFF function also requires three arguments. As with DATEADD, the first argument is the date part. This time around we’re using mm for month. The next two arguments are the dates we want to compare, in this case, our two DATETIME variables. As the following results show, there are 120 months between the two dates:

StuffDate1  StuffDate2  MonthDiff
----------  ----------  ---------
2014        2024        120

The functions available in SQL Server for working with date and time values are definitely ones worth getting to know. Date and time values are treated differently from other types of data and require special consideration. You can find more about these functions in the MSDN topic “Date and Time Data Types and Functions.”

“I’m told never to use the SQL_VARIANT data type inappropriately. What is an appropriate use for this data type?”

SQL_VARIANT is included in SQL Server mainly for system use, and it shows up in a number of system procedures and views. It comes with several important restrictions, it should be used only with great caution, and there is usually a better way. However, some appropriate, specialised uses do exist.

An appropriate use is one in which you have no other reasonable option or a reasonable option requires a workaround that causes more trouble than using SQL_VARIANT. In fact, under the right circumstances, the data type could prove a valuable tool. It supports most T-SQL base types within a single structure, such as a column or parameter. For example, if you define a column with SQL_VARIANT, you can insert character data, numerical data, and date/time data. Here’s proof. The following T-SQL script creates a table that includes a SQL_VARIANT column and then inserts into it an assortment of values:
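A sketch; the explicit CASTs are assumptions, added so each value carries the base type shown in the results below:

CREATE TABLE OurStuff
(
  StuffID INT IDENTITY PRIMARY KEY,
  StuffName SQL_VARIANT
);
INSERT INTO OurStuff (StuffName)
VALUES (CAST('Same Old Thing' AS VARCHAR(8000))),
  (479),
  (CAST('2014-07-08 09:06:34.440' AS DATETIME)),
  (1e9),            -- scientific notation; stored as FLOAT
  (453.66903438);   -- stored as NUMERIC(11,8)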

Sure, we can insert the same data into a VARCHAR column, but the values would all come out VARCHAR data. The SQL_VARIANT data type, on the other hand, preserves the original type, which we can verify by using the SQL_VARIANT_PROPERTY function:
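For example:

SELECT StuffName,
  SQL_VARIANT_PROPERTY(StuffName, 'BaseType') AS BaseType,
  SQL_VARIANT_PROPERTY(StuffName, 'MaxLength') AS BaseLength,
  SQL_VARIANT_PROPERTY(StuffName, 'Precision') AS BasePrecision
FROM OurStuff;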

And here are the results:

StuffName                BaseType  BaseLength  BasePrecision
-----------------------  --------  ----------  -------------
Same Old Thing           varchar   8000        0
479                      int       4           10
2014-07-08 09:06:34.440  datetime  8           23
1000000000               float     8           53
453.66903438             numeric   9           11

Notice that each type has been preserved. SQL_VARIANT was even able to pick up on the scientific notation. So you can see why, in certain circumstances, such a data type could be useful. For example, you might consider SQL_VARIANT when defining a parameter in a stored procedure or function. This approach helps to maintain some of the precision you might otherwise lose when passing in numeric or date/time values through a VARCHAR parameter. Another example of how SQL_VARIANT might be useful is if you’re supporting an application that allows users to define their own tables and fields. SQL_VARIANT can help you avoid having to set up extra structures to support each data type.

You’ll have to decide on a case-by-case basis whether to use SQL_VARIANT, but know that the data type comes with a number of limitations. For instance, SQL Server converts SQL_VARIANT values to NVARCHAR(4000) to support older versions of the OLE DB and ODBC providers. The same goes for many other external tools that don’t recognize SQL_VARIANT. In addition, using linked servers in conjunction with SQL_VARIANT can lead to potential memory leaks. Plus, you can’t use the data type in statement elements such as LIKE expressions or computed columns. So before you implement the SQL_VARIANT data type, do your homework. A good place to start is the TechNet topic “Using sql_variant Data.”