The T-SQL language is quite extensive and includes many functions that are useful for various purposes. One of the lesser used ones I encounter is COALESCE(). Recently I found myself using this to help with a report recently. I decided to include a few other cases I found when researching how this function works as a way of ensuring your reports are easy to read for clients.
This article covers some basic description of the function and then a few examples of where you can use this function.
What is COALESCE()?
This function is designed to work with NULL values and return alternatives if you do not want clients calling you with complaints about the "NULL" salesperson. Basically, the COALESCE function takes a number of parameters and returns the first non-NULL one to the caller. An example of this is shown in this image, where I've passed two parameters.
In this case, I've passed in a NULL, which is skipped, and then the first non-null parameter is returned. The actual syntax is shown here. As you can see, you include however many parameters you need and the first non-null one is returned:
COALESCE ( expression [ ,...n ] )
I can show a few more examples in this code. As you can see below, I use varying numbers of parameters and where the NULL is located. For the second article, we return 2 and never get to the NULL. In the third and fourth examples, we skip all the NULL values and return the final parameter.
Interesting, but where is this useful? After all, this seems like a ISNULL() type of function, and we already have that function.
There are a few places this is handy, and I'll show these now in the next few sections.
Reporting Update Dates
One of the things I've seen in code in my career is the need to report an update date for something like an order. Imagine that I have a table that contains these dates for an order:
CREATE TABLE OrderHeader ( OrderID INT NOT NULL, OrderDate DATETIME, CustomerID INT, OrderReceivedDate DATETIME, OrderUpdatedDate DATETIME, OrderProcessDate DATETIME, OrderCompleteDate DATETIME ); GO
This is a partial table, but as you can see, we have a number of dates related to the order process. In general, these are filled in as something happens, in the order they are listed in the table. As with many applications, developers allow these dates to be NULL before they are written. Why? Well, there is no good value to put in the OrderCompleteDate until the order is actually marked as complete. The same logic applies for the other dates.
In order to provide the application with a date to use for the last activity, I have written code like this, which determines which value to show based on the ordering in the CASE statement.
SELECT CustomerID, CASE WHEN oh.OrderCompleteDate IS NOT NULL THEN oh.OrderCompleteDate WHEN oh.OrderProcessDate IS NOT NULL THEN oh.OrderProcessDate WHEN oh.OrderUpdatedDate IS NOT NULL THEN oh.OrderUpdatedDate WHEN oh.OrderReceivedDate IS NOT NULL THEN oh.OrderReceivedDate ELSE oh.OrderDate END AS 'OrderDate' FROM dbo.OrderHeader AS oh;
If I use COALESCE(), I can simplify this, with an ordering for the priority of dates. The priority is the ordering. OrderCompleteDate is the highest priority. If this is NULL, then the OrderProcessDate has the second priority, and so forth.
SELECT CustomerID, COALESCE(oh.OrderCompleteDate, oh.OrderProcessDate, oh.OrderUpdatedDate, oh.OrderReceivedDate,oh.OrderDate) AS 'OrderDate' FROM dbo.OrderHeader AS oh;
The code with COALESCE is shorter and cleaner, no repeating of the column names, and less chance for mistakes. I've seen others, and I've also seen myself, copy and paste the WHEN clause to add another and forget to change with the column name name in the WHEN clause, as well as in the THEN clause. More code is often more bugs, so learn to write shorter code.
Combining Rows Together
One trick that I have used before in assigning data from multiple rows into one variable is the concatenation from a SELECT. This is similar to a PIVOT of the data. For example, if I have some data on products like this:
ProductID ProductName ProductCategory ProductSubCategory UnitPrice 1 Savadeck DECK6.1 Bike Mountain 1489.00 2 Zija Bike NULL 1753.47 3 eAhora AM200 Bike NULL 1899.99 4 Danycu 26 Bike Mountain 1328.80 5 Diamondback Sti8 Bike Hybrid 580.40
I can put the categories and subcategories together like this:
DECLARE @catlist VARCHAR(4000) = ''; SELECT @catlist = @catlist+ p.ProductCategory + COALESCE(':' + p.ProductSubCategory, '') + ', ' FROM dbo.Product AS p SELECT @catlist
This returns a result like this:
Bike:Mountain, Bike, Bike, Bike:Mountain, Bike:Hybrid,
Not a great example, and if I needed this type of result, I would want to add other code to get rid of the trailing comma, and perhaps use a DISTINCT in a CTE to limit the repeating data. However, in this case, you can see above that I use COALESCE() to ensure no NULLs get in there. I could use ISNULL, but I have had table designs that had cat1 and cat2, with a report only wanting the first category to be displayed. In those cases, with random NULLs put into the various options, I could do something more like this code, where I decide to show ProductCategory1 if it is not null. If it is, then show ProductCategory2, and if both are, show NA.
SELECT @catlist = @catlist + COALESCE(p.ProductCategory1, p.ProductCategory2, 'NA') + ', ' FROM dbo.Product AS p SELECT @catlist
In this case, we could use ISNULL, but I like to be consistent across code when it makes sense. If I have more than 2 choices, I'm nesting ISNULL()s, which is't ideal. COALESCE makes this simpler, and more consistent.
There are lots of examples where you might want to skip rows that contain NULLS, while grabbing other data and COALESCE helps, especially when there is a choice of data to show. While I would prefer well designed databases that ensure we have the proper data in all columns, that is never the case. Way too many NULLs in systems.
Data Type Issues
One of the reasons why COALESCE() is better than ISNULL() is when you have disparate data types. ISNULL() uses the data type of the first expression, which is easy to see in this code:
DECLARE @date int SELECT ISNULL(@date, 0), SQL_VARIANT_PROPERTY(ISNULL(@date, 0), 'BaseType')
The results are:
What if I had a different value in there, such as the CURRENT_TIMESTAMP variable. Perhaps not a great way of coding, but I see people try this:
DECLARE @date int SELECT ISNULL(@date, CURRENT_TIMESTAMP), SQL_VARIANT_PROPERTY(ISNULL(@date, 0), 'BaseType')
This results in an error:
Msg 257, Level 16, State 3, Line 3 Implicit conversion from data type datetime to int is not allowed. Use the CONVERT function to run this query.
Usually this results in someone writing a bunch of other code to check if the variable in NULL and then assigning a value. More code, more bugs, so COALESCE makes more sense. Here I can do this:
I haven't seen this a lot, but there are places where someone is triggering action on NULL and might want to assign a different data type. Of course, if @int is a number, then you are returning an INT, which causes other breakage. In general, I think you don't mix data types when you don't need to, and this isn't a great example, but it does show that COALESCE isn't assigning a datatype directly, and if the types convert, it converts them. Maybe the best example is this code. The first example works, the second doesn't.
DECLARE @date INT = 3, @n VARCHAR(20) SELECT @n = COALESCE(@date, CURRENT_TIMESTAMP) SELECT @n = ISNULL(@date, CURRENT_TIMESTAMP)
How Many Parameters Can You Use in COALESCE()?
This was a question I had, because the "n" in the syntax above, isn't documented. I can't find anyone that knows, though I did see a StackOverflow answer say 1000. I don't think so.
I ran this code. It's a loop to keep building a larger and larger COALESCE() command.
DECLARE @c VARCHAR(MAX) = '' , @p varchar(MAX) = 'null, 1' , @i INT = 2 , @l varCHAR(20) WHILE @i < 10000 BEGIN SELECT @l = CAST( LEN(@p) AS CHAR(10)) SELECT @c = 'SELECT COALESCE(' + @p + '), ' + CAST(@i AS CHAR(4)) + ' as i, ' + @l + ' as paramlen' BEGIN TRY EXEC(@c) END TRY BEGIN CATCH SELECT @i END CATCH SELECT @i += 1; SELECT @p = @p + ',1' END
Don't run this from SSMS, but from a SQLCMD window, I got this at the end:
i paramlen ----------- ----------- ----------- 1 9998 19999 (1 rows affected) i paramlen ----------- ----------- ----------- 1 9999 20001 (1 rows affected)
This seems to indicate that I can use 9999 parameters before and error is thrown.
I don't know how accurate this is, but my quick test shows that there are likely more parameters allowed than you will ever need.
The COALESCE() function is handy and a good way to handle NULLS in code. It takes multiple parameters and then returns the first not-NULL one. I've shown a few cases where this is useful in data displays for reports.
I used to use ISNULL more, but I ran across this article from Aaron Bertrand that got me to think more about COALESCE(). I also like that this is an ANSI standard function. Not that my code will run on Oracle or another platform, but I run into other platforms, and knowing how to write queries on them is useful. Plus, as the article linked above shows, even across products in Microsoft, ISNULL responds differently.
Ultimately, the hardest thing about using COALESCE was learning to spell it correctly, but after typing COALSECE a few times, I got better at typing. I'm sure you will find it easy to use as well.