Jeff Moden (3/12/2016)
When treating NULL as an "empty string" instead of "no string" and thus return rows for it, it is going to complicate and limits its use for serialization/de-serialization tasks (which is like its primary use).
I think that's the whole problem with the way a whole lot of people think. NULL should ever be treated as an "empty string" or "no string". Certainly, it should never be treated as Nothing. It should be treated only as "unknown" and, IMHO, if you give the function an unknown, it should return a confirmation that unknown was passed to it. This doesn't even violate the COUNT(*) problem that someone posted on the CONNECT because the NULL will be disregarded by COUNT(*).
It comes down what states of input there are and how can they be unique mapped to process-able output.
With scalar functions it is easy: NULL in is NULL out.
With the transformation between scalar and complex types things are well, more complex.
De-serialization tasks need to work exactly in reverse to the serialization, which is:
Empty array/set/list --> NULL
One empty string or ID --> ""
Two empty strings or ID ---> ","
Array with one ID 10 ---> "10"
Array with two ID 10 and 20 ---> "10,20"
Array with unknown ID and ID 20 ---> ",20"
Try to reverse this serialization and there is no way around it that NULL means empty set (else it would be indistinguishable against a string of length 0 or a nullable integer represented as such).
This mapping works well and supports every behavior one can desire.
Want to handle an empty set as a single row containing NULL?
We can do that at the price of loosing support for NULL values in the serialized array/set/list!
Use a "left join" or "outer apply" and that will result in a single row with NULL values for each column in the result.
Code something like this:
TableA as A
outer apply splitter( NULL, ',' ) as S
Ultimately it comes down to how broad the use of the function needs to be.
Personally I use splitters almost exclusively for de-serialization of serialized integer ID and other value types that come from an application layer.
Ordinal positions and support for nullable values in the serialization are highly desirable in this case.
Having to filter out records with ordinal position NULL would work too I think, but would more often be in the way then help.
Such a ** conditionally existing ** record would just be artificial and represent a state that is already supported naturally as an empty set.
Where we seem to disagree is how we view the result of our function and what it represents.
I view it as a set representation of the original array/set/list, where NULL means no values were serialized.
Scalar function behavior does not come into my reasoning, the output is a set.
And I view serializing nullable values as a perfectly valid use case.