I like this article a lot. In fact, it is one of the very best I have come across on SSC. Congratulations are definitely in order, both for style and content.
It is slightly sad that only a small fraction of the potential audience will appreciate exactly what you have done here. I think that's probably a reflection of how few SQL people have a good grasp of .NET code and a willingness to extend the capabilities of T-SQL in sensible, appropriate, and efficient ways. It's so easy to misuse the CLR integration and I guess that makes it all the more refreshing to see a well thought out article on a key subject.
I certainly do like the concise code made possible by the use of yield, and it certainly helps people to avoid the all-too-common mistake of generating a huge memory object in the Init method and then iterating over that in the FillRow calls.
That said, I don't tend to use yield in practice, for the following reasons:
1. The bug you refer to in the article with SqlChars and SqlBytes is a worry. I haven't come across that myself; however that may just be dumb luck. To be fair, I don't know whether that bug only manifests itself with yield. It is a consideration though - especially since SqlChars and SqlBytes are the only streaming input types.
2. The compiler emits a fair chunk of MSIL when it encounters yield - in fact it generates an entire class. More on that in a bit. I have a natural (and, I hope, healthy) skepticism with this sort of thing: the compiler must generate code which is general in nature and provably correct in all situations. This necessarily means that yield will be somewhat slower than a hand-written equivalent.
3. The extra code emitted by the compiler cannot be debugged in the usual way - you can't 'step into' it in a debugging session.
4. Yield cannot be used in a try...catch block. This is an annoying restriction and complicates error handling somewhat.
Of course you can't argue with the code economy - it can produce extremely neat code - and I do find that appealing.
The alternative is to hand-code a helper class. This involves more typing for sure, but in the 20 million row test featured at the end of your article the hand-rolled version used 18.7 seconds of CPU on my home machine, compared to 19.8 seconds for the yield version - a 6% difference.
Here's a rig based on the article's example code showing both a yield and helper-class implementation, just in case any is interested:
(The helper class code is pretty close to the code emitted by the compiler for yield)
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;
public partial class TestFunctions
{
[Microsoft.SqlServer.Server.SqlFunction
(
DataAccess = DataAccessKind.None,
IsDeterministic = true,
IsPrecise = true,
SystemDataAccess = SystemDataAccessKind.None,
FillRowMethodName = "FillRow",
TableDefinition = "Value INT"
)
]
public static IEnumerable StreamingHandCoded(SqlInt32 MaxValue)
{
return new Helper(MaxValue.Value);
}
[Microsoft.SqlServer.Server.SqlFunction
(
DataAccess = DataAccessKind.None,
IsDeterministic = true,
IsPrecise = true,
SystemDataAccess = SystemDataAccessKind.None,
FillRowMethodName = "FillRow",
TableDefinition = "Value INT"
)
]
public static IEnumerable StreamingYield(SqlInt32 MaxValue)
{
for (int i = 0; i < MaxValue; i++)
{
yield return i;
};
}
public static void FillRow(object i, out SqlInt32 Value)
{
Value = new SqlInt32((int)i);
}
#region HelperClass
public sealed class Helper : IEnumerable, IEnumerator
{
readonly int maxValue;
int index = -1;
public Helper(int maxValue)
{
this.maxValue = maxValue;
}
#region IEnumerable Members
IEnumerator IEnumerable.GetEnumerator()
{
return this;
}
#endregion
#region IEnumerator Members
object IEnumerator.Current
{
get { return index; }
}
bool IEnumerator.MoveNext()
{
return ++index < maxValue;
}
void IEnumerator.Reset()
{
index = -1;
}
#endregion
}
#endregion
};
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi