RE: CLR Table-Valued Function Example with Full Streaming (STVF)

SSC Guru

Points: 150467

January 12, 2010 at 7:17 am

I like this article a lot. In fact, it is one of the very best I have come across on SSC. Congratulations are definitely in order, both for style and content.

It is slightly sad that only a small fraction of the potential audience will appreciate exactly what you have done here. I think that's probably a reflection of how few SQL people have a good grasp of .NET code and a willingness to extend the capabilities of T-SQL in sensible, appropriate, and efficient ways. It's so easy to misuse the CLR integration and I guess that makes it all the more refreshing to see a well thought out article on a key subject.

I certainly do like the concise code made possible by the use of yield, and it certainly helps people to avoid the all-too-common mistake of generating a huge memory object in the Init method and then iterating over that in the FillRow calls.

That said, I don't tend to use yield in practice, for the following reasons:

1. The bug you refer to in the article with SqlChars and SqlBytes is a worry. I haven't come across that myself; however that may just be dumb luck. To be fair, I don't know whether that bug only manifests itself with yield. It is a consideration though - especially since SqlChars and SqlBytes are the only streaming input types.

2. The compiler emits a fair chunk of MSIL when it encounters yield - in fact it generates an entire class. More on that in a bit. I have a natural (and, I hope, healthy) skepticism with this sort of thing: the compiler must generate code which is general in nature and provably correct in all situations. This necessarily means that yield will be somewhat slower than a hand-written equivalent.

3. The extra code emitted by the compiler cannot be debugged in the usual way - you can't 'step into' it in a debugging session.

4. Yield cannot be used in a try...catch block. This is an annoying restriction and complicates error handling somewhat.

Of course you can't argue with the code economy - it can produce extremely neat code - and I do find that appealing.

The alternative is to hand-code a helper class. This involves more typing for sure, but in the 20 million row test featured at the end of your article the hand-rolled version used 18.7 seconds of CPU on my home machine, compared to 19.8 seconds for the yield version - a 6% difference.

Here's a rig based on the article's example code showing both a yield and helper-class implementation, just in case any is interested:

(The helper class code is pretty close to the code emitted by the compiler for yield)

using System.Data.SqlTypes;

using Microsoft.SqlServer.Server;

using System.Collections;

public partial class TestFunctions

{

[Microsoft.SqlServer.Server.SqlFunction

(

DataAccess = DataAccessKind.None,

IsDeterministic = true,

IsPrecise = true,

SystemDataAccess = SystemDataAccessKind.None,

FillRowMethodName = "FillRow",

TableDefinition = "Value INT"

)

]

public static IEnumerable StreamingHandCoded(SqlInt32 MaxValue)

{

return new Helper(MaxValue.Value);

}

[Microsoft.SqlServer.Server.SqlFunction

(

DataAccess = DataAccessKind.None,

IsDeterministic = true,

IsPrecise = true,

SystemDataAccess = SystemDataAccessKind.None,

FillRowMethodName = "FillRow",

TableDefinition = "Value INT"

)

]

public static IEnumerable StreamingYield(SqlInt32 MaxValue)

{

for (int i = 0; i < MaxValue; i++)

{

yield return i;

};

}

public static void FillRow(object i, out SqlInt32 Value)

{

Value = new SqlInt32((int)i);

}

#region HelperClass

public sealed class Helper : IEnumerable, IEnumerator

{

readonly int maxValue;

int index = -1;

public Helper(int maxValue)

{

this.maxValue = maxValue;

}

#region IEnumerable Members

IEnumerator IEnumerable.GetEnumerator()

{

return this;

}

#endregion

#region IEnumerator Members

object IEnumerator.Current

{

get { return index; }

}

bool IEnumerator.MoveNext()

{

return ++index < maxValue;

}

void IEnumerator.Reset()

{

index = -1;

}

#endregion

}

#endregion

};

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi