Performance issue with tally solution

  • By the way... thanks, Phil. I didn't know decompilers had come some far. I thought I was going to be looking at raw machine language and I haven't done THAT in almost 30 years.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you get the chance Phil, you might want to suggest to someone that they could include some kind of installation instructions somewhere...? The ReadMe in the ZIP file would be a good place, it's all just marketing blurbs right now.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • The URL I gave you [p]Net Reflector: Soup to Nuts[/url] has a section that tells you how to install it. [/p]

    [p]I'll pass on, to the developers, your request to have an installation instructions in the zip file. Basically, it doesn't need to be installed, though it is a good idea to register it in your system.[/p]

    Best wishes,
    Phil Factor

  • A thought here for some of you who might have the ear of the powers that be at Microsoft. Given that CLR is demonstrably faster at some things than classic SQL set-based approaches, why not develop and distribute a set of common CLR functions like the parsers that have just been demonstrated in this thread? Microsoft is already following this approach for Spatial functions, it only seems logical to apply it to more prosaic functions.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • [font="Verdana"]I'd like to see Microsoft ship SQL Server with the RegEx stuff built in. I guess the whole "server crashing capability" is something they would need to resolve first.

    Architecturally though, it's kind of strange to fill in the issues with one language by drawing on another. And yet, in many ways, that's exactly what .Net is all about.

    T-SQL compiled to .Net? Hmmm...

    [/font]

  • I have no doubt they could create additional functions within SQL. Look at what can be done with FOR XML concatenation now. If they chose, they could include a parse function that returns a table variable with the parsed strings. They just seem reluctant to do that.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (4/19/2009)


    A thought here for some of you who might have the ear of the powers that be at Microsoft. Given that CLR is demonstrably faster at some things than classic SQL set-based approaches, why not develop and distribute a set of common CLR functions like the parsers that have just been demonstrated in this thread? Microsoft is already following this approach for Spatial functions, it only seems logical to apply it to more prosaic functions.

    Hi Bob

    Would be a pleasure!

    A simple example how to convert huge binary data to hexadecimal strings (quiet frequently asked question here).

    Test data are 100 rows with 1mb binary data in each row.

    Just as simple scalar function:

    UPDATE #BinData SET HexString = dbo.ufn_clr_varbintohexstr(BinData)

    Performance Results:

    Description Duration CpuTime PhysicalReads PhysicalWrites LogicalReads

    ------------------------------ ----------- ----------- ------------- -------------- ------------

    100mb binary to hex string CLR 9173 8830 0 78602 4880501

    The C# source code:

    [Microsoft.SqlServer.Server.SqlFunction]

    public static SqlChars ufn_clr_varbintohexstr(

    [SqlFacet(IsFixedLength=false,

    IsNullable=false,

    MaxSize=-1

    )

    ]

    SqlBinary sqlByte

    )

    {

    // The binary data to be converted

    byte[] aBin = sqlByte.Value;

    // Lookup array for the hex values

    char[] hexLookup = new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };

    // Data to be returned

    char[] retChars = new char[(aBin.Length * 2) + 2];

    // The 0x prefix

    retChars[0] = '0';

    retChars[1] = 'x';

    int first;

    int second;

    for (int i = 0; i < aBin.Length; i++)

    {

    first = aBin / 16;

    second = aBin % 16;

    retChars = hexLookup[first];

    retChars = hexLookup[second];

    }

    return new SqlChars(retChars);

    }

    Barry also showed a parser which performed much better than the TSQL solution in another thread some weeks ago. Sorry, I don't remember the topic.

    Greets

    Flo

  • [font="Verdana"]If you are building that translation for performance, I'd make one change:

    Use an array of 256 2-character strings with the hexadecimal encoding for each byte. So you will have one lookup per byte rather than two, and also not need the calculations for the lookups.

    Now that is one function I would like to see in SQL Server itself!

    [/font]

  • Bruce W Cassidy (4/19/2009)


    If you are building that translation for performance, I'd make one change:

    Use an array of 256 2-character strings with the hexadecimal encoding for each byte. So you will have one lookup per byte rather than two, and also not need the calculations for the lookups.

    I just tried for fun 🙂 The new function:

    private static readonly string[] _hexLookup256;

    static UserDefinedFunctions()

    {

    // Initialize a static hexLookup for all 256 possible hex values

    List hexLookup = new List(256);

    for (int i = 0; i < 256; i++)

    {

    hexLookup.Add(i.ToString("x2"));

    }

    _hexLookup256 = hexLookup.ToArray();

    }

    [Microsoft.SqlServer.Server.SqlFunction]

    public static SqlChars ufn_clr_varbintohexstr(

    [SqlFacet(IsFixedLength=false,

    IsNullable=false,

    MaxSize=-1

    )

    ]

    SqlBinary sqlByte

    )

    {

    // The binary data to be converted

    byte[] aBin = sqlByte.Value;

    // Data to be returned

    StringBuilder sbRet = new StringBuilder(aBin.Length * 2 + 2);

    sbRet.Append("0x");

    byte currentByte;

    for (int i = 0; i < aBin.Length; i++)

    {

    currentByte = aBin;

    sbRet.Append(_hexLookup256[currentByte]);

    }

    return new SqlChars(sbRet.ToString());

    }

    Test results:

    Description Duration CpuTime PhysicalReads PhysicalWrites LogicalReads

    ------------------------------- ----------- ----------- ------------- -------------- ------------

    100mb binary to hex string CLR 9986 9563 0 78594 4880502

    The problem is the string builder. It takes to much overhead inside.

    Now that is one function I would like to see in SQL Server itself!

    Feel free to share when you are ready :hehe:. Not really needed. I think this performance is just not reachable with TSQL.

    Greets

    Flo

  • Florian Reischl (4/19/2009)


    The problem is the string builder. It takes to much overhead inside.

    [font="Verdana"]Only if you use code to generate the lookup table. 😀 Go on, list all 256 values![/font]

  • Bruce W Cassidy (4/19/2009)


    [font="Verdana"]If you are building that translation for performance, I'd make one change:

    Use an array of 256 2-character strings with the hexadecimal encoding for each byte. So you will have one lookup per byte rather than two, and also not need the calculations for the lookups.

    Now that is one function I would like to see in SQL Server itself!

    [/font]

    That's basically what an XLATE function does, except that the codes are passed in one long string instead of a table.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Just curious.... lot's of folks call what's stored in VarBinary, well... Binary and it's actually displayed as hex. So, pardon my ignorance of which you speak, but could you display an example input and output of what you guys are talking about for the binary-to-hex conversion function? Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Florian Reischl (4/19/2009)Barry also showed a parser which performed much better than the TSQL solution in another thread some weeks ago. Sorry, I don't remember the topic.

    It's a transducer actually (and it's here), which is also the same kind of function that would work very well for this.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Florian Reischl (4/19/2009)


    The problem is the string builder. It takes to much overhead inside.

    I wouldn't use StringBuilder for this, you want to fill an array of characters or int16's and then convert them directly to a string (this used to be a lot easier back when we had mutable strings).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I'm sorry, folks, but these CLR string-splitting techniques don't scale up. It would be very dangerous to recommend CLR routines based on these tests because the test data was very small. I've just tested Paul's routine, DanielJ and the two Regex split routines on a string of a million characters (the text of Moby Dick). I've had to import text files larger than this for data feeds, but I thought that it would do.

    /*

    Using a WHILE Loop 2 secs 126 ms

    the CLR RegEx library on S-T 79 secs 703 ms

    Pauls CLR way 47 secs 186 ms (disqualified as it got the no. lines well-wrong)

    My Quirky Update way 2 secs 953 ms

    The DanielJ CLR way 16 secs 030 ms

    Florians Regex split 79 secs 093 ms*/

    The set-based and WHILE Loop are the most consistent methods, increasing linearly. The CLR techniques perform best with lightweight test data but soon get indigestion under load.

    Best wishes,
    Phil Factor

Viewing 15 posts - 121 through 135 (of 522 total)

You must be logged in to reply to this topic. Login to reply