Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Wow... Outstanding job, Matt! Thanks for that!

    I'm pretty tickled that one or the other of us can beat the CLR methods pretty much hands down on eveything except RegEx, so far.

    Just so the rest of you folks get the "inside" story on this... Matt, myself, and a couple of other good folks (mostly Matt on the testing side 'cause I didn't have 2k5 back then and I don't know how to write CLR's) did a wad of testing on various challenges comparing throughput and performance of CLR's against some good ol' T-SQL methods. Since that time, we've had several other opportunities to do a comparison, like this one.

    Matt, correct me if I'm wrong, but it seems to me that only thing we haven't been able to beat CLR's with T-SQL, is the ReGex thing... am I remembering that correctly?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sounds about right... The Regex "single and multiple" match functions more or less matched T-SQL (with a slight edge Regex),

    but the pattern replacement test beat the T-SQL methods hands down.

    What's really interesting here is that it does seem to rather clearly show that if you are in fact "stuck" with a cursor method,

    you might still stand to make a lot of performance gains if you can use a "real" firehose cursor in CLR (one that doesn't require

    writing the results to a table one row at a time). The three method are actually very close algorithm-wise, and in their own way,

    both the CLR and the update method find ways to capitalize on NOT writing the results one row at a time. The Cursor method

    doesn't get to leverage either, so its performance suffers.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Don't mean to co-opt the thread but can you please email me a link to the regex code you found optimal for CLR? I have a client in DIRE need of such - they are using sp_OA... to do RegEx processing at the moment in a high-volume process and it is a total dog. 1/4 of the (fairly significant) server waits were due to xp delays.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (2/18/2008)


    Don't mean to co-opt the thread but can you please email me a link to the regex code you found optimal for CLR? I have a client in DIRE need of such - they are using sp_OA... to do RegEx processing at the moment in a high-volume process and it is a total dog. 1/4 of the (fairly significant) server waits were due to xp delays.

    Check your inbox....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • For totals that involve groups, I wonder if you can split the work in half.

    A multi-thread CLR?

    Where the first query get the first half while the second query gets the rest.

    Then join the threads.

    Obviously, for solutions requiring a running total from start to finish this wouldn't work. But if should work stellar for grouped running totals.

    Christopher Ford

  • Christopher Ford (2/18/2008)


    For totals that involve groups, I wonder if you can split the work in half.

    A multi-thread CLR?

    Where the first query get the first half while the second query gets the rest.

    Then join the threads.

    Obviously, for solutions requiring a running total from start to finish this wouldn't work. But if should work stellar for grouped running totals.

    It's certainly an interesting thought - and might work in theory. I know in full .NET, you can tap into APM to launch multiple threads to bring all of this together:

    I don't have any clue as to how to tap into that in SQLCLR (or if it's even "allowed")

    The main thing I see as an issue is that unless you're VERY efficient - finding out how to cut this into pieces might take longer than just diving in an doing it

    (for example if you had a whole bunch of small groups).

    Of course - you might have some better ideas in mind on that matter. Torture the code as you see fit. I would like to get it to a point where it doesn't have to

    absorb the entire data set into memory before it starts streaming it back - but I haven't found any better way to access the data than this.

    Just let us know if you find something.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • BTW, wouldn't the RowID go increasing by date in this test case?

    98% of the time, you wouldn't run into a situation where RowID 9876 is an earlier date than RowID 5000. Right?

    Christopher Ford

  • Matt Miller (2/18/2008)

    ...

    Check your inbox....

    (0 row(s) affected) :laugh:

    Is it a public link ? :Whistling:

    TheSQLGuru: From what I've been told - didn't have time yet to play

    with CLR that much:blush: - XP's are _the_ first kind of sqlerver addons to be

    replaced by CLR objects.

    We don't have xps installed, hence the testing gap 😉

    I have fooled around with some UDF replacements such as calculating

    date ranges, .... and I've been amazed how fast this works on CLR.:cool:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Nice....

    I did it...copying code from Matt in order to post time differences.

    used StoredProcedure.

    CLR Table Valued Functions or User Defined Functions cannot pipe results directly to the client, so you must wait until the results are all filled up in your result set before getting sent back to the client.

    The CLR Store Procedure can take advantage of the SqlPipe object to stream results directly back to the client in realtime.

    Obviously, there's some advantages and disadvantages...one of them being the fact you can't use Bulk Insert type commands (SELECT ... INTO) to speedily bust out that table...

    So, we'll see how fast it works with an Insert...

    The returning of the top 100 rows is under a second though haha =) ofcourse, it's not processing a million rows doing that, kind of cheating.

    anyway...results coming soon...

    Christopher Ford

  • Bah, forget it...

    It takes too long to run streaming 1 million rows to the result pane...or doing an insert.

    However, this code should make a great learning snippet for anyone wanting to do CLR Stored Procedures.

    For small result sets, this is incredibly fast. So you'd have to pass parameters of some sort....which...I'm lacking any parameters, so...nevermind, this would suck for learning. =) Doesn't show you how to work with the parameters.

    However, I know I can make Matt's version of the Table Valued Function faster, so I'll tackle that soon. =) I thought this would do it...but it can't compete.

    However, I did find a bug, I think...maybe it's by design, but BOL doesn't tell me that what I got was what it was supposed to do... If I do:

    pipe.SendResultsStart(row);

    to signal the start of the result set, it "skips" that row...to work around it, you have to do:

    pipe.SendResultsStart(row);

    pipe.SendResultsRow(row);

    However, for smaller result sets, say, 100,000 rows? It works great. =)

    Here's the code:

    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using Microsoft.SqlServer.Server;

    namespace CFord.SQLCLR

    {

    public class TestRunningBalance

    {

    // Set User-Friendly Column ordinal position

    readonly static int FIELD_rowNum = 0;

    readonly static int FIELD_AccountID = 1;

    readonly static int FIELD_Date = 2;

    readonly static int FIELD_Amount = 3;

    readonly static int FIELD_GrpAmount = 4;

    [Microsoft.SqlServer.Server.SqlProcedure]

    public static void RunningBalance()

    {

    using (SqlConnection cn = new SqlConnection("context connection=true"))

    {

    cn.Open();

    string sql = @"SELECT TOP(100000) RowNum, AccountID, Date, Amount FROM dbo.JBMTestMerry ORDER BY accountID, Date";

    SqlCommand cmd = new SqlCommand(sql, cn);

    SqlDataReader reader = cmd.ExecuteReader();

    SqlPipe pipe = SqlContext.Pipe;

    SqlMetaData rowNumMeta = new SqlMetaData("rn", SqlDbType.Int);

    SqlMetaData accountIDMeta = new SqlMetaData("AccountID", SqlDbType.Int);

    SqlMetaData dateMeta = new SqlMetaData("Date", SqlDbType.DateTime);

    SqlMetaData amountMeta = new SqlMetaData("Amount", SqlDbType.Money);

    SqlMetaData grpBalMeta = new SqlMetaData("grpBal", SqlDbType.Money);

    SqlMetaData[] rowMetaData = new SqlMetaData[] { rowNumMeta, accountIDMeta, dateMeta, amountMeta, grpBalMeta };

    bool firstRow = true;

    int prvAccountID = 0;

    Decimal grpBal = 0;

    while (reader.Read())

    {

    int AccountID = (int)reader[FIELD_AccountID];

    decimal Amount = (decimal)reader[FIELD_Amount];

    SqlDataRecord row = new SqlDataRecord(rowMetaData);

    row.SetInt32(FIELD_rowNum, (Int32)reader[FIELD_rowNum]);

    row.SetInt32(FIELD_AccountID, (Int32)reader[FIELD_AccountID]);

    row.SetDateTime(FIELD_Date, (DateTime)reader[FIELD_Date]);

    row.SetDecimal(FIELD_Amount, (Decimal)reader[FIELD_Amount]);

    if (firstRow == true)

    {

    prvAccountID = AccountID;

    }

    if (AccountID == prvAccountID)

    {

    grpBal += Amount;

    row.SetDecimal(FIELD_GrpAmount, grpBal);

    }

    else

    {

    grpBal = 0 + Amount;

    row.SetDecimal(FIELD_GrpAmount, grpBal);

    }

    if (firstRow == true)

    {

    pipe.SendResultsStart(row);

    pipe.SendResultsRow(row); //BUG -- If you don't have this, first row is skipped!!

    firstRow = false;

    }

    else

    {

    pipe.SendResultsRow(row);

    }

    prvAccountID = AccountID;

    }

    reader.Close();

    pipe.SendResultsEnd();

    }

    }

    }

    };

    Throw that in a C# database solution, add a new file (stored procedure), give it a name, Copy and paste code, right click solution, deploy to your database.

    Execute:

    EXEC RunningBalance

    Oh, you must run Matt's version of the table creation script OR change T-SQL code in the CLR. =)

    Christopher Ford

  • Christopher Ford (2/19/2008)...

    pipe.SendResultsStart(row);

    pipe.SendResultsRow(row);

    Actualy I'd remove the "else" and move the "end if" upward 🙂 in stead of adding the extra pipe.SendResultsRow(row);

    And I'm happy to inform you that the sp caused my x64 clusterd instance to crash :sick::blink::doze::crazy::ermm::arrow::w00t::w00t:

    I've added the partial errorlog txt file.

    That may point to another reason to avoid CLR in some cases. :ermm:

    I've restarted the instance and re-executed the sp and it crashed again after returning 862,625 rows. (took 1'05")

    Msg 6533, Level 16, State 49, Procedure csp_RunningBalance, Line 0

    AppDomain DPartitioning.dbo[runtime].6 was unloaded by escalation policy to ensure the consistency of your application. Out of memory happened while accessing a critical resource.

    There is insufficient system memory to run this query.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZ - CLR is known to have some memory management/usage issues. Perhaps using the -g parameter you could start up sql server with more memtoleave (512 perhaps?). Not sure if this is applicable on a 64 bit set up tho. This is an example of yet another thing that MS should fix/stabilize in a 2005 SP3 release.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • ALZDBA (2/19/2008)


    Matt Miller (2/18/2008)

    ...

    Check your inbox....

    (0 row(s) affected) :laugh:

    Is it a public link ? :Whistling:

    TheSQLGuru: From what I've been told - didn't have time yet to play

    with CLR that much:blush: - XP's are _the_ first kind of sqlerver addons to be

    replaced by CLR objects.

    We don't have xps installed, hence the testing gap 😉

    I have fooled around with some UDF replacements such as calculating

    date ranges, .... and I've been amazed how fast this works on CLR.:cool:

    Ooops - sorry about that.

    This was stuff we posted in an old thread. I forgot to post the link. that being said - here's the CLR code I sent to him.

    The best performing ones are the scalar functions (match/replace/findnth). I built some table-valued equivalents which can be made to return a LOT of info, but they suffer performance-wise due to their use in CROSS APPLY (which from what little I can tell is CROSS APPLY's fault, and not CLR's).

    RunningTotals is also in there.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • ALZDBA (2/19/2008)


    Christopher Ford (2/19/2008)...

    pipe.SendResultsStart(row);

    pipe.SendResultsRow(row);

    Actualy I'd remove the "else" and move the "end if" upward 🙂 in stead of adding the extra pipe.SendResultsRow(row);

    And I'm happy to inform you that the sp caused my x64 clusterd instance to crash :sick::blink::doze::crazy::ermm::arrow::w00t::w00t:

    I've added the partial errorlog txt file.

    That may point to another reason to avoid CLR in some cases. :ermm:

    I've restarted the instance and re-executed the sp and it crashed again after returning 862,625 rows. (took 1'05")

    Msg 6533, Level 16, State 49, Procedure csp_RunningBalance, Line 0

    AppDomain DPartitioning.dbo[runtime].6 was unloaded by escalation policy to ensure the consistency of your application. Out of memory happened while accessing a critical resource.

    There is insufficient system memory to run this query.

    That's what happens when you code at 3:30am =) I am pretty sure there are a couple of things I can do to make this a little nicer now that I look at it.

    I can move the StartResultsRow up out of the loop and various other tiny things, however, that doesn't solve the problem of the interface between the CLR and SQL 2005.

    Also, SqlPipe can only send 1 row at time. There should be some logic put into any CLR code that uses SqlPipe to check if SqlPipe "IsSendingRow" I think that's the property. Because you have to wait for SqlPipe to finish sending the row before moving on.

    That was a very interesting discovery.

    Also, It needs quite a bit more work, but... you can do something like this for Multi-Threading:

    public class ThreadingSample

    {

    public static readonly object a = new object();

    public static readonly object b = new object();

    [SqlProcedure]

    public static void Method1()

    {

    SqlContext.Pipe.Send("Hello from Method1");

    }

    [SqlProcedure]

    public static void Method2()

    {

    SqlContext.Pipe.Send("Hello from Method2");

    }

    public static void Main()

    {

    Thread thread1 = new Thread(new ThreadStart(Method1));

    Thread thread2 = new Thread(new ThreadStart(Method2));

    thread1.Start();

    thread2.Start();

    thread1.Join();

    thread2.Join();

    }

    };

    I'll have to iron out the details, but after what I've seen from using the pipe object, I don't think I'm going to be pleased with the results. =)

    Christopher Ford

  • Thanks 🙂

    OT bis

    Is there an "common CLR pittfals" link somewhere ?

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 15 posts - 151 through 165 (of 250 total)

You must be logged in to reply to this topic. Login to reply