C# CLR performance

  • Hi,

    I've got a problem which you might be able to assist me with.

    I've got this C# CLR routine implemented as a stored procedure in one of our databases. The database server is really fast (120 GB RAM and 24 Intel Xeon 2.4 GHz CPUs). The CLR basically does the following (a rough sketch follows the list):

    1. Reads a table of 5 million rows using a DataReader (1,000,000 rows at a time)

    2. Does some calculations for each row

    3. Stores the results to another table using SQL bulk insert (100,000 records at a time)
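
    Roughly like this, as a simplified sketch (not the actual proc; table names, columns, and the calculation are placeholders, and since SqlBulkCopy can't use the in-process context connection, the write side needs a separate loopback connection, which in turn requires the assembly to be deployed with EXTERNAL_ACCESS):

    using System.Data;
    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public partial class StoredProcedures
    {
        [SqlProcedure]
        public static void ProcessRows()
        {
            // Read through the in-process context connection; write through
            // a separate loopback connection for SqlBulkCopy.
            using (SqlConnection read = new SqlConnection("context connection=true"))
            using (SqlConnection write = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI"))
            {
                read.Open();
                write.Open();

                DataTable buffer = new DataTable();
                buffer.Columns.Add("Id", typeof(int));
                buffer.Columns.Add("Result", typeof(double));

                using (SqlCommand cmd = new SqlCommand("SELECT Id, Value FROM dbo.SourceTable", read))
                using (SqlDataReader reader = cmd.ExecuteReader())
                using (SqlBulkCopy bulk = new SqlBulkCopy(write))
                {
                    bulk.DestinationTableName = "dbo.ResultTable";
                    bulk.BatchSize = 100000;

                    while (reader.Read())
                    {
                        int id = reader.GetInt32(0);
                        double value = reader.GetDouble(1);
                        buffer.Rows.Add(id, value * 2.0);   // placeholder calculation

                        if (buffer.Rows.Count == 100000)    // flush every 100,000 rows
                        {
                            bulk.WriteToServer(buffer);
                            buffer.Clear();
                        }
                    }
                    if (buffer.Rows.Count > 0)
                        bulk.WriteToServer(buffer);
                }
            }
        }
    }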

    I've done some performance testing and the "bottleneck" seems to be the calculations. The strange thing, which I can't wrap my head around, is that when I run (almost) exactly the same code from a unit test the calculations are about 50x faster. My local machine is fast (12 GB RAM with one Intel Xeon 3.07 GHz CPU).

    The even stranger thing is that loading the data (step 1 above) is also faster on my local machine than on the SQL Server through the SQL CLR stored procedure.

    Do you know what the problem might be, or whether this is expected behaviour?

    Best wishes

    -Tomas Bergvall

  • Are you taking network data transfer into consideration?

    On your local machine, the million rows go directly into memory; on the remote server, they have to travel over the network, right?

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • Hi,

    I understand what you are saying, and I would have thought it could be a data transfer problem. But the data is stored on the remote DB server and the slower CLR code is executed on that same server, so there should not be any data transfer involved. If there were, I would have asked the opposite question (why is the code slower on my machine than on the remote server? 🙂 ).

    Secondly, I only time the actual calculation step, not the data loading or writing. Or rather, I do time those as well, but that is not where the bottleneck is located.
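
    To illustrate how the timing is done, it is essentially this pattern (simplified; LoadBatch and Calculate here are just stand-ins for the real steps):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class TimingSketch
    {
        // Stand-ins for the real load and calculation steps.
        static List<double> LoadBatch() { return new List<double> { 1.0, 2.0, 3.0 }; }
        static double Calculate(double v) { return v * v; }

        static void Main()
        {
            List<double> rows = LoadBatch();        // step 1, timed separately

            Stopwatch sw = Stopwatch.StartNew();    // time only the calculation step
            foreach (double row in rows)
                Calculate(row);
            sw.Stop();

            Console.WriteLine("Calculation step: {0} ms", sw.ElapsedMilliseconds);
        }
    }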

    -Tomas

  • I do not know what the proc is doing, so I may be completely off base here... but I can't get past the question of whether you should even be in a CLR proc for this. Why would you implement a process doing that much heavy lifting in a CLR proc? Was T-SQL not able to get you where you needed to go in this case? If your calculations are extensive I could see having a CLR aggregate or string manipulation function, but there is a lot of overhead built into the context switch between the CLR and the database engine when it comes to moving data.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi,

    I guess I probably could have used T-SQL instead of the CLR in this scenario, but I still think there is something lurking in the background that makes the code slower than it really should be. And I agree with you that moving large quantities of data would probably cause overhead, but I don't think that's the issue here.

    In step 1 I load the data (say 100,000 rows), which I guess is stored in memory as a simple list of objects. In the next step the code loops through this list and does some calculations for each row. This last step is what takes too much time on the SQL Server compared to my local machine. Do you know if there might be a swapping issue at hand, or whether the list is stored in memory on the server the same way it is on my local machine (when debugging the Visual Studio project)?

    -Tomas

  • You are in an area that not a lot of people go. With all due respect, this scenario is exactly why DBAs who do not have close working relationships with, and trust in, their development teams resist turning on the SQL CLR. I think you should strongly consider rewriting your procedure using T-SQL wherever possible.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I understand that, but I don't like to leave things hanging without a reasonable explanation for why this behaviour is encountered. It still seems strange to me to say that I need to switch to T-SQL because of some vague notion that CLRs are bad. If we didn't use CLRs we wouldn't be able to use regular expressions, right?

    I would really appreciate suggestions as to how I could go about investigating what the problem might be.
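
    For example, this is the sort of CLR function I mean: a minimal regex wrapper exposed to T-SQL (names are illustrative):

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions
    {
        // Exposes Regex.IsMatch to T-SQL, which has no native regex support.
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
        {
            if (input.IsNull || pattern.IsNull)
                return SqlBoolean.Null;
            return Regex.IsMatch(input.Value, pattern.Value);
        }
    }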

  • Did you try making your batch size smaller when loading up your data in the CLR and processing it? I could see you running into memory issues that cause disk swapping when running it within MSSQL versus running it standalone. If that is the issue, then loading less data into memory at a time might resolve it.

    If you could post the code showing how you are loading your data and then consuming it in the CLR, it might be helpful in figuring out what the issue is as well.

  • Heh, Incindium brings up a good point. What's the max memory setting for the SQL engine on the server? That's a pretty beefy local box. You might be getting away with 8-10 GB of memory on your local CLR while it might be restricted to 1 or 2 (or competing with the OS and ending up in the swap file) on the DB server.
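
    If you want to check it from client code rather than SSMS, a quick query against sys.configurations will do it (connection string is a placeholder):

    using System;
    using System.Data.SqlClient;

    class MaxMemoryCheck
    {
        static void Main()
        {
            using (SqlConnection conn = new SqlConnection("Data Source=DbServer;Initial Catalog=master;Integrated Security=SSPI"))
            {
                conn.Open();
                SqlCommand cmd = new SqlCommand(
                    "SELECT name, value_in_use FROM sys.configurations " +
                    "WHERE name IN ('max server memory (MB)', 'min server memory (MB)')", conn);
                using (SqlDataReader r = cmd.ExecuteReader())
                    while (r.Read())
                        Console.WriteLine("{0}: {1} MB", r.GetString(0), r.GetValue(1));
            }
        }
    }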


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check -g in your startup options.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/15/2011)


    Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check -g in your startup options.

    Awww, OPC, if I didn't know better, I'd say you weren't a DBA! C'mon, man, we all chase white rabbits... although usually *we're* the ones screaming "We're LATE!"


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Craig Farrell (4/15/2011)


    opc.three (4/15/2011)


    Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check -g in your startup options.

    Awww, OPC, if I didn't know better, I'd say you weren't a DBA! C'mon, man, we all chase white rabbits... although usually *we're* the ones screaming "We're LATE!"

    Eh...I couldn't help myself...I guess I know what that says about me now...I figure the worst that can happen is I'll be capable of yelling a bit more intelligently at the next person I run across that abuses the SQL CLR 😀

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/16/2011)


    Craig Farrell (4/15/2011)


    opc.three (4/15/2011)


    Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check -g in your startup options.

    Awww, OPC, if I didn't know better, I'd say you weren't a DBA! C'mon, man, we all chase white rabbits... although usually *we're* the ones screaming "We're LATE!"

    Eh...I couldn't help myself...I guess I know what that says about me now...I figure the worst that can happen is I'll be capable of yelling a bit more intelligently at the next person I run across that abuses the SQL CLR 😀

    I'm liking you more and more... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • -> Incindium

    I tried running the code with smaller batch sizes, with the following results:

    Num rows    Local time    DB server time
    500         0.12 s        0.8 s
    1,000       0.20 s        1.7 s
    10,000      1.6 s         18.0 s
    100,000     7.5 s         290 s

    This would indicate to me that something might be going on with the swap file, although that does not explain the differences at the smaller batch sizes, where swapping should not be a factor. It also looks as if the server time grows much faster than linearly with batch size, while the local time scales roughly linearly.

    Do you have any suggestions as to how I could monitor the potential swap issue?
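
    Would sampling counters like these while the proc runs be a reasonable way to watch for swapping? (The "sqlservr" process instance name is my assumption for a default instance.)

    using System;
    using System.Diagnostics;
    using System.Threading;

    class SwapMonitor
    {
        static void Main()
        {
            // OS-level paging plus the SQL Server process working set,
            // sampled once per second while the CLR proc runs.
            PerformanceCounter pages = new PerformanceCounter("Memory", "Pages/sec");
            PerformanceCounter ws = new PerformanceCounter("Process", "Working Set", "sqlservr");

            for (int i = 0; i < 60; i++)
            {
                Console.WriteLine("{0:HH:mm:ss}  Pages/sec: {1,8:F0}  working set: {2,14:N0} bytes",
                    DateTime.Now, pages.NextValue(), ws.NextValue());
                Thread.Sleep(1000);
            }
        }
    }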

    The code is fairly extensive, but I could try to put together an excerpt of it if needed.

    -> Craig

    The max memory setting is set to 120 GB on the DB server, where the total physical memory is 131 GB.

    One thing that is a bit strange to me is that SQL Server uses (or has allocated) 120 GB of memory on the DB server. I'm not sure the SQL engine can actually use all of the allocated memory, since the amount in use usually fluctuates during the CLR execution. Could that be the reason for the potential disk swaps?

    -> opc.three

    I'm not entirely sure how to view the MemToLeave setting, but I tried this post http://www.johnsansom.com/index.php/2009/03/sql-server-memory-configuration-determining-memtoleave-settings/#axzz1JsE4avf6

    and got the following result from the SQL query. Is it possible to view that setting in another way?

    Total avail mem, KB    Max free size, KB
    8442427040             6574744000
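
    If it helps, I gather that on SQL Server 2008 and later the virtual address space numbers are also exposed through sys.dm_os_process_memory (this needs VIEW SERVER STATE; I have not verified it on older versions). Would this be equivalent?

    using System;
    using System.Data.SqlClient;

    class VasCheck
    {
        static void Main()
        {
            using (SqlConnection conn = new SqlConnection("Data Source=DbServer;Initial Catalog=master;Integrated Security=SSPI"))
            {
                conn.Open();
                SqlCommand cmd = new SqlCommand(
                    "SELECT virtual_address_space_reserved_kb, " +
                    "       virtual_address_space_committed_kb, " +
                    "       virtual_address_space_available_kb " +
                    "FROM sys.dm_os_process_memory", conn);
                using (SqlDataReader r = cmd.ExecuteReader())
                    while (r.Read())
                        Console.WriteLine("Reserved: {0:N0} KB, Committed: {1:N0} KB, Available: {2:N0} KB",
                            r.GetInt64(0), r.GetInt64(1), r.GetInt64(2));
            }
        }
    }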

  • This seems like a fairly inefficient method of doing this.

    It may just be the way you're describing it, but if you're going to do your calculations in .NET code, you should be doing this kind of work in a stream/pipeline rather than explicitly loading batches and bulk inserting them. An SSIS data transformation pipeline may work better for you, as it manages memory allocation dynamically as rows flow through the data flow and is generally very fast at this kind of work.
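
    Even staying in plain ADO.NET, SqlBulkCopy can consume a data reader directly, so rows stream through without ever being materialised in a list. A sketch with placeholder names (your per-row calculation would have to live in a wrapping IDataReader, or go the SSIS route):

    using System.Data.SqlClient;

    class StreamingCopy
    {
        static void Main()
        {
            using (SqlConnection src = new SqlConnection("Data Source=DbServer;Initial Catalog=MyDb;Integrated Security=SSPI"))
            using (SqlConnection dst = new SqlConnection("Data Source=DbServer;Initial Catalog=MyDb;Integrated Security=SSPI"))
            {
                src.Open();
                dst.Open();

                using (SqlCommand cmd = new SqlCommand("SELECT Id, Value FROM dbo.SourceTable", src))
                using (SqlDataReader reader = cmd.ExecuteReader())
                using (SqlBulkCopy bulk = new SqlBulkCopy(dst))
                {
                    bulk.DestinationTableName = "dbo.ResultTable";
                    bulk.BatchSize = 100000;

                    // WriteToServer pulls rows from the reader as it goes, so
                    // the full 5 million rows never sit in memory at once.
                    bulk.WriteToServer(reader);
                }
            }
        }
    }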

    Loading and dumping large amounts of memory within the CLR could run into all sorts of issues. I'm assuming your SQL Server is 64-bit?

    Doing this work in the CLR will also mostly preclude any use of parallelism, since it executes within a single thread, so it doesn't really matter how many CPU cores the server has.
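
    By contrast, outside the SQLCLR host the calculation loop is free to use every core, e.g. with Parallel.For on .NET 4 (Calculate is a placeholder):

    using System;
    using System.Threading.Tasks;

    class ParallelCalc
    {
        static double Calculate(double v) { return v * v; }   // placeholder calculation

        static void Main()
        {
            double[] rows = new double[1000000];
            double[] results = new double[rows.Length];

            // Spread the per-row work across all available cores.
            Parallel.For(0, rows.Length, i => { results[i] = Calculate(rows[i]); });

            Console.WriteLine("Done: {0} rows", results.Length);
        }
    }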
