April 8, 2011 at 7:06 am
Hi,
I've got a problem which you might be able to assist me with.
I've got a C# CLR routine implemented as a stored procedure in one of our databases. The database server is really fast (120 GB RAM and 24 Intel Xeon 2.4 GHz CPUs). The CLR basically:
1. Reads a table of 5 million rows using a DataReader (1,000,000 rows at a time)
2. Does some calculations for each row
3. Stores the results to another table using SQL bulk insert (100,000 records at a time)
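In simplified form the pattern looks like this (dbo.Source, dbo.Results and the Calculate body are made-up stand-ins, not the real code, which runs as a SQLCLR stored procedure):

using System;
using System.Data;
using System.Data.SqlClient;

public static class Processor
{
    public static void Run(string connStr)
    {
        DataTable batch = NewBatchTable();

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            // Step 1: stream the source rows with a DataReader.
            using (var cmd = new SqlCommand("SELECT Id, Value FROM dbo.Source", conn))
            using (IDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Step 2: per-row calculation.
                    double result = Calculate(reader.GetDouble(1));
                    batch.Rows.Add(reader.GetInt32(0), result);

                    // Step 3: bulk insert every 100,000 rows.
                    if (batch.Rows.Count == 100000)
                    {
                        BulkInsert(batch, connStr);
                        batch.Clear();
                    }
                }
            }
        }

        if (batch.Rows.Count > 0)
            BulkInsert(batch, connStr);
    }

    private static DataTable NewBatchTable()
    {
        var t = new DataTable();
        t.Columns.Add("Id", typeof(int));
        t.Columns.Add("Result", typeof(double));
        return t;
    }

    private static void BulkInsert(DataTable batch, string connStr)
    {
        using (var bulk = new SqlBulkCopy(connStr))
        {
            bulk.DestinationTableName = "dbo.Results";
            bulk.WriteToServer(batch);
        }
    }

    private static double Calculate(double v)
    {
        return v * v; // stand-in for the real math
    }
}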
I've done some performance testing and the "bottleneck" seems to be the calculations. The strange thing, though, which I can't wrap my head around, is that when I run the (almost) exact same code from a unit test the calculations are about 50x faster. My local machine is fast (12 GB RAM with one Intel Xeon 3.07 GHz CPU).
The even stranger thing is that loading the data (step 1 above) is also faster on my local machine than on the SQL Server through the SQLCLR stored procedure.
Do you know what the problem might be, or if this is expected behaviour?
Best wishes
-Tomas Bergvall
April 8, 2011 at 7:27 am
Are you taking the network data transfer into consideration?
on your local machine, the million rows go directly into memory; on the remote server, they have to travel over the network, right?
Lowell
April 8, 2011 at 7:45 am
Hi,
I understand what you are saying, and I would have thought it could be a data transfer problem too. But the data is stored on the remote DB server and the slower CLR code is executed on that same server, so there should not be any data transfer issues. If there were, I would have asked the opposite question (why is the code slower on my machine than on the remote server? 🙂 ).
Secondly, I only time the actual calculation step, not the data loading or writing. Well, I time those as well, but that is not where the bottleneck is.
-Tomas
April 8, 2011 at 9:36 am
I do not know what the proc is doing, so I may be completely off base here... but I can't get past the question of whether you should even be in a CLR proc for this. Why would you implement a process doing that much heavy lifting in a CLR proc? Was T-SQL not able to get you where you needed to go in this case? If your calculations are extensive I could see having a CLR aggregate or string manipulation function, but there is a lot of overhead built into the context switch between the CLR and the DB engine when it comes to moving data.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 12, 2011 at 5:36 am
Hi,
I guess I probably could have used T-SQL instead of the CLR in this scenario, but I still think there is something lurking in the background that makes the code slower than it really should be. And I agree with you that moving large quantities of data would probably cause overhead, but I don't think that's the issue here.
In step 1 I load the data (say 100,000 rows), which I guess is stored in memory as a simple list of objects. In the next step the code loops through this list and does a bit of calculation for each row. This last step is what takes too much time on the SQL Server compared to my local machine. Do you know if there might be a swapping issue at hand, or whether the list is held in memory on the server the same way it is on my local machine (when debugging the Visual Studio project)?
-Tomas
April 12, 2011 at 7:42 am
You are in an area that not a lot of people go. With all due respect, this scenario is exactly why DBAs who do not have close working relationships with, and trust in, their development teams resist turning on the SQL CLR. I think you should strongly consider rewriting your procedure using T-SQL wherever possible.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 13, 2011 at 7:55 am
I understand that, but I don't like to leave things hanging without a reasonable explanation for why this behaviour is encountered. It still seems strange to me to say that I need to switch to T-SQL because of some notion that CLRs are simply bad. If we didn't use CLRs we wouldn't be able to use regular expressions, right?
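For example, here is roughly what a minimal SQLCLR regex function looks like (the names here are my own illustration, not code from our project):

using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public static class RegexFunctions
{
    // A scalar UDF you simply cannot write in pure T-SQL.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
    {
        if (input.IsNull || pattern.IsNull)
            return SqlBoolean.Null;
        return Regex.IsMatch(input.Value, pattern.Value);
    }
}

You can then call it from T-SQL, e.g. SELECT dbo.RegexIsMatch(SomeColumn, N'^[0-9]+$').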
I would really appreciate suggestions on how to go about investigating what the problem might be.
April 15, 2011 at 3:15 pm
Did you try making your batch size smaller when loading up your data in the CLR and processing it? I could see you running into memory pressure that causes disk swapping when running within MSSQL versus running it standalone. If that is the issue, then loading less data into memory at a time might resolve it.
If you could post the code showing how you are loading your data and then consuming it in the CLR, that might also help in figuring out what the issue is.
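One thing you could watch while the proc runs (I'm assuming SQL 2008 or later here):

SELECT physical_memory_in_use_kb,
       page_fault_count,
       memory_utilization_percentage,
       process_physical_memory_low,
       process_virtual_memory_low
FROM sys.dm_os_process_memory;

If page_fault_count climbs steeply or process_physical_memory_low flips to 1 during the run, that would point at paging.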
April 15, 2011 at 3:41 pm
Heh, Incindium brings up a good point. What's the max memory setting for the SQL engine on the server? That's a pretty beefy local box: you might be getting away with 8-10 gigs of memory for your local CLR run, while on the DB server it might be restricted to 1 or 2 (or be competing with the OS and ending up in the swap file).
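You can check it quickly with:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';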
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
Twitter: @AnyWayDBA
April 15, 2011 at 8:32 pm
Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check for -g in your startup options.
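If you're not sure whether -g is set, the startup parameters get echoed at the top of the error log; one quick (if undocumented) way to look at them:

EXEC sys.xp_readerrorlog 0, 1, N'Registry startup parameters';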
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 15, 2011 at 11:54 pm
opc.three (4/15/2011)
Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check for -g in your startup options.
Awww, OPC, if I didn't know better, I'd say you weren't a DBA! C'mon, man, we all chase white rabbits... although usually *we're* the ones screaming "We're LATE!"
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
Twitter: @AnyWayDBA
April 16, 2011 at 8:02 am
Craig Farrell (4/15/2011)
opc.three (4/15/2011)
Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check for -g in your startup options.
Awww, OPC, if I didn't know better, I'd say you weren't a DBA! C'mon, man, we all chase white rabbits... although usually *we're* the ones screaming "We're LATE!"
Eh... I couldn't help myself... I guess I know what that says about me now. I figure the worst that can happen is I'll be capable of yelling a bit more intelligently at the next person I run across who abuses the SQL CLR 😀
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 16, 2011 at 2:11 pm
opc.three (4/16/2011)
Craig Farrell (4/15/2011)
opc.three (4/15/2011)
Fine... into the rabbit hole 🙂 What is your MemToLeave setting on both instances? Check for -g in your startup options.
Awww, OPC, if I didn't know better, I'd say you weren't a DBA! C'mon, man, we all chase white rabbits... although usually *we're* the ones screaming "We're LATE!"
Eh... I couldn't help myself... I guess I know what that says about me now. I figure the worst that can happen is I'll be capable of yelling a bit more intelligently at the next person I run across who abuses the SQL CLR 😀
I'm liking you more and more... 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2011 at 5:41 am
-> Incindium
I tried running the code with smaller batch sizes, with the following results:
[Num rows]   [Local time]   [DB server time]
500          0.12 s         0.8 s
1,000        0.20 s         1.7 s
10,000       1.6 s          18.0 s
100,000      7.5 s          290 s
This would indicate to me that something swap-related may be going on: the DB server time grows much faster than linearly with batch size (0.8 s to 290 s for a 200x increase in rows), while the local time stays roughly proportional. That still does not explain the time differences at the smaller batch sizes, though.
Do you have any suggestions for how I could monitor the potential swap issue?
The code is fairly extensive, but I could try to post an excerpt of it if needed.
-> Craig
The max memory setting is 120 GB on the DB server, where the total physical memory is 131 GB.
One thing that is a bit strange to me is that SQL Server uses (or has allocated) 120 GB of memory on the DB server. I'm not sure the SQL engine can actually use all of the memory allocated, since the reported size usually fluctuates during the CLR execution. Could that be the reason for the potential disk swaps?
-> opc.three
I'm not entirely sure how to view the MemToLeave setting, but I tried this post: http://www.johnsansom.com/index.php/2009/03/sql-server-memory-configuration-determining-memtoleave-settings/#axzz1JsE4avf6
and got the following result from the SQL query. Is it possible to view that setting in another way?
Total avail mem, KB   Max free size, KB
8442427040            6574744000
April 18, 2011 at 6:35 am
This seems like a fairly inefficient way of doing this.
It may just be the way you're describing it, but if you're going to do your calculations in .NET code, you should be doing this kind of work in a stream/pipeline rather than explicitly loading batches and bulk inserting them. An SSIS data transformation pipeline may work better for you: it manages memory allocation dynamically as rows flow through the data flow, and it is generally very fast at this kind of work.
Loading and dumping large amounts of memory within the CLR could run into all sorts of issues. I'm assuming your SQL Server is 64-bit?
Doing this work in the CLR will also mostly preclude any use of parallelism, since it executes on a single thread, so it doesn't really matter how many CPU cores the server has.
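For comparison, if the calculation can be expressed in T-SQL at all, a single set-based statement replaces the whole read/calculate/bulk-insert loop and lets the engine parallelize the SELECT across those 24 cores (table and column names are made up here, and Value * Value stands in for the real math):

INSERT INTO dbo.Results WITH (TABLOCK) (Id, Result)
SELECT Id, Value * Value
FROM dbo.Source;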