Appalling performance of CLR SqlBulkCopy

  • In my CLR, I first tried a DataAdapter - when writing back to the database the performance is awful. 8000 rows take over a minute to write. So I've changed to using SqlBulkCopy - and the advantage is minimal - just shaved maybe 10% off the time.

    The CLR performs some very complex processing (very quickly) and writes it back to a table in the calling T-SQL. I've tried all the usual tweaks - drop all indexes before the bulkCopy, set recovery mode to Simple, send the data with an Order BY - But these 8000 rows, which only take a fraction of a second to create in the CLR still take over a minute to write back to the SQL Table.

    Has anyone got any performance tips for writing data back from a CLR?

    EDIT: Performance on development 2008 was great about 1 sec (still targeting .NET 3.5 libraries), but moved to production (SQL 2005) and performance is bad bad bad, though since the production server is a 16-processor monster with 32Gb Ram, the actual .NET processing is quicker its only the writing back to SQL Server thats wading-through-treacle slow.

  • hi there,

    please, make this google saved search:

    sqldatareader vs dataset performance

    you could try sqldatareader.

    it uses a server-side cursor.

    you have to close your sqldatareader object once you are finished with it.

    scrolling of a datareader is different from a dataset.

    dataset are in-memory databases and disconnected from the data source.

    sqldatareader is not disconnected from its data source.

    sqldatareader are a little like recordsets with a server-side cursor used in classic ado.

    you could also do these google saved searches:

    oledbdatareader vs sqldatareader

    oledbdatareader vs dataset

    i think you have to know in advance the data types of the columns you access with a datareader object, but you could also access them by index position and then cast to the required data type.

    you still have different techniques to try...

    good luck,

    tonci korsano

  • From my experience, the CLR itself is always going to be disadvantaged when trying to write to the DB (the CLr best practices advocate avoiding writebacks for this reason). The best advice I can give you is - don't do the write-back within CLR. Have the CLR generate the results, and insert them using "regular" T-SQL.

    Insert mytargettable(mycolumns)

    Exec MyCLRGenerateAndSelectProc

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I guess my suggestion would be to dump the CLR altogether and do the data layer calculations in the data layer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the suggestions but, ...

    well I'll just have to eat crow on this one :blush:

    Turns out I had my timing code in the wrong place in the CLR, leading to the mistaken conclusion that SqlBulkCopy sucks.

    I was WRONG. SqlBulkCopy never registers above 0.2 milliseconds, and more frequently its 0.03 milliseconds.

    The thing taking the time is sorting a DataTable in .NET the larger the table gets, the exponentially longer the sort takes. Haven't figured a workaround yet, but just letting you know SqlBulkCopy is off the hook.

  • Just for completeness here are a few things to try if your CLR performance is unacceptable.

    1) Despite the title of this thread, DO use SqlBulkCopy - its much faster than the DataAdapter.Update() method - which does an update for every changed datarow.

    2) Do sorting in T-SQL - I used BulkCopy to write out the datatable to a pre-created temp table, which had a clustered primary key on my sort field

    3) Read in a DataReader if you don't need to change the data - so now I read the temp table ordered by its primary key, and process the contents I do all my processing into a new dataTable which uses appropriate PrimaryKey settings

    4) Do Joins in T-SQL - again bulkcopy back to the T-SQL, and to get the join results, use a DataAdapter with SqlCommand that does the appropriate JOIN

    5) You can do most things T-SQL with a DataAdapter's SqlCommand - e.g. DELETE, TRUNCATE, INSERT INTO, call other stored procedures ... anything set-based, get Sql Server to do it

    My 60-second proc is now around 20 sec .. and I still have a couple of things to try ..

  • Nice tips, thanks for posting back!

    Just curious, can you describe (without disclosing anything confidential of course) the "complex processing" you are doing in your CLR object that made you decide to use the CLR instead of native T-SQL code?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • We use the CLR because of the complexity of the processing. It probably could be written in a cursor - but it would be a maintenance headache, especially as there are more .NET skills here than T-SQL. Also it has to run in a 2-minute slot, and cursor performance may not cut it.

    The process is designed to generate a sequence of records using several user-definable criteria. Each rule is given a priority and can force a grouping-together (resulting in clumps of similar records) or a spreading apart (to get ratios of 1:N patterns). On top of this there are priority 'queue-jumping' rules to be applied after.

    It actually piggy-backs on antoher SQL Server Agent Job, which already gets most of the data in temporary tables, so it seemed the logical place to call the CLR process, as the context connection can access these temporary tables. (instead of creating a new .NET process that calls SQL)

    And I now have performance down to 1.2 seconds.

  • Thanks, Tom. Granted, I have little to none of the nitty-gritty details but from what you have told us it sounds like it could have been done using T-SQL. I know one person on this thread salivating over the possibility of besting 1.2 seconds in terms of performance while using less CPU, memory and I/O than the SQLCLR object. Either way however, it appears you would have a code-maintenance challenge on your hands given the complexity.

    I regularly see the choice of technology influenced by the technical strengths of existing team members, which is obviously a real concern to the success of any project. I am happy you have something maintainable that has reached an acceptable level of performance. Thanks again for posting those tips.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • curtis2325 (5/11/2012)


    Thanks for shearing nice post are rally useful thanks

    <span id="_ctl5_ctlTopic_ctlPanelBar_ctlTopicsRepeater__ctl22_lblPostSignature"><b><a href="http://www.textmessagespysoftware.com" target="_blank" class="SmlLinks">text message spy software</a></b>

    <b><a href="http://www.textmessagespysoftware.com/spy-software/spy-girlfriends-phone-spy-software/" target="_blank" class="SmlLinks">spy on girlfriends phone</a></b>

    <b><a href="http://www.textmessagespysoftware.com/cell-phone-spy-2/spy-text-messages-installing-software/" target="_blank" class="SmlLinks">spy on text messages without installing software</a></b>

    <b><a href="http://www.textmessagespysoftware.com/cell-phone-spy-2/spy-text-messages-installing-software/" target="_blank" class="SmlLinks">spy on text messages</a></b>

    <b><a href="http://www.textmessagespysoftware.com/spy-software/blackberry-text-message-spy-software/" target="_blank" class="SmlLinks">blackberry text message spy software</a></b>

    <b><a href="http://www.textmessagespysoftware.com/spy-software/nokia-text-message-spy-software/" target="_blank" class="SmlLinks">nokia text message spy software</a></b>

    <b><a href="http://www.textmessagespysoftware.com/spy-software/android-text-message-spy-software/" target="_blank" class="SmlLinks">Android text message spy software</a></b>

    </span>

    Reported as SPAM

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • hi there,

    your suggestions are all fine.

    bulk copy is always fast because it doesn't log (it is like saying it doesn't use the transaction log). so, if the issue is not rolling back from a heavy amount of inserts then sqlbulkcopy, bcp, etc is always an answer.

    datareaders are always faster than dataadapters with datasets, etc. so, if your issue is speed then datareaders are a solution, if they can be used in your solution.

    everything looks great, but i wanted to add that in the same way you send an update or "insert into" statements to a dataadapter with a command object, you can also send a stored procedure call, you attach the connection to the command object, plus the procedure and its parameters. later, you use this command object in your (maybe) fill method call of the dataadapter.

    i tested this procedure call with command and dataadapter objects for sql server and oracle, and i can guarantee it will give performance boost over a sql batch or dynamic sql (at least most of the time).

    best regards,

    tonko.

  • Tonci Korsano (5/11/2012)


    hi there,

    your suggestions are all fine.

    bulk copy is always fast because it doesn't log (it is like saying it doesn't use the transaction log). so, if the issue is not rolling back from a heavy amount of inserts then sqlbulkcopy, bcp, etc is always an answer.

    That is completely untrue. All DML operations are logged. Data added using bulk-load tools are no exception.

    datareaders are always faster than dataadapters with datasets, etc. so, if your issue is speed then datareaders are a solution, if they can be used in your solution.

    everything looks great, but i wanted to add that in the same way you send an update or "insert into" statements to a dataadapter with a command object, you can also send a stored procedure call, you attach the connection to the command object, plus the procedure and its parameters. later, you use this command object in your (maybe) fill method call of the dataadapter.

    i tested this procedure call with command and dataadapter objects for sql server and oracle, and i can guarantee it will give performance boost over a sql batch or dynamic sql (at least most of the time).

    best regards,

    tonko.

    Wow, you threw out an "always" and even a "guarantee". I suspect the OP will need to test things on their system hardware with their data and their workload to know for sure. "It depends" usually rules the discourse here.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • hi there,

    here is the answer --> https://www.google.com/#hl=en&sa=X&psj=1&ei=70OtT53nCYKy8ASUwrWXDQ&ved=0CBgQvwUoAQ&q=(sqlbulkcopy+OR+bcp)+(%22no+logging%22+OR+%22does+not+log%22)&spell=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=356e729b7a15c7f0&biw=1280&bih=891

    or just put in google.com

    (sqlbulkcopy OR bcp) ("no logging" OR "does not log")

    this is really a web search... even though it doesn't look like one

    i remember bcp or bulk copying from 1997 and sql server 6.5.

    i think it was already there for sql server 6.0, but i am not sure.

    i also remember this from sybase 11.02 and 11.5 and 1998 and 1999.

    truncate doesn't log either.

    there is truncate in sql server, sybase and oracle, as far as i can remember.

    i am sure relational dbs have all a type of truncate and bulk copy, but maybe with different names.

    sometimes logging is not required...

    if you are going to delete a table with millions of rows (meaning the entire table), and you won't need to rollback, then truncate can do that job for you under a second.

    but what is this i would really like?

    being young enough do not know any of this for those versions of sybase and sql server.

    it is like "do you remember this from sq server 6.0?"

    and then i could really say "what are you talking about? i am not old enough to have worked with anything like that!"

    well... too bad for me! 🙂

    have an excellent day,

    tonko.

  • Tonci Korsano (5/11/2012)


    hi there,

    here is the answer --> https://www.google.com/#hl=en&sa=X&psj=1&ei=70OtT53nCYKy8ASUwrWXDQ&ved=0CBgQvwUoAQ&q=(sqlbulkcopy+OR+bcp)+(%22no+logging%22+OR+%22does+not+log%22)&spell=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=356e729b7a15c7f0&biw=1280&bih=891

    or just put in google.com

    (sqlbulkcopy OR bcp) ("no logging" OR "does not log")

    this is really a web search... even though it doesn't look like one

    i remember bcp or bulk copying from 1997 and sql server 6.5.

    i think it was already there for sql server 6.0, but i am not sure.

    i also remember this from sybase 11.02 and 11.5 and 1998 and 1999.

    truncate doesn't log either.

    there is truncate in sql server, sybase and oracle, as far as i can remember.

    i am sure relational dbs have all a type of truncate and bulk copy, but maybe with different names.

    sometimes logging is not required...

    if you are going to delete a table with millions of rows (meaning the entire table), and you won't need to rollback, then truncate can do that job for you under a second.

    but what is this i would really like?

    being young enough do not know any of this for those versions of sybase and sql server.

    it is like "do you remember this from sq server 6.0?"

    and then i could really say "what are you talking about? i am not old enough to have worked with anything like that!"

    well... too bad for me! 🙂

    have an excellent day,

    tonko.

    Hate to tell you this, but truncate is logged.

    Run this code:

    CREATE TABLE dbo.Test (

    id INT

    );

    GO

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    )

    INSERT INTO dbo.Test (id)

    SELECT * FROM cteTally;

    GO

    SELECT * FROM dbo.Test;

    BEGIN TRANSACTION

    TRUNCATE TABLE dbo.Test;

    SELECT * FROM dbo.Test;

    ROLLBACK;

    SELECT * FROM dbo.Test;

    GO

    DROP TABLE dbo.Test;

    GO

  • Tonci Korsano (5/11/2012)


    hi there,

    here is the answer --> https://www.google.com/#hl=en&sa=X&psj=1&ei=70OtT53nCYKy8ASUwrWXDQ&ved=0CBgQvwUoAQ&q=(sqlbulkcopy+OR+bcp)+(%22no+logging%22+OR+%22does+not+log%22)&spell=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=356e729b7a15c7f0&biw=1280&bih=891

    or just put in google.com

    (sqlbulkcopy OR bcp) ("no logging" OR "does not log")

    this is really a web search... even though it doesn't look like one

    The first two links returned are for Sybase SQL Server, not Microsoft SQL Server. The third link is for Microsoft BULK INSERT and if you read it, it says "The Bulk Insert task does not log error-causing rows". That is much different than not logging anything to the transaction log.

    truncate doesn't log either.

    Also false. Truncate is a logged command. Try it out. You can see it can be rolled back.

    if you are going to delete a table with millions of rows (meaning the entire table), and you won't need to rollback, then truncate can do that job for you under a second.

    Not always. Truncate deallocates entire pages rather than deelting the individual rows on those pages, making it a faster option, but and on large tables or on slow systems truncate can still take more than "under a second".

    but what is this i would really like?

    being young enough do not know any of this for those versions of sybase and sql server.

    it is like "do you remember this from sq server 6.0?"

    and then i could really say "what are you talking about? i am not old enough to have worked with anything like that!"

    well... too bad for me! 🙂

    Please do not cite your age or the number of versions of a product you have worked with as a means to claim authority, you are only embarrassing yourself.

    have an excellent day

    You too 🙂

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply