The problem is not really the cursor itself. It is that the operations are not set-oriented. Doing 1 million inserts is not at all the same as doing a single insert based on a select that returns 1 million rows and inserts them in one operation.
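To make the contrast concrete, here is a minimal sketch; the src and dest tables and their columns are invented stand-ins:

-- Row-by-row: one individually logged INSERT per fetched row
DECLARE @c1 int, @c2 varchar(50)
DECLARE c CURSOR FOR SELECT col1, col2 FROM src
OPEN c
FETCH NEXT FROM c INTO @c1, @c2
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO dest (col1, col2) VALUES (@c1, @c2)
    FETCH NEXT FROM c INTO @c1, @c2
END
CLOSE c
DEALLOCATE c

-- Set-oriented: the same work expressed as one statement
INSERT INTO dest (col1, col2)
SELECT col1, col2 FROM src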
I recently converted a Transact-SQL batch that was doing about 8 million inserts. The execution time was about 8 hours. By making one big query using derived tables and CASE statements to apply some conversion/extraction logic, and UNION ALL to bring together rows obtained under disjoint logic contexts, the time went from 8 hours down to 30 minutes (a 16-to-1 improvement). Using such a query also made it possible to reduce logging by doing a SELECT INTO.
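I can't reproduce the actual batch here, but the shape of the technique looks something like this (all names invented for illustration):

SELECT d.acct_id,
       CASE d.src_type                          -- conversion/extraction logic
            WHEN 'A' THEN LEFT(d.raw_value, 10)
            ELSE SUBSTRING(d.raw_value, 11, 10)
       END AS clean_value
INTO   dbo.staging_tbl                          -- SELECT INTO reduces logging
FROM ( SELECT acct_id, src_type, raw_value
       FROM dbo.source_a
       UNION ALL                                -- rows from disjoint logic paths
       SELECT acct_id, 'B', raw_value
       FROM dbo.source_b ) AS d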
Even without doing a SELECT INTO, SQL Server optimizes log write operations. It seems there is a lot less logging required for such a massive single insert than for doing it row by row.
Cursors are an easy way to describe a job, but the net result is row-by-row reads and writes. For example, a single-row read will almost always use a suitable index, if one exists, which may seem good. But it is very bad if you end up reading all the rows from the table this way. When you match a lot of rows using a join, SQL Server may decide that using the index is not worthwhile and do the match with a joining algorithm better suited to that type of job. There is another optimization gain possible there.
The only problem with this approach is that you have to work harder to define the big “do it all” query.
After the pivot the results data would look like this...

ID Result
-- ----------------
56 run, jump, scale
No, not really, it should look like:
ID Result
-- ----------------
56 run,jump,scale,
That's 15 characters long, and with a return statement that trims off the final character you could drop the comma at the end. That's assuming your string, including the last ',' character, was 20 characters or under.
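For illustration, a minimal sketch of a function doing that trim (the name fn_pivot_values is invented; the table and column names follow the code later in this thread):

CREATE FUNCTION dbo.fn_pivot_values (@id int)
RETURNS varchar(20)
AS
BEGIN
    DECLARE @values varchar(100)
    SET @values = ''
    -- concatenate the matching rows into one comma-separated string
    SELECT @values = @values + convert(varchar(20), column_data) + ','
    FROM DB.dbo.many_tbl
    WHERE id = @id
    IF LEN(@values) > 0                         -- drop the trailing comma
        SET @values = LEFT(@values, LEN(@values) - 1)
    RETURN LEFT(@values, 20)
END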
I've had similar experiences with improving cursors. I had to look at someone's report because they weren't getting good data from it anymore. I ran the existing proc on my machine just to see what it would produce. Five minutes later I still hadn't gotten a result; I had started reading what they were doing, and I understood why they weren't getting fast results, but I still didn't understand what they were trying to do. Ten minutes after that I understood what they were trying to do, and their proc was still running. I opened up a new window and started getting answers piecemeal.
If I execute this select, I'll get all the users they are interested in. Yep. If I join this information, I'll find out the offices they are involved in. Yep. Hmmm, how am I going to include all the related offices? Oh, this should do it, but it could loop indefinitely; before I start, put a different stop in the loop. (That came in handy later when I blew the data comparison and caused the infinite loop.)
Anyway, by the end of the day, I had reproduced the report information they wanted (including the information the original proc was dropping). I left my workstation up overnight because I still hadn't gotten an answer from my machine after 6 hours of running.
The next morning I killed the 20-hour process, wrote up a new proc, and got a report in 30 seconds. I heard the old process ran in 4 hours on the faster production machine for one DB and 2 hours for another DB (custom processes for each DB). My proc gets reports from both DBs in 28 seconds.
Happened to see this while browsing; thought it might be relevant here. Have a look.
I hope there are not many people hitting this table. This may be OK as long as it is a nightly run and you know yours is the only code running against it.
However long it runs, the table will be locked during the process. Although fast, for databases with a lot of people expecting to update it, that would be unacceptable unless you can get it to run in less than 4 seconds (you last said 26 seconds, I believe) and lock timeouts are set well above that. Speed is not always the only consideration, unless you are running exclusively! In which case I would put a table lock hint on it to speed it up even more. I would recommend taking a little longer to make sure you don't hold a table lock for half a minute.
Here is how a highly available approach would look. I work on databases with hundreds of thousands of users, on databases inconceivably large to most. At my job location, we don't even have a nightly processing window (global app).
Whenever I read posts on this and most other sites, I have to keep in mind that the posters have a low concurrency of users. As a user base grows, most will run into an increasing number of problems they can't identify over time.
The following code is focused on removing cursors; in reality a lot of code should be added for error checking, but for simplicity and to keep to the topic it is left out. While loops are a necessary evil in our environment, to keep the number of rows being updated small enough that each pass can complete and release its locks within 4 seconds (our app's rule; it should actually be lower, perhaps 3). Think availability first, then speed. At my work location we even have variations of this that loop through rows in blocks of 5, 10, or whatever higher number still completes in less than 4 seconds; there is a sketch of that after the code below. As you can see, it is a completely different mind set.
Declare @tmp_id int
Declare @prev_id int
Declare @tmp_values varchar(20)

Set LOCK_TIMEOUT 12000  -- assuming all code in the app is tested so that
                        -- no update takes longer than 4 seconds
Set @prev_id = -1       -- assumes all IDs are positive in nature

While Exists ( SELECT 1
               FROM DB.dbo.OutPut_tbl WITH (NOLOCK)
               WHERE id > @prev_id )  -- another row remains past our reference point
BEGIN
    -- retrieve the next row > the previous one; order by id
    -- (key data, preferably clustered) to maintain a reference point
    SELECT TOP 1 @tmp_id = id
    FROM DB.dbo.OutPut_tbl WITH (NOLOCK)
    WHERE id > @prev_id
    ORDER BY id

    -- Direct column pivot into a variable.
    -- I don't know enough about the column_out field, but let's hope it never
    -- grows larger than 20; the declarations and the table field would need
    -- to be changed if that were possible.
    SET @tmp_values = ''  -- reset each pass; left NULL, the concatenation
                          -- below would only ever yield NULL
    SELECT @tmp_values = @tmp_values + convert(varchar(20), column_data) + ','
    FROM DB.dbo.many_tbl WITH (NOLOCK)
    WHERE id = @tmp_id

    UPDATE DB.dbo.OutPut_tbl WITH (ROWLOCK)  -- assuming this will take less
                                             -- than 4 seconds; otherwise
                                             -- create another While Exists loop
    SET column_out = Left(@tmp_values, 20)
    WHERE id = @tmp_id

    SET @prev_id = @tmp_id  -- move our pointer to the next row
END -- While Exists
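For completeness, here is what the block variant mentioned above might look like, processing ten rows per pass (the batch size is illustrative, and it leans on the hypothetical fn_pivot_values function sketched earlier in the thread):

Declare @tmp_id int
Declare @prev_id int
Set @prev_id = -1

While Exists ( SELECT 1
               FROM DB.dbo.OutPut_tbl WITH (NOLOCK)
               WHERE id > @prev_id )
BEGIN
    -- find the upper bound of the next block of ten ids; the variable
    -- keeps the last id the SELECT assigns
    SELECT TOP 10 @tmp_id = id
    FROM DB.dbo.OutPut_tbl WITH (NOLOCK)
    WHERE id > @prev_id
    ORDER BY id

    -- update the whole block in one short-lived statement
    UPDATE DB.dbo.OutPut_tbl WITH (ROWLOCK)
    SET column_out = dbo.fn_pivot_values(id)
    WHERE id > @prev_id AND id <= @tmp_id

    SET @prev_id = @tmp_id  -- jump the pointer past the whole block
END -- While Exists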