
Replacing Cursors and While Loops
Posted Monday, July 11, 2005 10:35 AM
Grasshopper


Group: General Forum Members
Last Login: Monday, February 03, 2014 3:47 PM
Points: 12, Visits: 48

The problem is not really in the cursor itself. It is in the fact that the operations are not set-oriented. Doing 1 million inserts is not at all the same as doing a single insert based on a select that returns 1 million rows and inserts them in one operation.

 

I recently converted a Transact-SQL batch that was doing about 8 million inserts; the execution time was about 8 hours. By building one big query that used derived tables and CASE expressions to apply the conversion/extraction logic, and UNION ALL to bring together rows obtained under disjoint logic branches, the run time dropped from 8 hours to 30 minutes (a 16-to-1 improvement). Using such a query also made it possible to reduce logging by doing a SELECT INTO.
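
To make the shape of that rewrite concrete, here is a minimal sketch; the table and column names (src_orders, src_returns, staging_rows and so on) are invented for illustration and are not the actual schema from that batch:

-- Hypothetical names throughout.  The cursor's per-row IF/ELSE logic becomes
-- CASE expressions, the disjoint branches are brought together with UNION ALL,
-- and SELECT INTO keeps the logging to a minimum.
SELECT  src.id,
        CASE WHEN src.amount < 0 THEN 'CREDIT' ELSE 'DEBIT' END AS entry_type,
        UPPER(LEFT(src.customer_code, 3))                       AS region_code
INTO    dbo.staging_rows
FROM
(
    SELECT id, amount, customer_code  FROM dbo.src_orders
    UNION ALL
    SELECT id, -amount, customer_code FROM dbo.src_returns
) AS src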

 

Even without doing a SELECT INTO, SQL Server optimizes log write operations. There seems to be far less logging required for one massive single insert than for doing it row by row.

 

Cursors are an easy way to describe a job, but the net result is row-by-row reads and writes. For example, a single-row read will almost always use a suitable index, if one exists, which may seem good. But it is not good at all if you end up reading every row of the table that way. When you match a lot of rows using a join, SQL Server may decide that using the index is not worthwhile and do the match with a join algorithm better suited to that kind of job. That is another possible optimization gain.

 

The only problem with this approach is that you have to work harder to define the big “do it all” query.

Post #199432
Posted Monday, July 11, 2005 10:53 AM
Valued Member


Group: General Forum Members
Last Login: Monday, March 09, 2009 1:35 PM
Points: 60, Visits: 12
This doesn't pivot the results. I wanted the data from several matching rows in the Many table to come back as a single comma-separated value in the results table.

Suppose I had the following Many table...

ID DATA
-- ------
56 run
56 jump
56 scale

After the pivot the results data would look like this...

ID Result
-- ----------------
56 run, jump, scale



Post #199443
Posted Monday, July 11, 2005 4:42 PM
SSC Journeyman


Group: General Forum Members
Last Login: Thursday, August 18, 2005 4:21 PM
Points: 75, Visits: 1

After the pivot the results data would look like this...

ID Result
-- ----------------
56 run, jump, scale

No, not really, it should look like:

ID Result
-- ----------------
56 run,jump,scale,

That's 15 characters long, and if the return statement was:

return substring(@value,1,LEN(@value)-1)

you could drop the comma at the end. That's assuming your string, including the last ',' character, was 20 characters or fewer.
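
For reference, a scalar function along those lines, with the trailing comma dropped, could look like the sketch below. The function name fn_ConcatData is a placeholder, but the Many table and its ID/DATA columns match the earlier example; it assumes the concatenated string, comma included, stays within 21 characters:

CREATE FUNCTION dbo.fn_ConcatData (@id int)
RETURNS varchar(20)
AS
BEGIN
    DECLARE @value varchar(21)              -- one spare character for the trailing comma
    SET @value = ''                         -- start empty so concatenation never hits NULL

    SELECT @value = @value + DATA + ','     -- append every matching row from the Many table
    FROM dbo.Many
    WHERE ID = @id

    IF LEN(@value) > 0
        SET @value = SUBSTRING(@value, 1, LEN(@value) - 1)   -- drop the trailing comma

    RETURN @value
END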

Post #199629
Posted Monday, July 11, 2005 5:07 PM
SSC Journeyman


Group: General Forum Members
Last Login: Thursday, August 18, 2005 4:21 PM
Points: 75, Visits: 1

I've had similar experiences with improving cursors. I had to look at someone's report because they weren't getting good data from it anymore. I ran the existing proc on my machine just to see what it would produce. Five minutes later I still hadn't gotten a result; I had started reading what they were doing and understood why they weren't getting fast results, but still didn't understand what they were trying to do. Ten minutes after that I understood what they were trying to do, and their proc was still running. I opened a new window and started getting answers piecemeal.

If I execute this select, I'll get all the users they are interested in. Yep. If I join this information, I'll find out the offices they are involved in. Yep. Hmmm, how am I going to include all the related offices? Oh, this should do it, but it could loop indefinitely, so before I start, put a different stop condition in the loop. (That came in handy when I blew the data comparison later and caused the infinite loop.)

Anyway, by the end of the day I had reproduced the report information they wanted (including the information the original proc was dropping). I left my workstation up overnight because I still hadn't gotten an answer from my machine after 6 hours of running.

Next morning I killed the 20-hour process, wrote up a new proc, and got a report in 30 seconds. I heard the old processes ran in 4 hours on the faster production machine for one DB and 2 hours for another DB (custom processes for each DB). My proc gets reports from both DBs in 28 seconds.

Post #199632
Posted Tuesday, July 12, 2005 7:09 AM
Valued Member


Group: General Forum Members
Last Login: Monday, March 09, 2009 1:35 PM
Points: 60, Visits: 12
You are correct for the code I listed in the article. I did change it in the final results here at work to remove the last comma.


Post #199823
Posted Tuesday, July 12, 2005 7:15 AM
Valued Member


Group: General Forum Members
Last Login: Monday, March 09, 2009 1:35 PM
Points: 60, Visits: 12
All,

Just yesterday I used the technique at a contract client with similar results, although not quite as dramatic as the case I wrote about. To be fair, though, there is a limit to how well this can work. After I wrote the article I came across another update that was being handled with a cursor. Within the cursor, 18 columns had to be updated while pivoting data. I found I only needed 7 functions to perform the 18 updates, but by the time all those calls were made for 2.2 million records, the new process clocked in at the same speed as the old one. I tested individual calls and they were very short, but that many had a cumulative effect. The good side, though, is that the code is so much simpler and easier to read that I plan to replace the old code anyway.
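
To make that overhead concrete: the rewrite ends up as an UPDATE that calls a scalar function once per pivoted column per row, so even a very fast call is paid rows-times-columns times. A hypothetical sketch (table, column, and function names are invented; the real job used 7 functions across 18 columns):

UPDATE t
SET    col_a = dbo.fn_PivotA(t.id),        -- each scalar call runs once per row
       col_b = dbo.fn_PivotB(t.id),
       col_c = dbo.fn_PivotA(t.other_id)
FROM   dbo.big_target AS t                 -- ~2.2 million rows, so small costs accumulate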

Have fun building your functions.



Post #199824
Posted Monday, July 25, 2005 12:14 AM


SSC Veteran


Group: General Forum Members
Last Login: Wednesday, May 15, 2013 1:12 AM
Points: 229, Visits: 164

Happened to see this while browsing; thought it might be relevant here. Have a look:

http://www.sqlteam.com/item.asp?ItemID=2368

Jeswanth



Post #203824
Posted Friday, April 07, 2006 5:32 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, March 08, 2013 5:22 AM
Points: 119, Visits: 9
Just a couple of things I've found useful. If you have to loop, it's generally quicker to use a WHILE loop that selects each record individually rather than a cursor. Also, if you find you have a very large SQL statement, it often runs much quicker broken up into smaller ones. If each calculation can be broken up into individual set-based queries, I'm sure you'll improve the performance.
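
A bare-bones version of that WHILE-loop pattern, assuming a table with an integer key named id (the names here are illustrative only), looks like this:

DECLARE @id int

SELECT @id = MIN(id) FROM dbo.some_table   -- lowest key to start from

WHILE @id IS NOT NULL
BEGIN
    -- ... do the per-row work for the row identified by @id here ...

    SELECT @id = MIN(id)                   -- step to the next key; no cursor required
    FROM dbo.some_table
    WHERE id > @id
END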


Phil Nicholas
Post #271869
Posted Friday, April 07, 2006 8:21 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, November 18, 2009 6:44 AM
Points: 2, Visits: 4

I hope there are not many people hitting this table. This may be OK as long as it is a nightly run where you know yours is the only code running against it.

However long it runs, the table will be locked during the process. Although fast, for databases with a lot of people expecting to update it, this would be unacceptable unless you can get it to run in less than 4 seconds (you last said 26 seconds, I believe) and lock timeouts are set well above that. Speed is not always the only consideration unless you are running exclusively, in which case I would put a table lock hint on it to speed it up even more. I would recommend taking a little longer to make sure you don't hold a table lock for half a minute.

Mark 

Post #271927
Posted Friday, April 07, 2006 11:01 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, November 18, 2009 6:44 AM
Points: 2, Visits: 4

Here is how a highly available database would look. I work on databases with hundreds of thousands of users, databases inconceivably large to most. At my job location we don't even have a nightly processing window (it's a global app).

Whenever I read posts on this and most other sites, I have to keep in mind that the posters have a low concurrency of users. As a user base grows, most will see an increasing number of problems they can't identify over time.

The following code is focused on removing cursors; in reality a lot of code should be added for error checking, but for simplicity and to keep to the topic it is left out. While loops in our environment are a necessary evil to keep the number of rows being updated down to a small number, so each statement can complete and release its locks within 4 seconds (our app's rule; it should actually be lower, perhaps 3). Think availability first, then speed. At my work location we even have variations of this that loop through rows in blocks of 5, 10, or whatever higher number still allows us to complete in less than 4 seconds; a batched variation along those lines is sketched after the code below. As you can see, it is a completely different mindset.

--------------------

Declare @tmp_id int
Declare @prev_id int
Declare @tmp_values varchar(20)

Set LOCK_TIMEOUT 12000   -- 12 seconds; assuming all code in the app is tested so that
                         -- no update takes longer than 4 seconds

Set @prev_id = -1        -- assumes all IDs are positive in nature

While 1 = 1              -- variable assignment isn't allowed inside EXISTS,
BEGIN                    -- so loop and test @@ROWCOUNT instead
      -- Retrieve the next row greater than the previous one.  Order by id
      -- (the key data, preferably clustered) to keep a reference point.
      SELECT TOP 1 @tmp_id = id
      FROM DB.dbo.OutPut_tbl WITH (NOLOCK)
      WHERE id > @prev_id
      ORDER BY id

      IF @@ROWCOUNT = 0 BREAK       -- no rows left to process

      -- Direct column pivot into a variable.
      -- I don't know enough about the column_out field, but let's hope it never
      -- grows beyond 20 characters; the declarations and the table column would
      -- need to be changed if that were possible.
      SET @tmp_values = ''          -- reset each pass; concatenating onto NULL would give NULL

      SELECT @tmp_values = @tmp_values + convert(varchar(20), column_data) + ','
      FROM DB.dbo.many_tbl WITH (NOLOCK)
      WHERE id = @tmp_id

      UPDATE DB.dbo.OutPut_tbl WITH (ROWLOCK)   -- assuming that this will take less than
                                                -- 4 seconds; otherwise create another loop
      SET column_out = Left(@tmp_values, 20)
      WHERE id = @tmp_id

      SET @prev_id = @tmp_id        -- move our pointer to the next row

END -- While
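
For the "blocks of 5, 10, or whatever" variation mentioned above, a minimal sketch of one common shape follows. It uses SET ROWCOUNT to cap the batch size and a placeholder WHERE clause to find rows not yet processed, so treat the names and the filter as illustrative only:

Declare @batch int
Set @batch = 10

SET ROWCOUNT @batch                  -- cap every following statement at @batch rows

WHILE 1 = 1
BEGIN
      UPDATE DB.dbo.OutPut_tbl WITH (ROWLOCK)
      SET column_out = 'processed'   -- placeholder for the real per-row logic
      WHERE column_out IS NULL       -- placeholder test for rows not yet touched

      IF @@ROWCOUNT = 0 BREAK        -- nothing left to update

      WAITFOR DELAY '00:00:00.100'   -- brief pause lets other sessions in between batches
END

SET ROWCOUNT 0                       -- turn the cap back off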

Post #271988