SQLServerCentral is supported by Red Gate Software Ltd.

CURSOR speed - SQL 2000 vs. SQL 2005
Posted Monday, December 10, 2007 2:40 PM
SSC-Enthusiastic

Currently I am converting 10 years' worth of customer and order data from an old FileMaker database into MS SQL 2000, and the best way we have found to do this is with T-SQL cursors. I have it all working fine now, but hit a few bumps along the way.

With SQL 2000 it seems that there is major memory paging going on when a large table is being read from within a cursor. I have done quite a bit of research on this in the last couple of weeks, but didn't find out too much, as everyone seems to hate cursors LOL

Anyway, I imported a 1,000,000+ line csv file into a temp table in SQL 2000. From this I built a cursor to read it one row at a time, do one select query to find a product id number from the new product table, insert the data into the new table, then mark it as imported. Pretty basic, but this cursor was taking days to run!

I did some testing and used a smaller 100,000-row table to run the cursor from, and there was a huge performance gain. Where I was processing around 200 rows per minute on my older and overworked SQL server, I was suddenly getting 4,000+ rows a minute!

So I did some more testing. I tried making smaller cursors with a WHILE loop to grab only 10,000 rows at a time from my 1,000,000-row table, but that only got me up to 400 rows a minute. Then I expanded the while loop to read 10k rows, create a new table and populate it with them, open the cursor against that temp table, and drop the table when done. Rinse and repeat. This got me up to 5,000+ rows a minute, which I am quite happy with given my server.

Now I ask you all, does SQL 2005 have these same types of issues with cursors on large tables?

I just didn't find any info like this on any site in the last 3 weeks of searching, so I thought it would be good to throw this out, as it might help others speed up their cursors.

Perhaps this way of breaking down a cursor will help someone...
Post #431542
Posted Monday, December 10, 2007 2:57 PM
Hall of Fame

I think that you'll find the same problems in SQL Server 2005 as well. SQL Server is designed to work off sets of data. Row-by-row processing is terribly inefficient, slow, and resource intensive. For the example you gave, why do you feel you need to insert the rows one at a time? If you've already imported from csv into a table, why can't you do your INSERT as a set-based transaction?



John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
Post #431548
Posted Monday, December 10, 2007 3:13 PM
SSC-Enthusiastic

For all of the cursors I have written for this data conversion, I need to do one or more separate selects or inserts besides the row I'm working with. Sometimes I need to do some math with the data before it is inserted; sometimes I need to break an address or phone number out to be inserted into a different table.

The above example is the simplest cursor I have done so far; it is for order line items and does not write to more than one table. So yea, this one might be doable another way.

I just have not found another way to do this other than a cursor.

I'm not quite sure what you mean by "INSERT as a set-based transaction".

I'm still kinda new to MS SQL 2000. I have 6 years' experience with MySQL and PHP with smaller databases on e-commerce projects, but this large data conversion stuff is new to me... Still got a lot to learn.
Post #431554
Posted Monday, December 10, 2007 3:30 PM
Hall of Fame

Set based means that you are working on 'sets' of data instead of rows. Anytime you can apply logic or business rules across a set of common data, you'll way outperform cursoring through each row one at a time. Why don't you give us some greater detail as to your setup and schema, and then post your cursors? If you provide enough background and setup info, you'll have several experts jump in and help you. What you are trying to do sounds like a pretty basic ETL (Extract-Transform-Load) operation, and I've yet to find one that cannot be solved without the use of a cursor.

So for your case, you are using a cursor because you have to do different actions depending on how the data looks, right? This tells me that you have some sort of business rules that tell you what type of activity you need to perform on that data. Instead of cursoring through each row and deciding what to do with the data based on those rules, why not run one operation for each grouping or set of rows? I may not be coming across real clear, but take this example. Say you have a table that looks like this:

MyTable
RoNum  RowValue
1      A
2      B
3      C
4      B
5      B
6      A

Let's say that the RowValues define which set of business rules you are going to use to load this data. The cursor approach says to process all rows one at a time, starting with row 1. As soon as the cursor pulls RoNum 1, it says 'Ah ha, this is a type A. I'll run the business logic for this type of row'. It will process RoNum 1 and then move on to RoNum 2 to determine what to do there. This process continues until there are no more rows. Pretty simple, right? Well, with set-based logic, you would apply your business logic to each set of data. In this case, your code would process RoNums 1 and 6 at the same time, rows 2, 4, and 5 at the same time, and then row 3.
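A rough sketch of what those three set-based passes could look like (TargetA, TargetB, and TargetC are hypothetical destination tables standing in for each rule set's logic):

```sql
-- One statement per rule set; each statement touches every row of that type at once.
INSERT INTO TargetA (RoNum) SELECT RoNum FROM MyTable WHERE RowValue = 'A' -- RoNum 1 and 6
INSERT INTO TargetB (RoNum) SELECT RoNum FROM MyTable WHERE RowValue = 'B' -- RoNum 2, 4, and 5
INSERT INTO TargetC (RoNum) SELECT RoNum FROM MyTable WHERE RowValue = 'C' -- RoNum 3
```

Three statements instead of a million fetches: the engine does the grouping work in each scan.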

Post your cursors and you'll get some help. If you really are open to learning, having a cursor re-written as set-based for you will get you going down the right path.




John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
Post #431557
Posted Monday, December 10, 2007 4:29 PM
SSC-Enthusiastic

OK, here is the cursor I am using for the above example. This is the simplest cursor I have used so far, so it is probably the easiest to work with.

The 'posted_order_items_from_filemaker2' table is 1,027,000 rows long and has all of the data from the old database. The 'posted_order_items_from_filemaker' table is being used as a temp table for each cursor run-through. I am marking each row as 'done' in the 'posted_order_items_from_filemaker2' table in case the cursor needs to be stopped (or stops on its own), as I had SQL Query Analyzer freeze up when I was importing the main order table and breaking it down with a much more complex cursor. The 'imported' field is '0' when not imported, then changed to '1' after the new row is written in the new table 'OrderItems'.

By breaking this cursor down into smaller ones, it took 2 hours 51 minutes to finish. When it was one large cursor it took like 3 days! I consider this a big improvement, but if there is a better way then I'm all ears! Or is that eyes? hmmm


DECLARE @comments varchar(255)
DECLARE @customer_key varchar(255)
DECLARE @discount varchar(255)
DECLARE @item_code varchar(255)
DECLARE @price_per_item money
DECLARE @order_key varchar(255)
DECLARE @quantity_received varchar(255)
DECLARE @quantity_to_ship varchar(255)
DECLARE @cost money
DECLARE @price_per_item_after_discount money
DECLARE @line_id int
DECLARE @customer_id int
DECLARE @product_id int
DECLARE @quantity int
DECLARE @start_num int
DECLARE @end_num int

DECLARE @ImportCursor CURSOR

SET @start_num = 1
SET @end_num = 10000

WHILE @start_num < 1030000

BEGIN

SELECT * INTO posted_order_items_from_filemaker FROM posted_order_items_from_filemaker2
WHERE imported = '0'
AND id >= @start_num
AND id <= @end_num

SET @ImportCursor = CURSOR FAST_FORWARD
FOR
Select comments, customer_key, discount, item_code, price_per_item, order_key, quantity_received,
quantity_to_ship, cost, price_per_item_after_discount, id
From posted_order_items_from_filemaker

OPEN @ImportCursor
FETCH NEXT FROM @ImportCursor
INTO @comments, @customer_key, @discount, @item_code, @price_per_item, @order_key,
@quantity_received, @quantity_to_ship, @cost, @price_per_item_after_discount, @line_id
WHILE @@FETCH_STATUS = 0
BEGIN

SET @product_id = NULL -- reset each row; stays NULL if no matching product is found

SELECT @product_id = id
FROM Products
WHERE ProductCode = @item_code

SET @quantity = CAST(@quantity_received AS int) + CAST(@quantity_to_ship AS int) -- cast first: these are varchars, and '+' on varchars concatenates

SET NOCOUNT ON
INSERT INTO OrderItems (
OrderID,
ItemID,
Qty,
QtyDelivered,
QtyToSend,
item_price,
item_cost,
comments,
filemaker_discount,
filemaker_price_per_item_after_discount,
import_item,
filemaker_id
) VALUES (
@order_key,
@product_id,
@quantity,
@quantity_received,
@quantity_to_ship,
@price_per_item,
@cost,
@comments,
@discount,
@price_per_item_after_discount,
'1',
@line_id
)

UPDATE posted_order_items_from_filemaker2
SET imported = '1'
WHERE id = @line_id

FETCH NEXT FROM @ImportCursor
INTO @comments, @customer_key, @discount, @item_code, @price_per_item, @order_key,
@quantity_received, @quantity_to_ship, @cost, @price_per_item_after_discount, @line_id
END

CLOSE @ImportCursor
DEALLOCATE @ImportCursor

SET @start_num = @start_num + 10000
SET @end_num = @end_num + 10000

DROP TABLE posted_order_items_from_filemaker

END


Post #431578
Posted Monday, December 10, 2007 4:59 PM
SSCarpal Tunnel

This will do everything:
INSERT INTO OrderItems (
OrderID,
ItemID,
Qty,
QtyDelivered,
QtyToSend,
item_price,
item_cost,
comments,
filemaker_discount,
filemaker_price_per_item_after_discount,
import_item,
filemaker_id
)
SELECT
F.order_key,
P.id,
F.quantity_received + F.quantity_to_ship,
F.quantity_received,
F.quantity_to_ship,
F.price_per_item,
F.cost,
F.comments,
F.discount,
F.price_per_item_after_discount,
'1',
F.id
From dbo.posted_order_items_from_filemaker2 F
INNER JOIN dbo.Products P ON P.ProductCode = F.item_code


Post #431586
Posted Monday, December 10, 2007 5:32 PM
SSC-Enthusiastic

Thanks! That definitely sheds some light on how to avoid a cursor.

I tried to do an INNER JOIN in my cursor, but unfortunately some of the input rows have a blank (null) item_code or one that is in error (not in the Products table), so those ended up getting skipped completely. That is why I broke that part out into a separate select statement, to allow it to be blank and still get inserted into the new table. It's sloppy, but I still need to get all of the incomplete rows input to convert all the old orders into the new system.

I have a lot of 'bad' data that has to be allowed for.

Also, in my more complex cursors I need to check whether an address is already in the address table; if it is, I need to pull the address_id, otherwise I need to create a new address and then use its address_id. How would I do something like this? I have several other more complex cursors which are all doing selects and then using IF statements to determine the next step, i.e. INSERT or SELECT or maybe even an UPDATE.
Post #431591
Posted Monday, December 10, 2007 5:46 PM
SSCarpal Tunnel


I have a lot of 'bad' data that has to be allowed for

A LEFT JOIN to dbo.Products will do just that.

But can you explain what that "bad" data means?
Bad data should not be allowed into a relational database.
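For what it's worth, a minimal sketch of that LEFT JOIN, with the column list trimmed from the earlier insert: rows with a blank or unknown item_code survive, with NULL where the product id would be.

```sql
-- LEFT JOIN keeps every row from the import table; P.id is NULL when no product matches.
SELECT F.order_key,
       P.id AS ItemID
FROM dbo.posted_order_items_from_filemaker2 F
LEFT JOIN dbo.Products P ON P.ProductCode = F.item_code
```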


Also, in my more complex cursors I need to check whether an address is already in the address table; if it is, I need to pull the address_id, otherwise I need to create a new address and then use its address_id. How would I do something like this? I have several other more complex cursors which are all doing selects and then using IF statements to determine the next step, i.e. INSERT or SELECT or maybe even an UPDATE.


INSERT INTO dbo.Address
SELECT ....
WHERE NOT EXISTS (select 1 from dbo.Address
where {check unique key existence})

Then just join dbo.Address to the final insert.
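A hypothetical sketch of that two-step pattern, assuming the address table is keyed on (Street, City, Zip) and the incoming rows sit in a staging table; all table and column names here are made up for illustration:

```sql
-- Step 1: add only the addresses that do not exist yet.
INSERT INTO dbo.Address (Street, City, Zip)
SELECT DISTINCT S.Street, S.City, S.Zip
FROM dbo.AddressStaging S
WHERE NOT EXISTS (SELECT 1 FROM dbo.Address A
                  WHERE A.Street = S.Street AND A.City = S.City AND A.Zip = S.Zip)

-- Step 2: every staging row can now be joined to its address_id.
SELECT S.*, A.address_id
FROM dbo.AddressStaging S
INNER JOIN dbo.Address A
        ON A.Street = S.Street AND A.City = S.City AND A.Zip = S.Zip
```

No IF statements needed: the NOT EXISTS insert covers the "create if missing" branch, and the join covers the "pull the id" branch, both as single set-based passes.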
Post #431594
Posted Monday, December 10, 2007 6:01 PM
SSC-Enthusiastic

By 'bad data' I mean data that is not in the correct format. For instance, in the above line item import, I ran across 3 rows which had '.5' for the quantity. As that is a float value and not an integer, it made the cursor stop each time. I altered the original data to '1' and restarted the cursor. I am running into a lot of data that is blank/null when it should not be, or that has invalid values.

The data was pulled from a FileMaker Pro 4.0 database, which seems to have extremely simplistic data types. A 'Number' field can be an integer or a float, for example. A text field can be a varchar of any length. FileMaker does not make you specify data lengths anywhere, it seems, so I'm having to find out through trial and error where I need to tweak the lengths of my varchars.

If I'm processing this garbage data one row at a time and marking each row as 'done', when it stops I can quickly find where it stopped and look at the data on that row to see what is happening.

I'll try implementing the suggestions above to see if I can allow for all the variables I'm dealing with. Thanks for the input! :)
Post #431595
Posted Monday, December 10, 2007 6:16 PM
SSCarpal Tunnel

Steve Stout (12/10/2007)
By 'bad data' I mean data that is not in the correct format. For instance, in the above line item import, I ran across 3 rows which had '.5' for the quantity. As that is a float value and not an integer, it made the cursor stop each time. I altered the original data to '1' and restarted the cursor. I am running into a lot of data that is blank/null when it should not be, or that has invalid values.

The data was pulled from a FileMaker Pro 4.0 database, which seems to have extremely simplistic data types. A 'Number' field can be an integer or a float, for example. A text field can be a varchar of any length. FileMaker does not make you specify data lengths anywhere, it seems, so I'm having to find out through trial and error where I need to tweak the lengths of my varchars.

If I'm processing this garbage data one row at a time and marking each row as 'done', when it stops I can quickly find where it stopped and look at the data on that row to see what is happening.

I'll try implementing the suggestions above to see if I can allow for all the variables I'm dealing with. Thanks for the input! :)


You have your data types, and you have your business rules for the imported data.
Right?

So, just put the checks for those rules in the WHERE clause:

WHERE Quantity NOT LIKE '%[^0-9]%' -- this keeps only rows with an all-digit (integer) quantity
AND EXISTS (select 1 from Products P WHERE P.ProductCode = F.item_code)
AND {whatever else you need to check}


If you want to look at the rejected rows, store them in a "Failed_FileMaker" table:

INSERT INTO dbo.Failed_FileMaker (...)
SELECT ...
WHERE Quantity LIKE '%[^0-9]%' OR Quantity = '' OR Quantity IS NULL
OR NOT EXISTS (select 1 from Products P WHERE P.ProductCode = F.item_code)
OR .... {whatever else you need to check}
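Putting the pieces of this thread together, a rough sketch of the whole validated, set-based import might look like this (column lists trimmed to the key fields; the Failed_FileMaker columns are assumed):

```sql
-- Good rows: all-digit quantities and a known product code go straight to OrderItems.
INSERT INTO dbo.OrderItems (OrderID, ItemID, Qty, filemaker_id)
SELECT F.order_key,
       P.id,
       CAST(F.quantity_received AS int) + CAST(F.quantity_to_ship AS int),
       F.id
FROM dbo.posted_order_items_from_filemaker2 F
INNER JOIN dbo.Products P ON P.ProductCode = F.item_code
WHERE F.quantity_received NOT LIKE '%[^0-9]%'
  AND F.quantity_to_ship NOT LIKE '%[^0-9]%'

-- Everything else lands in a reject table for inspection.
INSERT INTO dbo.Failed_FileMaker (filemaker_id)
SELECT F.id
FROM dbo.posted_order_items_from_filemaker2 F
WHERE F.quantity_received LIKE '%[^0-9]%'
   OR F.quantity_to_ship LIKE '%[^0-9]%'
   OR NOT EXISTS (SELECT 1 FROM dbo.Products P WHERE P.ProductCode = F.item_code)
```

Two passes over the table replace a million fetches, and the reject table gives the same "find the bad row fast" ability as the one-at-a-time 'done' flag.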

Post #431596