There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction

  • [font="Verdana"]Agreed (both on SSIS and on the necessity for "chunking", although I call this batching (same concept). I wouldn't use a cursor for that though -- I would use a combination of TOP and a loop.[/font]

    AFAIK, you can't use TOP without a cursor to change the parameter if you are chunking. How you do the chunking of course depends on the situation. I find that CTEs are faster than TOP in chunking code.
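    For illustration, a minimal TOP-and-a-loop batching sketch (SQL 2005 syntax; the ArchiveRows table, batch size, and WHERE clause are made-up stand-ins):

    -- Delete in chunks of 10,000 until no qualifying rows remain;
    -- each iteration commits on its own, which keeps the log manageable.
    declare @rows int
    set @rows = 1
    while @rows > 0
    begin
        delete top (10000) from ArchiveRows
        where ProcessedDate < '20090101'
        set @rows = @@ROWCOUNT
    end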

    Thomas (4/14/2009)


    Another common use of cursors is to script a DDL change. For example, I want to write a script that will alter all columns called "Foo" in all tables to be nvarchar(10) instead of nvarchar(5). That is simply not possible using a set-based solution.

    [font="Verdana"]Um... seriously? You commonly generate DDL scripts using cursors? I could see this as an infrequent thing, but not "common". And while a row-by-row approach might be necessary, that need not be cursor based. I think I will reserve judgement on this example until I've seen a more realistic (and common) case.[/font]

    Yes, I commonly use cursors either to build a script that I use later or to affect tables directly. Another example is if you need to add a column to a series of tables (like a modification date or timestamp). I suppose it would be more accurate to say that it is a common use for cursors for which there is no set-based solution. Whether it is common in development is admittedly arguable.

    [font="Verdana"]My issue with "avoid cursors if at all possible" is the same issue I have with "use cursors without evaluating the alternatives". Sometimes cursors areally are the best solution, although I'm finding those times fewer and fewer. I will consider using a cursor when I am doing something that is inherently sequential (procedural) and where data volumes are small. In that scenario, a cursor (or equivalent row-by-row method) is often the best approach.

    I prefer "use the best tool for the job", and learning that traditionally row-by-row approaches can be replaced with set-based approaches is part of selecting the best tool.


    Agreed. Here's another example. You want to build a numbers table that simply contains a series of sequential integers. There are many uses for such a table in solving problems such as finding gaps in a series of numbers. How do you populate it without using a cursor/loop? I'm sure we can piss away hours finding a solution; however, in minutes one could write the following and be done:

    Declare @i int
    Set @i = 0
    While @i < 10000
    Begin
        Insert Numbers(Value) Values(@i)
        Set @i = @i + 1
    End

    I'm all ears if there is a set-based solution (especially a SQL Standard solution) to populating said table. Assume you are starting with an empty database.

  • Wow, quite the discussion. Barry, I'm looking forward to more articles. The next one is slated for a few weeks out, but I might move it up.

    I would say the general advice in SQL Server is not to use cursors for production code until...

    The "until" is: until you have a reason why it's required. This should be the last tool of choice for most people, and used only when you have experience with the implications and capabilities. Too often people reach for it first when it's not needed.

    The other thing is that I'd argue that generating DDL scripts is administrative work, not production code. Something a user executes frequently (whatever that means for your system) is a place where you'd typically avoid cursors.

  • Thanks, Steve, this one might be close to the single day record. :hehe:

    FYI, folks: I will be offline until tomorrow evening, so don't think that I am ignoring you... :-).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/14/2009)


    Christian Buettner (4/14/2009)


    RBarryYoung (4/14/2009)


    Christian Buettner (4/14/2009)


    I'll just jump in and risk making a fool of myself.

    In one of my data import apps (Oracle to SQL Server), I use temp tables with varchar(500) fields to ensure all data can be imported successfully to the local server (via DTS). The data from the temp tables is then merged into the final tables using T-SQL (delete / insert) statements.

    Since the source data can sometimes contain "invalid data" (e.g. primary key violations or field size too big), the import sometimes fails, and this is where my cursor solution comes into play. It imports all rows from the temp table into the final table one by one. Each row that fails is imported into the corresponding error table. This allows correction or just validation of the invalid data later on and ensures that correct data is available to all clients that were not affected. (Note: the first try is always set-based; only in case of an error does the cursor-based approach start.)

    Wondering if someone has a hint on a better way to do this.

    If you need more rationale for this, let me know.

    I use BULK INSERT, BCP, or Integration Services (SSIS) for stuff like this. They have this capability built in.

    RBarryYoung, can you please explain how you would route the rows between the error and final tables using BCP or BULK INSERT? SSIS is not an option, which I forgot to mention. Still using SQL Server 2000 & DTS.

    Check out the ERRORFILE argument of the BULK INSERT command, that should explain it.
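    For reference, ERRORFILE usage looks roughly like this (SQL 2005 and later only; the paths, terminators, and table name here are hypothetical):

    -- Rows that fail parsing go to the error file (plus its .Error.Txt
    -- companion); note this catches format errors, not constraint
    -- violations such as duplicate primary keys.
    BULK INSERT dbo.LoadTable
    FROM 'C:\feeds\extract.dat'
    WITH (
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '\n',
        MAXERRORS = 1000,
        ERRORFILE = 'C:\feeds\extract.err'
    );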

    Sorry I still don't get it. The ERRORFILE argument routes the errors to a text file, not to a table. Unless I misunderstand the documentation.

    Best Regards,

    Chris Büttner

  • Bruce W Cassidy (4/14/2009)


    Christian Buettner


    In one of my data import apps (Oracle to SQL Server), I use temp tables with varchar(500) fields to ensure all data can be imported successfully to the local server (via DTS). The data from the temp tables is then merged into the final tables using T-SQL (delete / insert) statements.

    Since the source data can sometimes contain "invalid data" (e.g. primary key violations or field size too big), the import sometimes fails, and this is where my cursor solution comes into play. It imports all rows from the temp table into the final table one by one. Each row that fails is imported into the corresponding error table. This allows correction or just validation of the invalid data later on and ensures that correct data is available to all clients that were not affected. (Note: the first try is always set-based; only in case of an error does the cursor-based approach start.)

    Wondering if someone has a hint on a better way to do this.

    [...]

    Christian Buettner (4/14/2009)


    Can you please explain how you would route the rows between the error and final tables using BCP or BULK INSERT? SSIS is not an option, which I forgot to mention. Still using SQL Server 2000 & DTS.

    [font="Verdana"]Christian,

    We do similar stuff here on a variety of platforms, one of which is SQL Server 2000. Here's the approach I would recommend.

    1. truncate (or create) your loading table and your error table. These tables should be able to hold all of the data without generating errors, but ithey needs to split the data into the right fields. We generally add a line number field (using an identity) and a load date field (with a default of getdate()).

    2. Use BCP or bulk insert to load the raw data into a load table. Sadly you can't use the XML file format option in SQL Server 2000 (which I find much easier to work with.)

    3. Select out the data that is problematic and insert it into an "error table" (probably the same format as the load table but without the identity). We found we are doing this step with dynamic SQL to build up the list of rows that are errors, based on a set of stored rules that get concatenated.

    4. Insert the remaining data (so you can left join to the error data by the line number and ensure that the right hand side is null) into the destination table.

    No cursors needed. 😀 For steps 3 and 4, you can also put loops around the statements and use TOP to keep the batch sizes down to reasonable sizes (so you don't blow out the transaction logs).

    If you want some additional help on this, just create a new thread and I'm sure a bunch of people will be able to pitch in with specific examples.

    [/font]
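    A rough sketch of steps 3 and 4, assuming hypothetical names (LoadTable, ErrorTable, DestTable, columns Col1/Col2) and the LineNumber identity from step 1:

    -- Step 3: route rows that break a rule into the error table.
    -- The NULL check is a stand-in for the dynamically built rule set.
    insert into ErrorTable (LineNumber, Col1, Col2)
    select lt.LineNumber, lt.Col1, lt.Col2
    from LoadTable lt
    where lt.Col1 is null

    -- Step 4: insert everything that did not land in the error table.
    insert into DestTable (Col1, Col2)
    select lt.Col1, lt.Col2
    from LoadTable lt
    left join ErrorTable et
      on et.LineNumber = lt.LineNumber
    where et.LineNumber is null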

    Hi Bruce,

    Thanks for your suggestion. Unfortunately it seems more complex than necessary.

    To filter the "problematic" data, I would need to detect it first, which requires specific coding for each and every table I import from the source.

    I am keeping it simple here by inserting everything that fits (which is what is usually expected for all the data) into the final table and anything that doesn't fit into the error table. There is no real data cleansing done on my side (theoretically it should already be cleansed in the source).

    The error handling cursor is only meant to catch small issues, not to do extensive data cleansing.

    Besides that, the cursor is only called if the initial set-based import failed, so the additional impact during regular data loads is kept to a minimum (actually almost non-existent).
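    As a sketch, that fallback pattern on SQL Server 2000 (no TRY/CATCH, so @@ERROR is checked after each statement; Staging, Final, ErrorRows, and Id are hypothetical names) looks something like this:

    -- First attempt: one set-based insert.
    insert into Final select * from Staging
    if @@ERROR <> 0
    begin
        -- Fallback: walk the staging rows one by one,
        -- routing each failed insert to the error table.
        declare @id int
        declare rows_cur cursor fast_forward for
            select Id from Staging
        open rows_cur
        fetch next from rows_cur into @id
        while @@FETCH_STATUS = 0
        begin
            insert into Final select * from Staging where Id = @id
            if @@ERROR <> 0
                insert into ErrorRows select * from Staging where Id = @id
            fetch next from rows_cur into @id
        end
        close rows_cur
        deallocate rows_cur
    end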

    Best Regards,

    Chris Büttner

  • Hi Thomas,

    Your point is well taken. For the specific example you have mentioned, you could use:

    EXEC sp_MSforeachtable
        @command1 = "Alter Table ? Alter Column Foo nvarchar(10)",
        @whereand = "and OBJECT_ID in (select object_ID from sys.columns where name = 'Foo')"

  • Thomas (4/14/2009)


    This type of discussion reminds me of heated arguments I have had with Mr Celko about SQL standards...

    A simple example of where set based operations break down is in very large data sets. I want to take a table of 10 million rows and copy it to another table in a production system. The obvious solution would seem to be something like:

    Insert DestTable
    Select ...
    From TableA

    Yet, at the number of rows we are discussing, this could take hours to execute even on the best hardware...

    Another common use of cursors is to script a DDL change. For example, I want to write a script that will alter all columns called "Foo" in all tables to be nvarchar(10) instead of nvarchar(5). That is simply not possible using a set-based solution...

    Hi Thomas,

    For moving 10 million rows, I would use BCP, not SQL.
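    For example (server, database, and file names hypothetical), bcp in native format with a batch size so the copy commits in chunks rather than as one huge transaction:

    rem Export from the source, then import in 50,000-row batches.
    bcp MyDb.dbo.TableA out C:\data\TableA.dat -S SourceServer -T -n
    bcp MyDb.dbo.DestTable in C:\data\TableA.dat -S DestServer -T -n -b 50000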

    For the second example, you could use:

    EXEC sp_MSforeachtable
        @command1 = "Alter Table ? Alter Column Foo nvarchar(10)",
        @whereand = "and OBJECT_ID in (select object_ID from sys.columns where name = 'Foo')"

  • It looks like the intent is only to find the count resulting from cross joining a table with itself.

    For this, it may not even be necessary to perform the join.

    Declare @Counter INT
    SELECT @Counter = COUNT(*) from master.sys.columns
    SELECT @Counter * @Counter

    (Or)

    SELECT COUNT(*) * COUNT(*) from master.sys.columns

    Also, suggest which of the above two is the better way to perform this. I feel it's the second one, since it uses a single statement. :w00t:

    Warmly welcome any comments on this. 😛

  • gautamsheth2000 (4/13/2009)


    Should use this code 🙂

    Select count(*) From master.sys.columns C1 CROSS JOIN master.sys.columns C2

    You can do even better.

    Using the PK in COUNT is faster than *, i.e.:

    Select count(id) From master.sys.columns C1 CROSS JOIN master.sys.columns C2

  • Thomas (4/14/2009)


    Agreed. Here's another example. You want to build a numbers table that simply contains a series of sequential integers. There are many uses for such a table in solving problems such as finding gaps in a series of numbers. How do you populate it without using a cursor/loop? I'm sure we can piss away hours finding a solution; however, in minutes one could write the following and be done:

    Declare @i int
    Set @i = 0
    While @i < 10000
    Begin
        Insert Numbers(Value) Values(@i)
        Set @i = @i + 1
    End

    I'm all ears if there is a set-based solution (especially a SQL Standard solution) to populating said table. Assume you are starting with an empty database.

    Hi Thomas,

    If I understand this correctly, then the method suggested by Jeff Moden would be as follows:

    --=============================================================================
    -- Create and populate a Tally table
    --=============================================================================
    --===== Conditionally drop
    IF OBJECT_ID('dbo.Tally') IS NOT NULL
        DROP TABLE dbo.Tally
    --===== Create and populate the Tally table on the fly
    SELECT TOP 10000
           IDENTITY(INT,1,1) AS N
      INTO dbo.Tally
      FROM Master.dbo.SysColumns sc1,
           Master.dbo.SysColumns sc2
    --===== Add a Primary Key to maximize performance
    ALTER TABLE dbo.Tally
      ADD CONSTRAINT PK_Tally_N
          PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
    --===== Let the public use it
    GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC

    You are right about the time spent finding this type of solution, but that one is a perfect example of how even the simplest of operations performed with a cursor/while loop can be destroyed by a set-based solution.

    It is also not just a runtime issue. Many cursor solutions hammer resources in general.

    Think of the savings that can be made by only touching your data once to perform the work you want. You no longer need to invest $$$ in newer/faster storage, because the I/O bottleneck is no longer there.

    These things don't always have an immediate effect, but can save you headaches in the future. Don't go the M$ route and throw more hardware at a software problem!

    Regards

    GermanDBA

    Regards,

    WilliamD

  • I will consider using a cursor when I am doing something that is inherently sequential (procedural) and where data volumes are small. In that scenario, a cursor (or equivalent row-by-row method) is often the best approach.

    That's exactly my point, and my case fits it precisely. My proc runs very slowly at times, but that's a function of locking issues with a library stored proc (which I don't own) that I'm required to use to do a calculation prior to setting up and running my cursors.

    When using cursors, I do my best to reduce the number of rows that must be traversed and, so far, all my cursors have been forward read-only.

    Of course, my personal inclination (which goes against our policy) would be to pull my data into a C# dataset and perform the business logic there before writing the results back out to SQL. Does that qualify as being set-based? 🙂

    Peter Hansen

  • Christian Buettner (4/15/2009)


    Sorry I still don't get it. The ERRORFILE argument routes the errors to a text file, not to a table. Unless I misunderstand the documentation.

    Sorry, Christian, I missed that you wanted an ErrorRowsToTable solution instead of ErrorRowsToFile.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

    Agreed. Here's another example. You want to build a numbers table that simply contains a series of sequential integers. There are many uses for such a table in solving problems such as finding gaps in a series of numbers. How do you populate it without using a cursor/loop? I'm sure we can piss away hours finding a solution; however, in minutes one could write the following and be done:

    Declare @i int
    Set @i = 0
    While @i < 10000
    Begin
        Insert Numbers(Value) Values(@i)
        Set @i = @i + 1
    End

    I'm all ears if there is a set-based solution (especially a SQL Standard solution) to populating said table. Assume you are starting with an empty database.

    A couple of very simple solutions.

    First one is SQL 2005/2008 only, won't work in 2000:

    create table dbo.Numbers (Number int primary key);
    go
    insert into dbo.Numbers (Number)
    select top 10000 row_number() over (order by t1.object_id)
    from master.sys.columns t1
    cross join master.sys.columns t2;

    If you don't want to use master.sys.columns, you can use any other table that already has rows in it.

    Another 2k5 version, if you don't want to select from system tables/views/etc.

    create table dbo.Numbers (Number int primary key);
    go
    with
    CTE (Number) as
        (select 1 union select 2 union select 3 union select 4 union select 5
         union select 6 union select 7 union select 8 union select 9 union select 10),
    CTE2 (Number) as
        (select c1.Number
         from CTE c1
         cross join CTE c2),
    CTE3 (Number) as
        (select c3.Number
         from CTE2 c3
         cross join CTE2 c4)
    insert into dbo.Numbers (Number)
    select top 10000 row_number() over (order by Number)
    from CTE3;

    It's not as efficient as the first one, but it does work.

    For an SQL 2000 version, try this:

    create table dbo.Numbers (Number int identity primary key, PlaceHolder bit);
    go
    insert into dbo.Numbers (PlaceHolder)
    select top 10000 null
    from master.dbo.syscolumns
    cross join master.dbo.syscolumns;
    go
    alter table dbo.Numbers
    drop column PlaceHolder;

    All of these are faster than cursors/loops.

    In 2000, 2005 or 2008, this will also work, but it's just a loop that doesn't use While or a cursor.

    create table dbo.Numbers (Number int identity primary key)
    go
    set nocount on;
    go
    insert into dbo.Numbers
    default values;
    go 10000

    That'll run the insert 10,000 times. It's a little faster than a cursor, but nowhere near as fast as the above solutions. You will see it used sometimes; putting a number after the batch separator to make it run the batch that many times is an almost unknown trick.

    Edit: Posting these because Jeff's solution isn't the only one, and some people find it a little hard to follow.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Girish Bhat (4/15/2009)


    Hi Thomas,

    Your point is well taken. For the specific example you have mentioned, you could use:

    EXEC sp_MSforeachtable
        @command1 = "Alter Table ? Alter Column Foo nvarchar(10)",
        @whereand = "and OBJECT_ID in (select object_ID from sys.columns where name = 'Foo')"

    But then we're back to a loop/cursor-based approach, even if it means using a built-in stored procedure to do it.
