There Must Be 15 Ways To Lose Your Cursors... part 1, Introduction

  • Thomas-282729 (6/25/2010)


    Michael Jenck (6/25/2010)


    I have been trying to figure out a way to eliminate the following cursor. I figure this would be a good place to post this seeing the title of this article :-).

    The only conditions are that I don't know in advance what the child and parent records are - just that at least 1 of each exists.

    (Note: I'm not a DBA. I fall in the category of knowing just enough SQL to be dangerous.)

    <snip>

    When you hit a problem in T-SQL that requires a cumbersome solution, you should sit back and consider whether it ought to be done in T-SQL in the first place. The SQL language is fundamentally not designed for dynamic column generation. Further, T-SQL was not designed for fancy string manipulation. So, the solution to eliminating a cursor in this case is to not do dynamic crosstabs in T-SQL. Build your query in a middle-tier component or reporting tool. Is it possible to do it in T-SQL? Yes, just like it is possible to use Excel as a database or to write memos. That does not mean you should use it for this purpose just as you shouldn't use T-SQL for dynamic crosstabs.

    While I agree that creating dynamic crosstabs on properly preaggregated data using something like SSRS is usually a much better way to go, there are cases where you need to be able to do a dynamic crosstab in T-SQL (usually bad DB design or "special" requirements). Although T-SQL isn't usually the place to do such a thing, you'd better know how to do it so you don't have to tell your boss that you can't do it without getting another chunk of software involved. T-SQL has more than just data storage capabilities in the form of DML (Data Manipulation Language), and it's frequently much more effective than relegating such tasks to the "outside" world.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thomas-282729 (6/25/2010)


    There is an answer, but you probably won't like it: don't do it in a stored procedure.

    If you cannot accomplish a given piece of work in a set-based format, it was probably not meant to be done in a database. Stored procedures were not meant to encapsulate complex business logic of the type you suggest. Databases are first and foremost meant to serve up data. Further, if there is so much data that streaming it to an external service is impractical, then I would suggest that it is also impractical to do it row-by-row in a stored proc. If you can do it row-by-row in a proc, then you can do it row-by-row in an external service.

    Thanks for the response. I agree, in general, but until you've worked with SmartClients, remote connectors, WCF, Web Services, and all the other supposedly "good" stuff Microsoft is spewing out on the dev community, you've not had to have the following argument with developers:

    Dev: "We need to put this logic in the Database"

    DBA: "That's business logic, and should live in the service BL Layer"

    Dev: "We must pull out all billing and banking records for each ident. Each connection takes around one tenth of a second, and therefore can take around three seconds per record to process. To process the whole database will take around sixty-four days".

    DBA: "Is there no faster way to do this?"

    Dev: "Possibly, but we have a deadline, and it's almost the end of the <insert financial deadline here>".

    DBA: "So what's your solution?"

    Dev: "I will pass in an XML file with a full list of all the record IDs which need to be processed, and the database will expand it and process each of the rows, and return a total."

    DBA: "No, that's silly, and will require a Cursor, which is *bad*"

    Dev: "You'll have to explain to the customer why they won't be able to collect their <insert financial term>, they will lose about eight million pounds."

    .

    .

    rinse repeat.

    There are many occasions where the slide into poor practice is unavoidable, or where doing the job by export/import is just plain impossible. A cursor achieves the above two-month operation in around an hour (still too slow for the customers, but hey, it's better than 64 days).

    Anyway, this is a very specific issue, and I'm interested to see how I can get rid of cursors effectively for such things....

  • Your description of the child Cursor describes my son to a tee.

    Bastard love-child. Hmmm. :blush:

    We must be into summer rerun season. 😎

    Maybe the unknown procedural language was COBOL or some derivative?

    I agree with Brad Neufeld: this article bites. Another condescending rant.

  • Nick Walton (6/25/2010)


    <snip>

    Anyway, this is a very specific issue, and I'm interested to see how I can get rid of cursors effectively for such things....

    DBA: A. Why is it my problem that someone else did not do a better job of planning ahead? B. Why are you pulling one record at a time instead of pulling a batch of records? C. Why not write a small app that batches and processes the data? That will be significantly faster to develop, and it will run faster than trying to hack something together in T-SQL; you could even make it multi-threaded so that it performs the batch operations in parallel. Bonus points for making it so that you can monitor the progress of the operations.

    ...

    Cursors aren't "evil" per se but they are often the wrong tool for the job. To its credit, T-SQL can solve many problems for which it was not intended but that creates its own issues. The problem here is that it is possible to do it in T-SQL even though T-SQL is the wrong solution. I think I've had more problems from developers hacking things together because it was "possible" than just about anything else. It is akin to building a database system using email because it is "possible". Lots of dynamic SQL or complex logic encapsulated in stored procedures are examples of misuse of T-SQL IMO.
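
    On the batching point (B) above, here is a minimal sketch of the usual batch-at-a-time pattern, assuming a hypothetical dbo.WorkQueue table with a Processed flag; the real keys and per-batch work would obviously differ:

    DECLARE @BatchSize int, @RowsAffected int
    SELECT  @BatchSize = 10000, @RowsAffected = 1

    WHILE @RowsAffected > 0
    BEGIN
        -- process one batch of unprocessed rows as a single set-based statement
        UPDATE TOP (@BatchSize) w
        SET    w.Processed = 1        -- stand-in for the real per-batch work
        FROM   dbo.WorkQueue AS w
        WHERE  w.Processed = 0

        SET @RowsAffected = @@ROWCOUNT
    END

    There is still a loop, but it touches thousands of rows per pass instead of one, and progress can be monitored by simply counting the rows that are still unprocessed.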

  • Jeff Moden (6/25/2010)


    <snip>

    Jeff,

    Thanks for pointing out that there was a part 2 to the article. After reading and searching, I came across a post that used COALESCE, which is what I needed. My working example is below.

    So my non-cursor version is:

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Temp]') AND type IN (N'U'))
        DROP TABLE [dbo].[Temp]
    GO
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Temp2]') AND type IN (N'U'))
        DROP TABLE [dbo].[Temp2]
    GO
    /****** Object: Table [dbo].[Temp] Script Date: 10/28/2009 08:20:47 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Temp](
        [ObjectId]   [int] NOT NULL,
        [ObjectName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
    ) ON [PRIMARY]

    INSERT INTO [dbo].[Temp]
            ([ObjectId], [ObjectName])
    SELECT 1, 'Parent 1' UNION ALL
    SELECT 2, 'Parent 2' UNION ALL
    SELECT 3, 'Parent 3' UNION ALL
    SELECT 4, 'Parent 4'

    CREATE TABLE [dbo].[Temp2](
        [ObjectId]   [int] NOT NULL,
        [ObjectName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [ParentId]   [int] NOT NULL,
        [ChildId]    [int] NOT NULL
    ) ON [PRIMARY]

    INSERT INTO [dbo].[Temp2]
            ([ObjectId], [ObjectName], [ParentId], [ChildId])
    SELECT 1, 'Option 1', 1, 10 UNION ALL
    SELECT 1, 'Option 1', 2, 20 UNION ALL
    SELECT 1, 'Option 1', 3, 30 UNION ALL
    SELECT 1, 'Option 1', 4, 40 UNION ALL
    SELECT 1, 'Option 2', 1, 11 UNION ALL
    SELECT 1, 'Option 2', 2, 21 UNION ALL
    SELECT 1, 'Option 2', 3, 31 UNION ALL
    SELECT 1, 'Option 2', 4, 41

    DECLARE @NewColumnName  varchar(8000)
    DECLARE @NewColumnValue varchar(8000)
    DECLARE @NewSQL         varchar(8000)

    SELECT @NewColumnName  = COALESCE(@NewColumnName + ', ', '') + '[' + CAST([ObjectId] AS varchar(5)) + '] AS [' + ObjectName + ']',
           @NewColumnValue = COALESCE(@NewColumnValue + ', ', '') + '[' + CAST([ObjectId] AS varchar(5)) + ']'
    FROM [dbo].[Temp]

    SET @NewSQL = 'SELECT ObjectId, ObjectName, ' + @NewColumnName +
                  ' FROM (SELECT ObjectId, ObjectName, ParentId, ChildId FROM dbo.Temp2 WHERE (ObjectId = 1)) AS src
                  PIVOT (Min(ChildId) FOR ParentId IN (' + @NewColumnValue + ')) AS pvt ORDER BY ObjectName'

    EXEC (@NewSQL)
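
    For anyone following along, with the sample data above the generated @NewSQL works out to roughly the following (whitespace added for readability):

    SELECT ObjectId, ObjectName,
           [1] AS [Parent 1], [2] AS [Parent 2], [3] AS [Parent 3], [4] AS [Parent 4]
    FROM  (SELECT ObjectId, ObjectName, ParentId, ChildId
           FROM   dbo.Temp2
           WHERE  (ObjectId = 1)) AS src
    PIVOT (Min(ChildId) FOR ParentId IN ([1], [2], [3], [4])) AS pvt
    ORDER BY ObjectName

    In other words, the COALESCE trick only builds the column list and the PIVOT IN() list from dbo.Temp; the dynamic SQL does the rest.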

  • Wow! I can't believe some of the readers took this article so personally.

    In my case you are preaching to the choir, and you nevertheless produced a highly entertaining piece. I look forward to following the series.

    Tom Garth
    Vertical Solutions

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
    Hi, my name is Jeff.

    I am new to this forum. About a year ago I started at a company, and last year they upgraded to SQL 2005; nearly every procedure uses a cursor, in some of the most bizarre circumstances. :sick: I have been trying to rewrite them as I have had to modify them, but some of them are so embedded in other processes, or so long and confusing with cursors inside other cursors, :sick: that I hesitate to do so. :unsure:

    But I have come across a scenario where I am at a loss on how to avoid a loop. And maybe this is an exception to the rule about using loops.

    Any help or advice would be appreciated.

    The purpose of the procedure is to back into the SDI tax rate when the total SDI tax has exceeded its limit, so the new net tax value can be recalculated. It usually has to loop through the process 5-10 times to return.

    See, every time the SDI tax rate changes, the total net tax changes. But the SDI tax is based on the net total tax.

    So every time you adjust the tax rate so that the SDI tax is at the limit, the net total tax changes and the SDI rate is no longer accurate. But every time you make the adjustment, you get closer to the correct SDI tax.

    I realize that this is circular logic, but it does return accurate data because the SDI rate keeps getting closer and closer to the correct rate needed to set the SDI tax at its limit.

    Here is a function that I use a loop for.

    CREATE FUNCTION dbo.fn_TaxLimitCalcTable
    (
        @startVal   decimal(25,18),
        @LimitVal   decimal(25,18),
        @YTDVal     decimal(25,18),
        @NetBenefit decimal(25,18),
        @Fed        decimal(25,18),
        @staterate  decimal(25,18),
        @Soc        decimal(25,18),
        @med        decimal(25,18),
        @CurRate    decimal(25,18)
    )
    RETURNS @TaxLimit TABLE
    (
        NewSDIRate  decimal(25,18),
        adNetval    decimal(25,18),
        adjustedval decimal(25,18),
        Fed         decimal(25,18),
        Med         decimal(25,18),
        staterate   decimal(25,18),
        Soc         decimal(25,18)
    )
    AS
    /****************************************************
    SP Name:     fn_TaxLimitCalcTable
    Description: This function re-calculates the associated tax and rates when
                 the maximum tax limit has been reached or exceeded.
    ****************************************************/
    BEGIN
        DECLARE @w2amt1 decimal(25,18), @NewSDIRate decimal(25,18),
                @adjustedval decimal(25,18), @adNetval decimal(25,18), @Returned decimal(25,18),
                @Fed1 decimal(25,18), @Med1 decimal(25,18), @staterate1 decimal(25,18), @Soc1 decimal(25,18)

        SELECT @NewSDIRate  = @CurRate,
               @adjustedval = @startVal,
               @adNetval    = @NetBenefit * (1 / (1 - (@CurRate + @Fed + @staterate + @Soc + @Med))) --set if startvalue matches

        IF @YTDVal > 0 AND @LimitVal > 0
        BEGIN
            WHILE @adjustedval <> (@LimitVal - @YTDVal)
            BEGIN
                SELECT @NewSDIRate  = (@LimitVal - @YTDVal) / @adNetval
                SELECT @adNetval    = ROUND(@NetBenefit * (1 / (1 - (@NewSDIRate + @Fed + @staterate + @Soc + @Med))), 2)
                SELECT @adjustedval = ROUND(@NewSDIRate * @adNetval, 2)
            END

            SELECT @Fed1       = ROUND(@Fed * @adNetval, 2)
            SELECT @Med1       = ROUND(@Med * @adNetval, 2)
            SELECT @staterate1 = ROUND(@staterate * @adNetval, 2)
            SELECT @Soc1       = ROUND(@Soc * @adNetval, 2)
        END

        -- Return the result of the function
        INSERT @TaxLimit
        SELECT @NewSDIRate,
               @adNetval,
               @adjustedval,
               @Fed1,
               @Med1,
               @staterate1,
               @Soc1

        RETURN
    END
    GO

    Thank you,

    Jeff
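
    (An observation rather than a drop-in replacement: if you ignore the intermediate ROUND() calls, the loop above is solving a fixed-point equation with a closed-form answer. Writing T = @LimitVal - @YTDVal and k = @Fed + @staterate + @Soc + @Med, the loop is looking for NewSDIRate * adNetval = T with adNetval = @NetBenefit / (1 - (NewSDIRate + k)), which solves algebraically to the sketch below. The per-iteration rounding in the original means the last penny or two could differ, so treat this as a starting point, not gospel.)

    DECLARE @T decimal(25,18), @k decimal(25,18)
    SELECT  @T = @LimitVal - @YTDVal,
            @k = @Fed + @staterate + @Soc + @Med

    SELECT  NewSDIRate  = @T * (1 - @k) / (@NetBenefit + @T),       -- rate that puts SDI exactly at the remaining limit
            adNetval    = ROUND((@NetBenefit + @T) / (1 - @k), 2),  -- grossed-up net amount at that rate
            adjustedval = ROUND(@T, 2)                              -- SDI amount, i.e. limit minus YTD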

  • I created my FIRST Cursor yesterday. I had to read up on how to do it. It was obvious this was creating a RBAR process as I watched it run. I'm looking forward to your follow-up articles and maybe I can go back and put a stake in that thing's heart.

    I find that most times I end up using a cursor, it's because I'm doing something that would be better done by a regular program, like an emailer or some other non-data-processing action.

    I think it's one of those cases where you use what you are familiar with. For example, our C# dev asked me if I had a tool for comparing data differences in identical tables. I did not, but I wrote one using no loops, just a bit of dynamic SQL. He turned out a C# program within the same time frame. Mine ran quicker, but it goes to show there are many ways to achieve the same result.
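
    For what it's worth, here is a minimal sketch of that kind of no-loop data diff for two identically structured tables (dbo.TableA and dbo.TableB are placeholder names; a real tool would presumably generate this sort of query per table with dynamic SQL):

    SELECT 'Only in A' AS DiffSide, a.*
    FROM  (SELECT * FROM dbo.TableA
           EXCEPT
           SELECT * FROM dbo.TableB) AS a
    UNION ALL
    SELECT 'Only in B', b.*
    FROM  (SELECT * FROM dbo.TableB
           EXCEPT
           SELECT * FROM dbo.TableA) AS b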

    Have a great weekend everybody!

  • Thomas-282729 (6/25/2010)


    DBA: A. Why is it my problem that someone else did not do a better job of planning ahead? B. Why are you pulling one record at a time instead of pulling a batch of records? C. Why not write a small app that batches and processes the data?

    I agree, I've been there, so I can give you the responses.

    a. Because you're holding the project up. It would be on schedule if you'd just let the developers write the code. There's no Right and Wrong answer. Just write an SP to do it, and the project can get delivered.

    b. We're not. We're batch-processing the records; however, each row requires handling in a different and complex way depending on the data which 'hangs off' it. There are up to 300 different 'events' which can fire depending on the processing logic.

    c. That would require development time, which we don't have. It wasn't specified. *whinge whinge*

    In defence of point c, the app option is a red herring anyway, because it would be external to the DB and would still have to pore through each row, applying the very specific logic to each row and pulling in yet more data. The pull/process/push time would be rather extensive. Even with 1000 simultaneous connections (assuming the DB didn't fall over due to locking on the tables), it'd still take more than 10 times longer to complete that way, and it would be yet another service to monitor, document, maintain, etc.

    Thomas-282729 (6/25/2010)


    IMHO

    I think your opinion is right in the absolute sense, but sometimes it is not possible from a physical, conceptual or practical viewpoint.

    Cheers,

    Nick

  • Nick Walton (6/25/2010)


    <snip>

    Dev: "You'll have to explain to the customer why they won't be able to collect their <insert financial term>, they will lose about eight million pounds."

    DBA: "Don't tell the customer to go on a diet just yet. Maybe we could pass in a table-valued parameter with all the customer IDs and just do a simple INNER JOIN to get *all* our data back?"

    Dev: "But we're not on SQL 2008."

    DBA: "Then pass in the IDs as XML and use the XML data type nodes method to shred it to relational data. Perform INNER JOIN."

    Dev: "But XML is so complicated!"

    DBA: "Google Erland Sommarskog's article 'Arrays and Lists in SQL Server' and learn how to pass it in as a delimited list/simulated array. You have used delimited lists and arrays, right?"

    ...

    I love these philosophical thought experiments!

    Thanks

    Mike C
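
    For the record, a rough sketch of the XML-shredding option from the dialogue above, assuming the IDs arrive as simple element-centric XML and joining to a hypothetical dbo.Billing table:

    DECLARE @idList xml
    SET @idList = N'<ids><id>1</id><id>2</id><id>3</id></ids>'

    SELECT b.*
    FROM  (SELECT x.id.value('.', 'int') AS CustomerID     -- shred the XML into a relational ID list
           FROM   @idList.nodes('/ids/id') AS x(id)) AS ids
    JOIN   dbo.Billing AS b
           ON b.CustomerID = ids.CustomerID

    One set-based join, instead of expanding the list and processing each row one at a time.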

  • Colin Betteley (6/25/2010)


    I have seen a number of articles recently on moving away from cursors and fully understand the performance benefits in doing so.

    That said, I use cursors as I need to get data, manipulate it and then send specific Emails based on that manipulated data. At the moment I cannot see a way of doing this in SQL other than by using cursors.

    An article on this particular aspect would be most welcome to me and I suspect a large number of others.

    Thank you in advance.

    Colin

    Colin:

    I have heard this before (that is, needing cursors for customized merge-mailing from SQL Server) and I would love to address it, but in order to do so, I need concrete examples to work with. Despite several requests on my part, I have yet to receive any such example that I can test and compare to a non-cursor solution.

    If you could post the data and code to demonstrate your cursor Email solution, then I would be happy to include my analysis and (attempted) solution in a future article.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • xor 52318 (6/25/2010)


    The topic of the series is interesting and timely, but the writing style in the introduction is goofy. It could have been replaced by a single sentence: "Cursors are slow and resource intensive, but there is a better way."

    Read back in this thread; I've already addressed this nit before. The short answer is: your approach has been tried before by many fine writers, yet it has failed to reach many who need to hear this message. I have used a different approach this time in order to reach some who may have passed over a more standard introduction.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]
