Indexed Views

  • amenjonathan (11/17/2010)


    Indexed views and triggers: two things to avoid if at all possible. Next up: Cursors are good! 😀

    Please explain.

    Or, if we are just throwing terms without any backing, I'll offer:

    * Indexed views, when used with care, are an invaluable performance tool.

    * Triggers are great to guard consistency, but only when standard constraints fall short.

    * Cursors, though heavily overused by new users, certainly have their place in the expert's tool box.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (11/17/2010)


    amenjonathan (11/17/2010)


    Indexed views and triggers: two things to avoid if at all possible. Next up: Cursors are good! 😀

    Please explain.

    Or, if we are just throwing terms without any backing, I'll offer:

    * Indexed views, when used with care, are an invaluable performance tool.

    * Triggers are great to guard consistency, but only when standard constraints fall short.

    * Cursors, though heavily overused by new users, certainly have their place in the expert's tool box.

    Every tool has its use. Some tools should not be used unless there is no other way, either because using these tools violates some larger no-no like data duplication or because they are resource hogs and there are generally better ways to get the same task accomplished. In fact, I would actually prefer changing the structure of the system to avoid their use altogether if that is possible. Indexed views are data duplication. It is to be avoided unless there is no other way.

    What I meant in my original post is we're seeing all these articles lately about these crappy tools that should really be avoided. Hardly any nay-saying about them. They are only good for 1 or 2 very specific purposes and nothing else. If I were new to the industry I might run out and create a bunch of indexed views (data duplication NO NO NO!), triggers (Good luck changing that schema!), and cursors (I eat all your resources for lunch!). These three things are garbage, unless it truly is the only alternative. Cursors specifically I've never seen a solution using one that was faster or better performing than a set based solution or simply using a while loop. In fact they're so bad, I don't even know the syntax for how to create one!

    A funny side note, I was asked in an interview once what I thought about cursors. I said well I feel they're so crappy I don't even know how to write one. There's always a better way than using a cursor. That was one of the reasons I was hired.

    It's good to have a solid set of generalizations and things to avoid if at all possible. That would be a great article. The past two articles on indexed views and triggers sound like they come from a viewpoint that these are generally acceptable solutions without any conditions on use. But in reality these two things are tools that are last on the list of alternative solutions, not for use in every day development. Cursors I truly believe can be avoided altogether.

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • >>>Cursors I truly believe can be avoided altogether.

    An example of when a cursor is necessary: when interacting with a SQL-based API that encapsulates business logic (Microsoft Dynamics GP for example). Am I going to directly insert inventory adjustment transaction data into base tables in a set-based fashion? Not likely. Instead, I'll call the vendor's stored proc to insert a single transaction--even if I need to loop through a thousand source rows to call the procedure one at a time. (The benefits of encapsulation sometimes outweigh the performance and elegance of set-based operations.)

    Another example would be complex recursive operations where the outcome is affected by earlier passes in the recursion (i.e. things that you can't do with a simple recursive CTE). Sometimes you do in fact need to walk through data a row at a time.

    By all means, cursors should be avoided, set-based processing is the way to go, and for the vast majority of data operations you can avoid cursors. But I'd argue that cursors cannot always be avoided.

    The same principle is true of indexed views. Yes, they duplicate data. Yes, they can often be avoided. Yes, they can be abused. But they are very helpful when used properly, and they can solve problems that you can't easily solve other ways.

  • I always use a while loop if I need to process RBAR.

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • amenjonathan (11/18/2010)


    Indexed views are data duplication. It is to be avoided unless there is no other way.

    Indexes are data duplication as well. Do you avoid them as well?

    In both cases, you duplicate some data under total control of the DBMS, and accept the performance hit for modifications in exchange for the performance gain for retrieval.

    There are many scenarios where indexed views are an invaluable tool. Frequently retrieved aggregates is the prime, but not sole, example. Sure, people should no more go overboard with indexed views as with indexes; I already made a comment about this in one of my first posts in this topic. But your telling people to avoid them if at all possible is eequally as bad as the article suggesting to use them for every other query. The truth is somewhere in the middle - know about their existence; use them when appropriate.

    Cursors specifically I've never seen a solution using one that was faster or better performing than a set based solution or simply using a while loop. In fact they're so bad, I don't even know the syntax for how to create one!

    Yes, it shows. If you had ever bothered to check the syntax, and to actually study the subject, you would know that the default options for cursors are cappy, but that a cursor with the correct options will always be faster than the WHILE loops you use for RBAR.

    There's always a better way than using a cursor.

    Really?

    Here's my challenge, for an online auction database with auto-bid functionality.

    CREATE TABLE Auctions

    (AuctionID int NOT NULL,

    OpeningPrice decimal(9,2) NOT NULL,

    MinimumIncrement decimal(9,2) NOT NULL DEFAULT 0.01,

    CloseTime datetime NOT NULL,

    PRIMARY KEY (AuctionID)

    );

    CREATE TABLE Bidders

    (BidderID int NOT NULL,

    PRIMARY KEY (BidderID)

    );

    CREATE TABLE AutoBids

    (BidderID int NOT NULL,

    AuctionID int NOT NULL,

    StartBid decimal(9,2) NOT NULL DEFAULT 1.00,

    Increment decimal(9,2) NOT NULL DEFAULT 1.00,

    MaxBid decimal(9,2) NOT NULL,

    PlacedAt datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,

    StartTime datetime NULL,

    PRIMARY KEY (BidderID, AuctionID),

    FOREIGN KEY (BidderID) REFERENCES Bidders,

    FOREIGN KEY (AuctionID) REFERENCES Auctions

    );

    CREATE TABLE Bids

    (BidderID int NOT NULL,

    AuctionID int NOT NULL,

    PlacedAt datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,

    Amount decimal(9,2) NOT NULL,

    ComesFromAutoBid char(1) NOT NULL DEFAULT 'N'

    PRIMARY KEY (BidderID, AuctionID, PlacedAt),

    FOREIGN KEY (BidderID) REFERENCES Bidders,

    FOREIGN KEY (AuctionID) REFERENCES Auctions,

    CHECK (ComesFromAutoBid IN ('Y', 'N'))

    );

    Bidders can enter a bid directly in the Bids table, or they can place a bidding instruction in the auto-bid table. The system will then automatically bid on their behalf - every minute (but not before StartTime, if specified), the system will bet the lowest of starting amount, current highest bid plus the auto-bids increment, or current highest bid plus the auctions minimum increment. Except, of course, if the bidder is already the highest bidder, or if the amount to be bid would exceed the maximum bid.

    To support the auto-bid process, a procedure is run once every minute. This procedure should check each auto-bids once, in order of the moment they were placed, and generate a bid if the requirements are met. During the processing of auto-bids, no manual bids are accepted (this requirement is implemented by placing a table lock on the Bids table) - but we don't want customers to notice this, so processing time should be as short as possible.

    I would choose a cursor here. I'm curious to see your better solution.

    (EDIT: Added a missing closing parenthesis in the code block)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • @hugo:

    I'm not an expert in SQL. I was also under the impression that Cursors are a big NO and hence should be avoided at all costs. I'm only saying that after reading numerous articles which states that. Now I'm really curious to see your explanation and to learn why Cursors may be better in some cases.

    The situation that you've mentioned here is quite interesting. Would you also be posting the cursor solution for this?

  • puja63 (11/18/2010)


    @Hugo:

    I'm not an expert in SQL. I was also under the impression that Cursors are a big NO and hence should be avoided at all costs. I'm only saying that after reading numerous articles which states that. Now I'm really curious to see your explanation and to learn why Cursors may be better in some cases.

    The situation that you've mentioned here is quite interesting. Would you also be posting the cursor solution for this?

    Hi Puja63,

    First, let me reinforce that "avoid cursors" is a good rule of thumb. This is not true of all databases, but SQL Server is heavily optimized for set-based processing. In 99.9% (estimation backed only by personal observation) of all cases, a set-based solution runs rings around any row-based attempt. (To avoid misunderstanding - set-based means that one [or possibly a few] queries are used that process all qualifying rows at once; row-based means that a cursor or other technique is used that to process qualifying rows one by one - on this site, the term RBAR [Row By Agoniziing Row], coined and popularized by Jeff Moden, is often used for row-based).

    In practice, most cursors are not implemented after considering possible set-based alternatives, but because a developer with a 3GL backgrund found that instrument to fit the mindset he/she is accustomed to. Not newbie developers, but newbie SQL developers. I'd say that a cursor is actually an instrument that should be reserved for the experts. Not because coding a cursor is hard, but because deciding when this is indeed the best alternative is hard.

    The problem is that the "avoid cursors" advise is often misunderstoood and misreprersented in two ways, And both have occured in this discussion.

    1) Some people extend the advise to "avoid row-based processing always and in all circumstances". That does no justice to the 0.1% of cases where the cursor is indeed the best alternative.

    2) And some people take the "avoid cursors" too literally, without realising that it actually means to "avoid all types of row-based processing". They replace a cursor by a WHILE loop that basically does the same, thinking they have followed best practice. And if they actually do performance measurements (most probably don't), they might even feel reinforced. However, that is not because a WHILE loop performs better than a cursor, but only because a WHILE loop (if coded with care) performs better than a cursor with default cursor options.

    A few years back, I have compared all possible cursor options to find the performance differences between the various possiblities. If you are interested, then please read these articles:

    * Curious cursor optimization options

    * Poor men see sharp - more cursor optimization

    Note however that I later found that my research was incomplete. My conclusion that FAST_FORWARD is never faster than a STATIC FORWARD_ONLY READ_ONLY cursor is only correct for cursors that process an amount of data that fits entirely in the available cache. As soon as the amount of data processed by the query exceeds available cache, FAST_FORWARD starts to shine. I guess I should take the time to do an update of that article.

    I'd say that the cursor code for the challenge posted should be pretty straight-forward. But since you ask, I'll post the code. Just not now, I'm a bit pressed for time. Feel free to remind me if you think I've forgotten my promise.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • I guess I will concede to your view of cursors because I don't have time to take your challenge. I have read many articles on cursor vs set vs while loop. In each article, the cursor loses. I would say I can't conclude that in every case a while loop is better than a cursor, but I would say that there is always a way to turn something into a set based solution, even if you have to use # tables to do it. I would imagine this depends on how many rows you need to process. There's a great set of articles on cursor vs set (and ... maybe while loop, can't remember) done by Itzik Ben-Gan from SQL Server Mag a few months back. He basically finds a way to turn the schema into something that works with set.

    I was trying to find articles on how a cursor actually works as far as memory allocation and other junk a cursor does, and I couldn't find any with the 10 or so minutes I have to reply. I did find an article that states a cursor locks the entire set of data it's set up to work with. A while loop does not do this. There's one benefit a while loop has. The article, although possibly not an authority on SQL - I have no idea, is here: http://www.sqlbook.com/SQL/Avoiding-using-SQL-Cursors-20.aspx.

    Hugo if you could supply some links on how cursors work behind the scenes I would much appreciate it. It should be easy to judge if the cursor is more heavy handed than the while loop which is better to use. Currently I have other fish to fry such as implementing the MERGE with single row processing on error found in this previous article: http://www.sqlservercentral.com/articles/Database+Design/70796/[/url], which I plan on using a while loop to accomplish. Also I need to tackle data collector, RML utilities, buffing up on SSAS...basically studying cursors is really at the bottom of my priority list, fortunately or unfortunately, depeding on your perception of cursors.

    EDIT: had to fix my links

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • amenjonathan (11/19/2010)


    I guess I will concede to your view of cursors because I don't have time to take your challenge. I have read many articles on cursor vs set vs while loop. In each article, the cursor loses. I would say I can't conclude that in every case a while loop is better than a cursor, but I would say that there is always a way to turn something into a set based solution, even if you have to use # tables to do it. I would imagine this depends on how many rows you need to process. There's a great set of articles on cursor vs set (and ... maybe while loop, can't remember) done by Itzik Ben-Gan from SQL Server Mag a few months back. He basically finds a way to turn the schema into something that works with set.

    If you can turn a row-based solution in a set-based solution, you (nearly) always gain performance. That is absolutely true. And Itzik is a true genius in finding creative ways to combine SQL Server's features to let it do amazing things. But I am sure that in some cases, an efficient set-based solution is simply impossible, even for the likes of Itzik.

    I was trying to find articles on how a cursor actually works as far as memory allocation and other junk a cursor does, and I couldn't find any with the 10 or so minutes I have to reply. I did find an article that states a cursor locks the entire set of data it's set up to work with. A while loop does not do this. There's one benefit a while loop has. The article, although possibly not an authority on SQL - I have no idea, is here: http://www.sqlbook.com/SQL/Avoiding-using-SQL-Cursors-20.aspx.

    I have just finished reading the article you mention. I have no idea who wrote it either, but I can tell you that it is seriously flawed.

    1) Cursors blocking tables? My foot!

    Since two of the three major cursor types are designed to be able to respond to changes made to the table while processing the cursor, and the third one works under the cover very similar to the temp table/WHILE approach, I can assure you that a cursor should not lock the entire table, and definitely not during the entire processing. If you, or everyone else, ever saw that happening, the cause must have been either bad code, or a bug in the product. It just happens that I was involved in another discussion earlier this month, where someone else ("ramireddy") and I both posted some sample code to show the effect of cursor options on concurrent modifications. None of these involve blocking. The link to this discussion is below.

    2) Efficient code? Nope.

    The sample code presented is awful. The cursor code uses default options - I have already told you that those are bad news. Yes, you can beat performance of THAT cursor with a WHILE. Just like you can also go faster in a 2CV than in a Lamborghini - if you step out of the Lamborghini and push, but use the engine of the 2CV.

    And the WHILE code is equally awful. No PRIMARY KEY on the table. This will result in a full table scan for each iteration of the loop. Try this with a ten-million row result set, and weep as your server grinds to a halt.

    3) User-defined functions? Please not.

    The "alternative" to use user-defined functions is almost worse. The optimizer has no choice but to call the UDF once for each qualifying row. And then access the table used in the UDF for that row. Performance will seriously suffer. For the case presented, the better alternative would have been to join the Customer table with a derived table or CTE that aggregates the Sales, and then use a CASE expression to find the discount.

    I tried to check the forum discussion, but found the link to be broken. But please, don't take any of the presented advise serious. Not everything in that article is incorrect, but enough is to warrant discarding it as a whole.

    Hugo if you could supply some links on how cursors work behind the scenes I would much appreciate it. It should be easy to judge if the cursor is more heavy handed than the while loop which is better to use.

    In my previous post, I already gave the links to my two blog posts on cursors - again, they should be corrected, as the results presented are only accurate for queries that process less data than fits in cache.

    The discussion I mentioned before, on Microsoft's MSDN forums, includes some background on the internals of query processing, and links to other articles with more background. (I don't agree for 100% with those articles, which is why I contributed to the discussion as well).

    You'll find it at http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/13a38f34-ec01-48c5-928a-24d95bdd1fb4/[/url].

    Note that some of the posts made at the start of the discussion are not entirely accurate, so please read the entire discussion.

    Currently I have other fish to fry such as implementing the MERGE with single row processing on error found in this previous article: http://www.sqlservercentral.com/articles/Database+Design/70796/[/url], which I plan on using a while loop to accomplish.

    That link takes me to an article about compression. But I guess that it would be off-topic for this discussion anyway. 🙂

    Good luck with all your other tasks. I can't really find fault with your priorities - but do save the links I provided, in case you ever find yourself in a sitution where you do need to implement row-based processing.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Ah here's the appropriate MERGE link. I've just built my version of the sproc, but am still testing it.

    http://www.sqlservercentral.com/articles/MERGE/71396/[/url]

    Thanks for the links. I'll definitely check them out.

    Personally I still don't like cursors. Most of the time I can find a set-based solution. If not, a while loop just seems easier. Maybe if one of my while loops comes up as a hog during resource monitoring I'll check cursors out more. Until then (and other fish are fried and eaten), no cursors for me and hopefully not from my co-workers.

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • In the rare cases when I've felt the need for RBAR I've found that in some cases cursors are faster than WHILE loops but only when I don't have to worry about concurrency.

  • I'll offer:

    * Indexed views, when used with care, are an invaluable performance tool.

    * Triggers are great to guard consistency, but only when standard constraints fall short.

    * Cursors, though heavily overused by new users, certainly have their place in the expert's tool box.

    Well stated. I have yet to need a proper cursor (some consider the WHILE loop a cursor, and I use it to populate my date dimensions) but I have it in the back of my mind. I've seen too many use triggers and cursors w/o regard for other possibilities, ultimately to the detriment of the system.

  • Hi

    Indexed views can improve performance significantly if you have queries that combine large volumes of data with aggregates. It should not be created against tables where lot of modifications or changes are happening. This is because indexed views have to be maintained when there are changes happening to base data.

    The large volumes of data is not the only precondition of using the indexed views. I met a case where a view didn't have a large volume of data, but was created on many tables (many joins) and in that case Indexed views show a very good performance too.

    Thanks in any case for your article.

    Regards

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • Thank You so much for this article.

    It indirectly solved a long standing performance problem I have with a third party application.

    There are 2 queries with useless Where clauses with likes and <> ''

    and an order by clause on LEN(fieldname) Desc.

    The queries have no chance of using any index at all. The table has grown to over 80,000 records. The procedure executes around 20,000 each day.

    I had suggested adding a computed column to the table many months ago but was told we are not allowed to change the source.

    So after reading your article I created a view with schemabinding on the table in question only on the required fields for the queries and added the computed column LEN(FieldName)

    Then created an unique clustered index on the computed column of the view.

    Overall cost of the query has dropped from 3.5998 to 0.00315. Wow it has saved us 30% to 50% CPU since implementing the view. And best of all I have not touched their code!

    Now do I bother telling the Software vendor???

    Regards

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply