﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Francis Rodrigues  / ROW_NUMBER(): An Efficient Alternative to Subqueries / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Thu, 23 May 2013 17:39:02 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[code="sql"]-- how about?SELECT	  pv.ProductID	, pv.Version	, pv.MinorVersion	, pv.ReleaseVersion	, pv.StandardCostFROM Production.ProductVersion pv WITH (NOLOCK)WHERE EXISTS (	SELECT *	FROM (		SELECT TOP 1			  pv2.ProductID			, pv2.Version			, pv2.MinorVersion			, pv2.ReleaseVersion		FROM Production.ProductVersion pv2 WITH (NOLOCK)		WHERE pv2.ProductID = pv.ProductID		ORDER BY			  pv2.ProductID DESC			, pv2.Version DESC			, pv2.MinorVersion DESC			, pv2.ReleaseVersion DESC	) pv2	WHERE pv2.ProductID = pv.ProductID	AND pv2.Version = pv.Version	AND pv2.MinorVersion = pv.MinorVersion	AND pv2.ReleaseVersion = pv.ReleaseVersion)[/code]</description><pubDate>Fri, 18 Jun 2010 15:26:11 GMT</pubDate><dc:creator>Brian Barkauskas</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote]It's an interesting blog entry though isn't it?  Makes you think.[/quote]You bet it is!However, common sense has always told me that an index should provide a fast way of locating a record it points to (or otherwise it would become unmaintainable), and storing the record's pointer along with the index keys (be it [b]RID[/b] or a clustered key) is the best way to achieve this.I know for certain that [b]InnoDB[/b] does it, but [b]InnoDB[/b] tables, being [b]B-Trees[/b] rather than [b]B+Trees[/b], do not maintain direct links between leaf-level pages (and do not even distinguish between key-level and leaf-level pages).I was just seeking for a proof that [b]SQL Server[/b] does the same, and, thanks to you, I got it now :)</description><pubDate>Sun, 07 Feb 2010 15:37:46 GMT</pubDate><dc:creator>Quassnoi</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Quassnoi (2/7/2010)[/b][hr]The reference you mentioned explicitly states that when a secondary index is not declared [b]UNIQUE[/b], it contains the values of the clustered key in the key-level pages:[/quote]:laugh: Typical.  Oh well, my mistake there then - sorry about that.  I should have checked which way round it was.It's an interesting blog entry though isn't it?  Makes you think.</description><pubDate>Sun, 07 Feb 2010 13:58:34 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Paul,The reference you mentioned explicitly states that when a secondary index is not declared [b]UNIQUE[/b], it contains the values of the clustered key in the key-level pages:[quote]Now you should see something different. The first index, nonunique, still has all three columns in the upper level page: the nonclustered key SalesOrderDetailID, and the two columns of the clustered key: SalesOrderID and the uniqueifier.[/quote]</description><pubDate>Sun, 07 Feb 2010 04:05:52 GMT</pubDate><dc:creator>Quassnoi</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Quassnoi,I don't have time this morning for a full reply, and I'm off around the island for the next few days, so this will be just a quick one.  I promise a fuller reply when I get back.The reference you need is [url]http://sqlblog.com/blogs/kalen_delaney/archive/2008/03/16/nonclustered-index-keys.aspx[/url] - a blog entry by Kalen Delaney.The point about the Segement Top is that it is very fast on all scenarios, and doesn't require any special tricks or coding to just work well.  It also works for the other cases in your blog entry too (without ties etc).The problem with the other methods in your blog entry is that they depend on data distribution, and perform poorly if the distribution is 'wrong'.Naturally, if one knows something about the data distribution, and can [i]guarantee[/i] that it will never change, it might be worthwhile considering a hand-coded 'trick' like the recursive CTE for extremely large data sets, though I think this is very much an edge case to be honest, and might be too fragile for production use.I think a general method with very good and [i]predictable[/i] performance that uses the statistics available to the optimizer to produce an appropriate plan over a wide set of circumstances should be preferred in the general case.  Segment Top provides such a method for this class of query, and you really should include it in your blog and tests.I have many other things to say on the specifics of your last reply, but they will have to wait.CheersPaul</description><pubDate>Sat, 06 Feb 2010 16:06:29 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Hi Paul.[quote]Please consider the following code, using the sample data from your blog:[code="sql"]WITH    RowIDsAS      (        SELECT  DD.id        FROM    [20091201_ties].t_distinct DD        WHERE   DD.lorderer =                (                SELECT  MIN(DI.lorderer)                FROM    [20091201_ties].t_distinct DI                WHERE   DI.glow = DD.glow                )        )SELECT  COUNT(*),         SUM(LEN(LookUp.stuffing))FROM    RowIDsCROSSAPPLY   (        SELECT  stuffing         FROM    [20091201_ties].t_distinct TD        WHERE   TD.id = RowIDs.id        ) LookUp;[/code][/quote]On my 2005 installation, this runs slower than DISTINCT:[code="plain"]SQL Server parse and compile time:    CPU time = 0 ms, elapsed time = 1 ms.(строк обработано: 1)Table 't_distinct'. Scan count 11, logical reads 2250, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:   CPU time = 250 ms,  elapsed time = 241 ms.[/code]for [b]DISTINCT[/b],[code="plain"]SQL Server parse and compile time:    CPU time = 0 ms, elapsed time = 1 ms.(строк обработано: 1)Table 't_distinct'. Scan count 1, logical reads 2188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:   CPU time = 359 ms,  elapsed time = 356 ms.[/code]for your query.I created the index with [b]id[/b] explicitly added, and it's used in both queries. The optimizer built the same plan as shown in your post.Segment operator still requires a full index scan which is costly. The sample table contains only a million records and the whole index fits into the cache, so the performance difference it not so obvious, but as the index size grows, the extra physical reads make the query perform much worse.CTE query was intended to get rid of the full index scans. It completes in only [b]IndexDepth[/b] page reads per distinct record. Even for billion rows, the index depth will be around 6, and with 100 groupers, the same query would compete in 600 page reads from the index. Even if all these reads are physical reads, this is not that much.Of course, having a dedicated table instead of the CTE, as Matt suggested, will improve the query yet a little more.[quote]One other thing.  I notice that the index definitions given in your blog effectively INCLUDE the id column since that is the clustering key for the table.  The id therefore only exists at the leaf level of the index, which makes it unavailable at higher levels.  (As an aside, the name of the index seems to imply that id is a key column, rather than an include.)[/quote]I've heard this before, but my googlefu was not enough for me to be able to find any reference which proves or disproves it :)But let us conduct a little experiment:[code="sql"]SET NOCOUNT OFFCREATE TABLE t_clustered (        id INT NOT NULL,        val INT NOT NULL,        lid INT NOT NULL,        CONSTRAINT PK_clustered_id PRIMARY KEY (id),        CONSTRAINT UX_clustered_lid UNIQUE (lid)        )CREATE INDEX ix_clustered_val ON t_clustered (val)CREATE INDEX ix_clustered_val__lid ON t_clustered (val) INCLUDE (lid)GOWITH    q (id) AS        (        SELECT  1        UNION ALL        SELECT  id + 1        FROM    q        WHERE   id &amp;lt; 1000000        )INSERTINTO    t_clustered (id, val, lid)SELECT  id, 1, idFROM    qOPTION (MAXRECURSION 0)[/code]This table has 1,000,000 records, clustered PK on id and two indexes: one on [b]val[/b] only, another one on [b]val[/b] covering [b]lid[/b].[b]lid[/b] holds the same values as [b]id[/b], and is marked [b]UNIQUE[/b]Val is the only declared key column in both indexes, and the only value of val is 1.Let us issue the following query which searches both for val and lid:[code="sql"]SELECT  val, lidFROM    t_clustered WITH (INDEX (ix_clustered_val__lid))WHERE   val = 1        AND lid = 987654[/code], which forces use of the index, with the following plan:[code="plain"]  |--Index Seek(OBJECT:([test].[dbo].[t_clustered].[ix_clustered_val__lid]), SEEK:([test].[dbo].[t_clustered].[val]=(1)),  WHERE:([test].[dbo].[t_clustered].[lid]=(987654)) ORDERED FORWARD)[/code]Formally, it's an index seek, but since val = 1 holds for every record in the table and [b]lid[/b] is not included in the key, the engine had to do the full index scan which required reading of all index pages:[code="plain"]SQL Server parse and compile time:    CPU time = 2 ms, elapsed time = 2 ms.(строк обработано: 1)Table 't_clustered'. Scan count 1, logical reads 1862, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:   CPU time = 62 ms,  elapsed time = 75 ms.[/code]But if we issue a similar query against [b](val, id)[/b][code="sql"]SELECT  val, idFROM    t_clustered WITH (INDEX (ix_clustered_val))WHERE   val = 1        AND id = 987654[/code], we get this plan:[code="plain"]  |--Index Seek(OBJECT:([test].[dbo].[t_clustered].[ix_clustered_val]), SEEK:([test].[dbo].[t_clustered].[val]=(1) AND [test].[dbo].[t_clustered].[id]=(987654)) ORDERED FORWARD)[/code], and the query completes in 3 page reads (which is the index depth):[code="plain"]SQL Server parse and compile time:    CPU time = 2 ms, elapsed time = 2 ms.(строк обработано: 1)Table 't_clustered'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 1 ms.[/code][b]id[/b], I remind, is not declared a part of the index, but we see that it is used by the Seek operator against the index, and the query does not have to fetch all index pages. Somehow, 3 page reads is enough for the engine to locate the required record.I cannot figure out how would this be possible if [b]id[/b] were not a part of the key.Could you please provide any reference that would show the key layout of a secondary index in a clustered table?</description><pubDate>Sat, 06 Feb 2010 12:46:03 GMT</pubDate><dc:creator>Quassnoi</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Quassnoi,At the risk of interrupting your flow here, are you aware that you are missing something very important from your 'group-wise min/max ([i]with ties[/i]) discussion?There is a QO transformation, available in both 2005 and 2008, which produces a plan which performs as well as the best methods you show, [i]regardless[/i] of cardinality or distribution.  No recursive CTEs and no analytic functions required.  Excellent performance, without unnecessary complexity.Please consider the following code, using the sample data from your blog:[code]WITH    RowIDsAS      (        SELECT  DD.id        FROM    [20091201_ties].t_distinct DD        WHERE   DD.lorderer =                (                SELECT  MIN(DI.lorderer)                FROM    [20091201_ties].t_distinct DI                WHERE   DI.glow = DD.glow                )        )SELECT  COUNT(*),         SUM(LEN(LookUp.stuffing))FROM    RowIDsCROSSAPPLY   (        SELECT  stuffing         FROM    [20091201_ties].t_distinct TD        WHERE   TD.id = RowIDs.id        ) LookUp;[/code]That code uses the 'lorderer' column.  Just change 'lorderer' to 'orderer' in the CTE for the second run.Here are the actual execution plans for both:[IMG]http://www.sqlservercentral.com/Forums/Attachment5136.aspx[/IMG][IMG]http://www.sqlservercentral.com/Forums/Attachment5137.aspx[/IMG]One other thing.  I notice that the index definitions given in your blog effectively INCLUDE the id column since that is the clustering key for the table.  The id therefore only exists at the leaf level of the index, which makes it unavailable at higher levels.  (As an aside, the name of the index seems to imply that id is a key column, rather than an include.)For the amount of extra space required to store the id at [i]non-leaf levels[/i], you might consider adding the id column explicitly to the index.  Not only can you then mark the index as UNIQUE (which may help the optimizer), you allow the QO more options when considering transformations.The 'segment top' method I demonstrate above requires all referenced columns (in the CTE section) to be explicit keys in the index (and textually referenced in key order).  So, if you wanted to change the query do a MIN or MAX ordered on the *id* column at any stage, it would need to be part of the key, not just an include.CheersPaul(edit to fix graphical plans)</description><pubDate>Sat, 06 Feb 2010 06:54:55 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote]This shouldn't change the resultset at all since products without versions will be eliminated by the CROSS APPLY, but lets assume anyway that every Product has at least one associated ProductVersions record.  If I'm understanding things correctly, this should perform equivalent to the ranking function query for high-cardinality ProductVersions(ProductId), and much better with low-cardinality, since we're avoiding having to build a list of DISTINCT ProductIds.  Is this correct?[/quote]This will be much faster than doing [b]DISTINCT[/b], of course. Actually, that's the reason why the CTE version of the query is faster: instead of grouping all records to build the distinct list it just jumps over the index keys. (This access method is called [b]loose index scan[/b] and MySQL supports it natively, but in SQL Server and other systems you need to resort to the dirty tricks like that to emulate it)The dedicated table with the products will of course speed it up yet a little more.However, [b]CROSS APPLY[/b] with a [b]TOP[/b] implies nested loops and within each loop, it will have to do an index seek for the top-rated record for the current product.An index seek is fast but not instant, it takes [b]O(log n)[/b], where [b]n[/b] is the number or records in the table. If you have [b]m[/b] distinct groupers, the whole query will take [b]O(m &amp;times; log(n))[/b], or [b]O(n &amp;times; log(n) &amp;times; c)[/b], where [b]с[/b] is the cardinality of the column. (By definition, cardinality is [b]m / n[/b])The analytic function, on the other hand, will just need to do a single index scan, which takes [b]O(n)[/b], or [b]O(n &amp;times; const)[/b].This means that for high cardinality columns, where [b]log(n) &amp;times; с &amp;gt; const[/b], the analytic functions will be faster. The actual value of the constant depends on many factors, like the index key size, cache hit ratio etc. However, the cost of a record fetch in a sequential index scan is always less than that of an index seek, that's why [b]const[/b] will be always less than [b]1[/b].Hence, starting from a certain level or cardinality, the analytic functions solution will be more efficient, even with a dedicated product table.For instance, if the products are unique within the ProductVersion, the [b]ROW_NUMBER[/b] query will just return all records, while the [b]CROSS APPLY[/b] query will do the same, but the the overhead of a join an an index seek for each record.In general, for the tables that do not fit completely into the cache, sequential scan over the index becomes even more efficient than random seeks, and the cost of the index seek grows, so the larger are the tables with the same cardinality of the groupers (products), the more beneficial are the analytic functions.[quote]Didn't you mean [b]increasing[/b] efficiency or maybe decreasing [b]cost[/b]?  As the queries became more complex they also executed a lot faster.[/quote]Oh, you're right, I should have been more careful. Thanks for noticing!</description><pubDate>Sat, 06 Feb 2010 05:12:44 GMT</pubDate><dc:creator>Quassnoi</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Thanks Quassnoi, after considering it for a while that's the conclusion I came to, although I didn't realize that the CROSS APPLY would actually perform [i]worse[/i] for high-cardinality groupings.  I thought it would gradually worsen as the cardinality got higher until it eventually ended up about the same as the ranking functions performance wise.  I guess I was wrong.Your articles are great.  I read the one you put in your reply here and I was pretty amazed at some of those queries.  I thought SQL Server would be smart enough that you wouldn't have to use tricky recursive CTEs and double-APPLYs to explicitly force efficient usage of the index.  When I have some time, I'm going to try testing some of the queries in your articles to see the results firsthand.Here's a question related to the ProductVersion queries in this discussion:[quote]... analytic functions (ROW_NUMBER or DENSE_RANK) are more efficient due to the overhead of building a DISTINCT list ...[/quote]You're saying that the CROSS APPLY query takes a performance hit for high-cardinality data because of the overhead of building a list of DISTINCT ProductIds right?  If that's the case, what if there was a separate table that stored each product (lets say dbo.Products) and instead of querying DISTINCT values from the ProductVersions table, we used all the ProductIds from the Products table.[code="sql"]SELECT  version.*FROM    dbo.Products product        CROSS APPLY (                SELECT  TOP(1) *                FROM    dbo.ProductVersions                WHERE   ProductId = product.Id                ORDER BY Version DESC, MinorVersion DESC, ReleaseVersion DESC        ) version[/code]This shouldn't change the resultset at all since products without versions will be eliminated by the CROSS APPLY, but lets assume anyway that every Product has at least one associated ProductVersions record.  If I'm understanding things correctly, this should perform equivalent to the ranking function query for high-cardinality ProductVersions(ProductId), and much better with low-cardinality, since we're avoiding having to build a list of DISTINCT ProductIds.  Is this correct?-- Edit --Also, in this article:[url]http://explainextended.com/2009/11/30/sql-server-selecting-records-holding-group-wise-maximum/[/url]at the bottom it says:[quote]To select records holding group-wise maximums in SQL Server, the following approaches can be used:    * Analytic functions    * Using DISTINCT / CROSS APPLY / TOP    * Using recursive CTE’s / CROSS APPLY / TOPThese methods are arranged in order of increasing complexity and [b]decreasing efficiency[/b] (for low-cardinality groupers).[/quote]Didn't you mean [b]increasing[/b] efficiency or maybe decreasing [b]cost[/b]?  As the queries became more complex they also executed a lot faster.</description><pubDate>Sat, 06 Feb 2010 01:34:54 GMT</pubDate><dc:creator>Matt Arpidone</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote]And now, most importantly, here are the execution results I got running these queries on my 50 million row table.[/quote]Hi Matt, I just found this discussion in the referrers to my blog.Both [b]CROSS APPLY[/b] and [b]ROW_NUMBER[/b] (or [b]DENSE_RANK[/b] if you want ties) can be more efficient, depending on the cardinality of the field you're grouping (or partitioning) by.For high cardinality fields, analytic functions ([b]ROW_NUMBER[/b] or [b]DENSE_RANK[/b]) are more efficient due to the overhead of building a [b]DISTINCT[/b] list of the values of this field.For low cardinality fields, [b]CROSS APPLY[/b] is a more efficient solution, and that's what your test case shows.You may want to read another article in my blog, which compares both solutions performance-wise, depending on the cardinality of the field in question: [url=http://explainextended.com/2009/12/01/sql-server-selecting-records-holding-group-wise-maximum-with-ties/][b]SQL Server: Selecting records holding group-wise maximum (with ties)[/b][/url]</description><pubDate>Fri, 05 Feb 2010 06:45:12 GMT</pubDate><dc:creator>Quassnoi</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Matt Arpidone (1/30/2010)[/b][hr]Thanks Mike, I'll keep in mind that any criticism of articles or disagreement will be met with animosity.  I guess my opinion isn't welcome unless I agree with the author, which, as I can clearly see now, makes total sense on a site that welcomes "peer review".I do not know the author at all, nor have I read any of his other work, and I was very careful in my post not to resort to empty ad hominem attacks.  I did express concern about the content of the article, and if you would like to disagree with me on a particular point, feel free.Regarding the testing, I completely understand your concerns, although I find the aggravated tone of your post a little disturbing (maybe I'm reading it wrong, if so I apologize).The reason I did not look up the unit of measure is because units are irrelevant when discussing relative speed differences.  I would rather you not point out my deficiencies as a developer (of which there are many) unless they actually have relevance to my argument.  Certainly, my lack of knowledge about "client statistics" doesn't discredit anything I've posted (unless I've missed something), and furthermore doesn't impair my ability to do my job at all - which makes me curious why you insist that every one of your developers know about them.I am a little worried that I seemed harsh in my post.  I had previously typed up another post (in response to the author's post after mine) about how I thought his title was misleading and that caused me to misjudge the article - along with some additional disagreements - but then I decided to just drop it.  I do still stand by what I've already presented, but I recognize that I seem to have missed the true purpose of the article.Also I find it distasteful that you point out weaknesses in my testing and indicate that you have done your own improved testing, and then you fail to post any details about what you have done.  What do you mean when you say 25-30% faster when tested correctly?  Faster than what?  What do you mean "correctly"?  Personally, I feel that my tests were adequate to get my point across.  If you have numbers that disagree with mine, why don't you post them instead of giving nonconstructive retorts?[quote]In your article you might wish to discuss the internals of how CROSS APPLY works. It might surprise you, and just might change your mind about using a UDF. I'll leave it for you to research CROSS APPLY, as I don't want to pre-empt publication of what I expect to be an excellent article with your name on it in the near future.[/quote]What did you mean here exactly?  As I understand it, APPLY is similar to joining on a correlated subquery.  Actually, I confess that my love of APPLY comes from it being so similar to a foreach loop in a typical programming language.  I don't see what kind of crazy internal things could be going on that make it behave differently.  I did try doing some research on how it works internally, and while I didn't find much helpful material, I did come across this article:  [url]http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/[/url].  It explains a specific scenario where APPLY is more efficient than a query using ROW_NUMBER(), and it explains the minor pitfall with ROW_NUMBER().  I suspect something similar is happening with the product version query.Anyway, I'm not asking you to spoonfeed me a lecture all about how APPLY works, but a link to an article describing this surprising internal behavior would be appreciated since most of the stuff I'm finding gives basic overviews and is pretty much consistent with what I expected.[/quote]Matt, I thought that your response was very enlightening. I would not have thought to use a CROSS APPLY to perform this operation. I have been using RANK(), but will now look into using CROSS APPLY instead. Thanks.</description><pubDate>Mon, 01 Feb 2010 08:19:56 GMT</pubDate><dc:creator>Ted Manasa</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Mike C (1/30/2010)[/b][hr]Hope that helps with the search for the UDF :)[/quote]Oh right - well thanks for that.</description><pubDate>Sun, 31 Jan 2010 16:58:49 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>"The following figure shows the estimated query costs for the Production.ProductVersion2.  The subquery implementation took 43 seconds to complete where as the ROW_NUMBER() implementation took 5 seconds to complete for 1,000,000 rows."It appears those results are for 100,000 rows, but still 7 times faster.</description><pubDate>Sun, 31 Jan 2010 09:19:08 GMT</pubDate><dc:creator>jbuttery</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Matt Arpidone (1/30/2010)[/b][hr]What I meant was that in a peer reviewed public forum, criticism should be welcome, even [i]encouraged[/i] as long as it is well structured, helpful, and avoids personal attacks.  I don't understand the tone of your posts towards me because they seem full of angry sarcastic remarks.  And I don't even see anything particularly wrong with that, except that I can't seem to pinpoint exactly what the arguments you're trying to make are.Here is a brief outline of the points I was making in my argument (before I realized that I may have missed the true purpose of the article):[ul][li]The example used in the article was a bad example[/li][li]The first query provided was very poorly written[/li][li]The "solution" query provided was not "efficient" as claimed in the title[/li][li]An alternate could be written, using APPLY (which may be considered by some to be a type of subquery) that is shorter, clearer, and more efficient[/li][/ul]Stating these points alone wouldn't be very helpful, so I supported the "clearer and shorter" claim by writing an example, and I supported the "more efficient" claim by doing some simple tests, explaining my test procedure, and posting the exact results I got from SSMS.Now apparently, this doesn't seem like a structured, constructive, and supported argument to you.  So please help me out here, what exactly is [i]your[/i] argument?I'm going to guess that your main concern is this snippet:[quote]First off, the first query provided was a joke. If anyone wrote a monster like that I would be giving them that look. You know what look I'm talking about, the "Seriously? Who hired you?" look. In fact, I can't believe the author took the time to write something like that for the article.[/quote]Reading this over a couple times I'm starting to agree that this does look like a personal attack.  I honestly didn't mean it that way.  My gut reaction to the query in the article was honestly, "oh god, that's awful", and if I came across something like that in a production environment at work that I had to maintain, I would be upset.  I tried to make some mild humor out of this and obviously failed.  But my point still stands, I believe that query to be very poorly written and a bad example to use in a "before and after" comparison, and I'm not trying to say anything derogatory about the author in general by these statements.And then finally, and I feel like I'm pulling teeth here, can you [i]please[/i] post the details about your testing.  Not some generic statements and sarcastic remarks, but an actual description of your testing procedures and the exact results that you got.  I would like to reproduce them on my end because I'm seriously interested in how the results could be that different.[/quote]I'm not trying to be sarcastic at all, and I'm sorry you took it that way.  Let me tell you a short story to explain my point of view.  I saw an article once that I thought was not so great.  I found several assertions by the author that were not only a little off or on the edge between possibly correct/possibly incorrect -- they were completely and totally wrong.  That's when I decided to write my own article which corrected the information being put out.  You worded your response to this article very strongly so I recommended that if you really felt so strongly about it, submit an article; then wait for the next clever guy to come up with an even better solution :)Now in this article the author began with nested subqueries that unfortunately are all too common!  I've seen that exact same query used by many people in many places (granted they were different source tables, but the exact same concept!)  I agree it is a rough way to go, but I can only assume the author probably ran into something like this in his work--and that he decided to change it.  Believe it or not, that's where 90% of these articles come from--people run across a real-world problem, find a solution, and write it up.  The author then went on to demonstrate one method (the key words here being "one method") of simplifying the code and getting a more efficient query plan versus the nested subqueries.  Note that I said he demonstrated "one method", not "all methods" or "every method".So the question you raised is "are there more efficient ways to get the same results"?  And the answer, as you have shown, is yes.  Your solution is clever, which is why I recommended you take it off the back page and submit an article describing it in more detail for the front page.  However, a new solution does not automatically make another solution "garbage" or "horrible".  Remember there are many ways to accomplish the same thing in SQL/T-SQL--it's one of the major strengths of SQL/T-SQL.However, when you do your actual testing you'll want to (1) make sure you're reporting the statistics that are relevant; (2) make sure you clear buffers and cache to ensure consistent, accurate results.So here's some details (from Books Online) about the statistics you're looking at:Client processing time - The cumulative amount of time that the client spent executing code while the query was executed.Total execution time - The cumulative amount of time (in milliseconds) that the client spent processing while the query was executed, including the time that the client spent waiting for replies from the server as well as the time spent executing code.Wait time on server replies - The cumulative amount of time (in milliseconds) that the client spent while it waited for the server to reply.Notice all time is in milliseconds.  Now, for our purposes since I'm running it on a client local to my server I'll focus in on "wait time on server replies".  This may not be valid if you're doing it over the network since network communication time is going to be rolled up in there also.CROSS APPLY version: -----------------------8296 ms7577 ms7889 ms7920.7 ms (Avg.)ROW_NUMBER() version:--------------------------10232 ms9809 ms10832 ms10291 ms (Avg.)7920.7 ms / 10291 ms = 0.77 = 77%Perhaps even more telling and to the point is the actual query plan, which shows a cost of 14.2601 for your method and a cost of 214.487 for the ROW_NUMBER version.  93% of the cost of the ROW_NUMBER version is in one Sort operation.  Just for snorts and giggles, try your query versus the author's query on a table like the following (you can change your existing table structure to match):CREATE TABLE [Production].[ProductVersion](	[ID] [int] IDENTITY(1,1) NOT NULL,	[ProductID] [int] NOT NULL,	[Version] [int] NOT NULL,	[MinorVersion] [int] NOT NULL,	[ReleaseVersion] [int] NOT NULL,	[StandardCost] [numeric](30, 4) NOT NULL,	CONSTRAINT [PK_ProductVersion] PRIMARY KEY NONCLUSTERED 	(		[ID] ASC	));GOA simple change.  The 4 columns in the table are no longer declared as the PK; instead there is a surrogate key represented by the IDENTITY column.  I'd be interested to know if your test results are the same as mine in this situation.  My tests show that the ROW_NUMBER method is around 3X faster (8518 ms avg. vs. 23094 ms avg.)  I'll let you decide if, with no index on the columns you're sorting on (and yes, I've seen this as well!), whether or not you believe the ROW_NUMBER method is "crappy" compared to your CROSS APPLY method.  CROSS APPLY is not what's speeding up your code, as you'll see once you get rid of the index.  What is speeding up your code is simply the fact that the subquery can take advantage of the covering index on all three columns in this particular scenario.Now, my point is simply this: I believe you missed the whole point of the article -- the author simply showed one method of simplifying and speeding up a convoluted mess of nested correlated subqueries.  Whether you believe anyone alive would have the intestinal fortitude to put queries like that in production, it does happen (I've made a decent living fixing queries like that).  Also, if optimizing queries is part of your day to day activities, you should invest in learning the tools that are available to help you do that job --  you may find that sometimes a big number is not a bad thing.</description><pubDate>Sat, 30 Jan 2010 15:35:01 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>What I meant was that in a peer reviewed public forum, criticism should be welcome, even [i]encouraged[/i] as long as it is well structured, helpful, and avoids personal attacks.  I don't understand the tone of your posts towards me because they seem full of angry sarcastic remarks.  And I don't even see anything particularly wrong with that, except that I can't seem to pinpoint exactly what the arguments you're trying to make are.Here is a brief outline of the points I was making in my argument (before I realized that I may have missed the true purpose of the article):[ul][li]The example used in the article was a bad example[/li][li]The first query provided was very poorly written[/li][li]The "solution" query provided was not "efficient" as claimed in the title[/li][li]An alternate could be written, using APPLY (which may be considered by some to be a type of subquery) that is shorter, clearer, and more efficient[/li][/ul]Stating these points alone wouldn't be very helpful, so I supported the "clearer and shorter" claim by writing an example, and I supported the "more efficient" claim by doing some simple tests, explaining my test procedure, and posting the exact results I got from SSMS.Now apparently, this doesn't seem like a structured, constructive, and supported argument to you.  So please help me out here, what exactly is [i]your[/i] argument?I'm going to guess that your main concern is this snippet:[quote]First off, the first query provided was a joke. If anyone wrote a monster like that I would be giving them that look. You know what look I'm talking about, the "Seriously? Who hired you?" look. In fact, I can't believe the author took the time to write something like that for the article.[/quote]Reading this over a couple times I'm starting to agree that this does look like a personal attack.  I honestly didn't mean it that way.  My gut reaction to the query in the article was honestly, "oh god, that's awful", and if I came across something like that in a production environment at work that I had to maintain, I would be upset.  I tried to make some mild humor out of this and obviously failed.  But my point still stands, I believe that query to be very poorly written and a bad example to use in a "before and after" comparison, and I'm not trying to say anything derogatory about the author in general by these statements.And then finally, and I feel like I'm pulling teeth here, can you [i]please[/i] post the details about your testing.  Not some generic statements and sarcastic remarks, but an actual description of your testing procedures and the exact results that you got.  I would like to reproduce them on my end because I'm seriously interested in how the results could be that different.</description><pubDate>Sat, 30 Jan 2010 14:28:03 GMT</pubDate><dc:creator>Matt Arpidone</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Paul White (1/29/2010)[/b][hr][quote][b]Mike C (1/29/2010)[/b][hr]What Tony is referring to is the fact that the CTE regenerates the GUID every time the "rn" column is referenced from the CTE.  This is a bug.  The GUID should be generated every time NEWID is referenced.  NEWID is only directly referenced once (in the CTE, by the ROW_NUMBER function).  Run Tony's example and look at the two "rn" columns.  They should match up with one another in every row.[/quote]Yes I know - and I disagree, and previously explained why.  Read Microsoft's replies to the Connect item entered by Itzik Ben-Gan for a fuller explanation of why this is not a bug - it's a sensible design decision.[/quote]I'm familiar with that bug.  The way I read it, even MS agrees it's a bug - it was simply a bug closed as "won't fix". Basically no one wants to open up the "once-per-row behavior" can of worms and put strict definition to it.</description><pubDate>Sat, 30 Jan 2010 08:14:33 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Paul White (1/30/2010)[/b][hr]Matt,I wouldn't take it too much to heart.  There is always a risk of misunderstanding arising when communication occurs in this medium.  Personally, I didn't find too much wrong with your first post, but that's just my impression.[/quote]Agreed.  Consider the following [i]miscommunication[/i] as an example:[quote]Matt Arpidone - "In the future, you could replace the query in the CROSS APPLY with a function like dbo.udf_latest_version_for_product(pv.ProductId) or something similar.  This drops the total lines down to 6 and gives you a reusable function to work with."[/quote][quote]Mike C - "In your article you might wish to discuss the internals of how CROSS APPLY works.  It might surprise you, and just might change your mind about using a UDF. "[/quote][quote]Paul - "For what it's worth, I admit I cannot find the UDF Mike refers to in your APPLY solution. :-)"[/quote]Hope that helps with the search for the UDF :)</description><pubDate>Sat, 30 Jan 2010 08:06:24 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Matt Arpidone (1/30/2010)[/b][hr]Thanks Mike, I'll keep in mind that any criticism of articles or disagreement will be met with animosity.  I guess my opinion isn't welcome unless I agree with the author, which, as I can clearly see now, makes total sense on a site that welcomes "peer review".I do not know the author at all, nor have I read any of his other work, and I was very careful in my post not to resort to empty ad hominem attacks.  I did express concern about the content of the article, and if you would like to disagree with me on a particular point, feel free.Regarding the testing, I completely understand your concerns, although I find the aggravated tone of your post a little disturbing (maybe I'm reading it wrong, if so I apologize).The reason I did not look up the unit of measure is because units are irrelevant when discussing relative speed differences.  I would rather you not point out my deficiencies as a developer (of which there are many) unless they actually have relevance to my argument.  Certainly, my lack of knowledge about "client statistics" doesn't discredit anything I've posted (unless I've missed something), and furthermore doesn't impair my ability to do my job at all - which makes me curious why you insist that every one of your developers know about them.I am a little worried that I seemed harsh in my post.  I had previously typed up another post (in response to the author's post after mine) about how I thought his title was misleading and that caused me to misjudge the article - along with some additional disagreements - but then I decided to just drop it.  I do still stand by what I've already presented, but I recognize that I seem to have missed the true purpose of the article.Also I find it distasteful that you point out weaknesses in my testing and indicate that you have done your own improved testing, and then you fail to post any details about what you have done.  What do you mean when you say 25-30% faster when tested correctly?  Faster than what?  What do you mean "correctly"?  Personally, I feel that my tests were adequate to get my point across.  If you have numbers that disagree with mine, why don't you post them instead of giving nonconstructive retorts?[quote]In your article you might wish to discuss the internals of how CROSS APPLY works. It might surprise you, and just might change your mind about using a UDF. I'll leave it for you to research CROSS APPLY, as I don't want to pre-empt publication of what I expect to be an excellent article with your name on it in the near future.[/quote]What did you mean here exactly?  As I understand it, APPLY is similar to joining on a correlated subquery.  Actually, I confess that my love of APPLY comes from it being so similar to a foreach loop in a typical programming language.  I don't see what kind of crazy internal things could be going on that make it behave differently.  I did try doing some research on how it works internally, and while I didn't find much helpful material, I did come across this article:  [url]http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/[/url].  It explains a specific scenario where APPLY is more efficient than a query using ROW_NUMBER(), and it explains the minor pitfall with ROW_NUMBER().  I suspect something similar is happening with the product version query.Anyway, I'm not asking you to spoonfeed me a lecture all about how APPLY works, but a link to an article describing this surprising internal behavior would be appreciated since most of the stuff I'm finding gives basic overviews and is pretty much consistent with what I expected.[/quote]I was actually going through your post and responding point by point, then decided it wasn't worth it.  If all you got out of my response was "criticism of articles or disagreement will be met with animosity", then perhaps you should consider an alternative reading.  Maybe something along the lines of "animosity may be met with animosity"; or more accurately "someone might recommend I 'put my money where my mouth is'".  Despite you considering my response as some sort of attack on you, my tone was considerably nicer than yours.After you called the author's work garbage because he did not discuss what you consider the optimal alternative, you proceeded to make suggestions about employability for the same reason.  Nice.  Then you mentioned that you don't even understand the client statistics which formed the basis of your argument!  That's simply amazing.I merely suggested you take your issue with the author's work (and I assume you're focusing on the author's [i]work[/i] despite the fact that you made suggestions about the author in your post) and put it on the front page with an article of your own.  I then offered a few suggestions for possible topics your article could cover.I also mentioned that all of my employees are trained to understand client statistics (and other optimization tools) that are necessary to write efficient code and optimize it properly.  Whether writing and optimizing efficient code are requirements of your job or not is actually irrelevant to my statement, as it is a requirement on my job.  And anyone who doesn't know how to do it will get that look.  You know the one I'm talking about.Also, as I mentioned, before running tests it's somewhat helpful to understand the role of buffers and cache, and how they can skew results severely.  Oh, and here's some "relative number" results for you:[code]ROW_NUMBER         1.00CROSS APPLY          0.73[/code]As you can plainly see, these "relative number" results prove that CROSS APPLY ran 27% faster than ROW_NUMBER. Nice.  I won't even begin to address the kind of crazy results you can get on a poorly configured VM.If you're interested in learning more about CROSS APPLY get Itzik Ben-Gan's books - he gives a very good and detailed discussion of CROSS APPLY and OUTER APPLY.</description><pubDate>Sat, 30 Jan 2010 07:56:27 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Matt,I wouldn't take it too much to heart.  There is always a risk of misunderstanding arising when communication occurs in this medium.  Personally, I didn't find too much wrong with your first post, but that's just my impression.For what it's worth, I admit I cannot find the UDF Mike refers to in your APPLY solution. :-)Paul</description><pubDate>Sat, 30 Jan 2010 04:35:04 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Thanks Mike, I'll keep in mind that any criticism of articles or disagreement will be met with animosity.  I guess my opinion isn't welcome unless I agree with the author, which, as I can clearly see now, makes total sense on a site that welcomes "peer review".I do not know the author at all, nor have I read any of his other work, and I was very careful in my post not to resort to empty ad hominem attacks.  I did express concern about the content of the article, and if you would like to disagree with me on a particular point, feel free.Regarding the testing, I completely understand your concerns, although I find the aggravated tone of your post a little disturbing (maybe I'm reading it wrong, if so I apologize).The reason I did not look up the unit of measure is because units are irrelevant when discussing relative speed differences.  I would rather you not point out my deficiencies as a developer (of which there are many) unless they actually have relevance to my argument.  Certainly, my lack of knowledge about "client statistics" doesn't discredit anything I've posted (unless I've missed something), and furthermore doesn't impair my ability to do my job at all - which makes me curious why you insist that every one of your developers know about them.I am a little worried that I seemed harsh in my post.  I had previously typed up another post (in response to the author's post after mine) about how I thought his title was misleading and that caused me to misjudge the article - along with some additional disagreements - but then I decided to just drop it.  I do still stand by what I've already presented, but I recognize that I seem to have missed the true purpose of the article.Also I find it distasteful that you point out weaknesses in my testing and indicate that you have done your own improved testing, and then you fail to post any details about what you have done.  What do you mean when you say 25-30% faster when tested correctly?  Faster than what?  What do you mean "correctly"?  Personally, I feel that my tests were adequate to get my point across.  If you have numbers that disagree with mine, why don't you post them instead of giving nonconstructive retorts?[quote]In your article you might wish to discuss the internals of how CROSS APPLY works. It might surprise you, and just might change your mind about using a UDF. I'll leave it for you to research CROSS APPLY, as I don't want to pre-empt publication of what I expect to be an excellent article with your name on it in the near future.[/quote]What did you mean here exactly?  As I understand it, APPLY is similar to joining on a correlated subquery.  Actually, I confess that my love of APPLY comes from it being so similar to a foreach loop in a typical programming language.  I don't see what kind of crazy internal things could be going on that make it behave differently.  I did try doing some research on how it works internally, and while I didn't find much helpful material, I did come across this article:  [url]http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/[/url].  It explains a specific scenario where APPLY is more efficient than a query using ROW_NUMBER(), and it explains the minor pitfall with ROW_NUMBER().  I suspect something similar is happening with the product version query.Anyway, I'm not asking you to spoonfeed me a lecture all about how APPLY works, but a link to an article describing this surprising internal behavior would be appreciated since most of the stuff I'm finding gives basic overviews and is pretty much consistent with what I expected.</description><pubDate>Sat, 30 Jan 2010 02:17:16 GMT</pubDate><dc:creator>Matt Arpidone</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Mike C (1/29/2010)[/b][hr]What Tony is referring to is the fact that the CTE regenerates the GUID every time the "rn" column is referenced from the CTE.  This is a bug.  The GUID should be generated every time NEWID is referenced.  NEWID is only directly referenced once (in the CTE, by the ROW_NUMBER function).  Run Tony's example and look at the two "rn" columns.  They should match up with one another in every row.[/quote]Yes I know - and I disagree, and previously explained why.  Read Microsoft's replies to the Connect item entered by Itzik Ben-Gan for a fuller explanation of why this is not a bug - it's a sensible design decision.</description><pubDate>Fri, 29 Jan 2010 23:01:40 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Paul White (1/29/2010)[/b][hr][quote][b]RBarryYoung (5/12/2009)[/b][hr][quote][b]tony rogerson (5/12/2009)[/b][hr]Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.[/quote]Afraid that I do not understand this.  AFAIK, CTEs are table expressions like non-indexed views.  If you write them non-deterministically (which is what you do in your blog article), then they return non-deterministic results.  And if you reference a non-deterministic table expression multiple times, you are not guaranteed to get the same results.  That's straight ANSI SQL to the best of my knowledge and I see no bug in Transact-SQL demonstrated here.And as far as ROW_NUMBER() it only returns non-deterministic results if the OVER clause parameters are non-deterministic.  If you use a deterministic ORDER BY (which also requires that the column set is uniquely orderable), then you will get deterministic results.  All this really demonstrates is that if you input non-deterministic expressions without instantiating them, then you will get non-deterministic results back.  Garbage In, Garbage Out.[/quote]Couldn't agree more.  A regular CTE is nothing more than a more flexible and better-laid-out derived table.  Just because you can reference a side-effecting function like NEWID() in a CTE, reference the CTE twice and get two different results from each reference doesn't make it a bug!  If you try to write the equivalent query using derived tables, you'll have to duplicate the code, making the coding error clear to see.It's no more surprising than the fact that the following poorly-written code (attempting to produce a random integer from 1 to 3) produces NULL quite frequently:[code]SELECT CASE FLOOR(RAND(CHECKSUM(NEWID())) * 3 + 1) WHEN 1 THEN 'One' WHEN 2 THEN 'Two' WHEN 3 THEN 'Three' ELSE NULL END;[/code]SQL Server does not guarantee that scalar functions will be called a certain number of times, or in any particular order.  It is up to the programmer to avoid potentially problematic use of such side-effecting functions.Paul[/quote]What Tony is referring to is the fact that the CTE regenerates the GUID every time the "rn" column is referenced from the CTE.  This is a bug.  The GUID should be generated every time NEWID is referenced.  NEWID is only directly referenced once (in the CTE, by the ROW_NUMBER function).  Run Tony's example and look at the two "rn" columns.  They should match up with one another in every row.</description><pubDate>Fri, 29 Jan 2010 22:20:49 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Matt Arpidone (1/29/2010)[/b][hr]DISCLAIMER:  I do not consider myself to be a "DBA", and I don't have any sort of certification.I'm going to have to disagree with most of you that this was a great article.  While ROW_NUMBER() certainly has its uses, the example given in the article seems contrived and is anything but efficient in my testing.First off, the first query provided was a joke.  If anyone wrote a monster like that I would be giving them that look.  You know what look I'm talking about, the "Seriously?  Who hired you?" look.  In fact, I can't believe the author took the time to write something like that for the article.  Essentially I'm saying that the comparison in the article is unfair, the author is comparing a crappy query to an even crappier query.[/quote]I know Steve is always looking for smart, talented members of the community who have found a better way to do x, y, or z to submit articles to him for publication.  It's very easy to criticize; it's a bit harder to put your own work up for peer review and harsh criticism from the Matt Arpidones of the world.  While this article dealt with how to simplify/flatten out multiple levels of subqueries into a format that's easier to read and understand, and proves to be more efficient than multiple nested subqueries (which is where many people first go when they're developing this kind of application), I'd recommend publishing an article if you have a better way.[quote]I used a similar approach to generate a random ProductVersion table with 50 versions for each of 1,000,000 products (50 million rows).  I then compared his query to my own using SSMS's built-in "client statistics" using the average values obtained from running the query 3 times (I could have done more, but as you will see later, I was getting aggravated waiting for his query to finish).  I directed the output to a file each time, and when I was finished I compared the contents of the files using the commandline tools "sort" and "fc" to ensure they were giving the same results.Here is my version of the query:[code="sql"]SELECT	x.*FROM (	SELECT	DISTINCT ProductId	FROM	dbo.ProductVersion) pv	CROSS APPLY (		SELECT	TOP(1) *		FROM	dbo.ProductVersion		WHERE	ProductId = pv.ProductId		ORDER BY Version DESC, MinorVersion DESC, ReleaseVersion DESC	) x[/code]First off, I find this MUCH easier to read.  It's less than half the size and just as easy to modify, but the biggest reason I like this query more is because if you paraphrase what the query is doing, you sound like you are describing the problem.  "Grab the ProductId's out of the ProductVersion table, and for each ProductId get the row with the most recent version."  In the future, you could replace the query in the CROSS APPLY with a function like dbo.udf_latest_version_for_product(pv.ProductId) or something similar.  This drops the total lines down to 6 and gives you a reusable function to work with.[/quote]In your article you might wish to discuss the internals of how CROSS APPLY works.  It might surprise you, and just might change your mind about using a UDF.  I'll leave it for you to research CROSS APPLY, as I don't want to pre-empt publication of what I expect to be an excellent article with your name on it in the near future.[quote]And now, most importantly, here are the execution results I got running these queries on my 50 million row table.  This was run on my work laptop (a Core 2 Duo with your typical not-spectacular HDD and 4GB of ram) on a local SQL Server 2008 database.[code="plain"]The Author's ROW_NUMBER() query:                             Trial 3   Trial 2   Trial 1   AverageClient processing time       233120    241950    241094    238721.3000Total execution time         288418    359298    282119    309945.0000Wait time on server replies  55298     117348    41025     71223.6600My CROSS APPLY query:                             Trial 3   Trial 2   Trial 1   AverageClient processing time       2075      2246      2379      2233.3330Total execution time         8546      8796      14732     10691.3300Wait time on server replies  6471      6550      12353     8458.0000[/code]I do not know the units that SSMS uses for those measurements or what exactly each metric means.  I looked mostly at "Wait time on server replies" as it sounds like a good indicator of how long it took the database engine to process the query.  As far as results, I think its enough to notice that there's almost an order of magnitude difference.  I don't think I could possibly spin this in any way that would make the author's query seem "efficient".  On average I waited about 15 seconds for my query to complete, whereas I waited about 5 minutes or more for the author's query.  I did NOT try to tweak his query in any way.  This post is only meant to point out the weaknesses in the article and show people a valid alternative.[/quote]Interesting.  I expect all the developers I work with to know and understand the client statistics that SQL Server generates.  If one of them did not know the unit of measure or meaning of the client statistics, I would look at them like "Who hired you?"  If you haven't already you might want to look into it; it is documented well in BOL and in several articles on the Web.BTW I ran tests similar to yours on a SQL Server 2008 (x64, dual core AMD 2.9 GHz, 8 GB, 3 TB HDD) and found your method to be about 25 - 30% faster when tested correctly.  You can easily get invalid statistics if you don't clear the buffers and cache before you execute multiple queries.  As a case in point, I ran your query and then ran the author's query against my sample data without clearing the buffers and cache in between runs and found that the author's query responded about 5% faster than yours.[quote]Why did I use a 50 million row test set?  C'mon people, everyone knows that if you use a small enough test set everything seems efficient.  When I'm dealing with a table that has a couple thousand rows in it, and I know this table won't get much larger, I focus all my energy on writing clear, concise, and easy to maintain queries.  I only address efficiency if it somehow becomes a problem, although it rarely does.Again I will point out my disclaimer above and my tests were hardly scientific, but when there's such a large difference, it doesn't seem to matter much.[/quote]"Hardly scientific" puts quite a light touch on it after coming out swinging at the article author like the 'undisputed champion'.  While the author did not address the CROSS APPLY method in this article, he did a good job of explaining how to simplify a specific class of problematic queries using ranking and windowing functions.  There are other interesting tasks that can be accomplished with ranking and windowing functions that this particular article did not address (and hopefully a lot more once MS adds the additional ranking/windowing functionality that's been requested by the community).  I'm personally looking forward to more articles on it, and I also look forward to improved performance from Microsoft in CTEs in the future.</description><pubDate>Fri, 29 Jan 2010 21:59:00 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>I gave it a shot in a VM we have at work running SQL Server 2005.  I'm a little unclear on the exact hardware its running on, but I believe it's a dual core Xeon with 3GB ram allocated to the VM.  I expected the results to be a little slower and slightly more erratic since there are other VM's running on the same machine, although at the time I performed these tests I would expect the machine load to be very low.Here's the data:[code="plain"]ROW_NUMBER:                              Trial 5  Trial 4  Trial 3  Trial 2  Trial 1  AverageClient processing time        4479     5322     5042     5011     4791     4929.0000Total execution time          279921   281011   282715   273348   288659   281130.8000Wait time on server replies   275442   275689   277673   268337   283868   276201.8000CROSS APPLY:                              Trial 5  Trial 4  Trial 3  Trial 2  Trial 1  AverageClient processing time        64775    104076   68224    101766   111740   90116.2000Total execution time          117143   147420   111318   152149   166852   138976.4000Wait time on server replies   52368    43344    43094    50383    55112    48860.2000[/code]According to these numbers the CROSS APPLY query is many times faster than the ROW_NUMBER query, however each time I ran the CROSS APPLY it took around 2:20 and the ROW_NUMBER query took around 5:00, so it was only about twice as fast.  I'm not sure what the discrepancy is there.I guess optimizations have been added in SQL Server 2008 that make APPLY's an even more attractive option.</description><pubDate>Fri, 29 Jan 2010 20:28:31 GMT</pubDate><dc:creator>Matt Arpidone</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>The reason I had the index sorting in the opposite direction was to demonstrate the structure of table.  The query itself represents a requirement that was needed and the structure of the table of might not necessarily be able to be modified.  Sorting the index in the order of the requirement will definitely give you a boost in performance.As for the ORDER BY clause, you don't need the partitioning column in there.  I included it just to demonstrate how the row numbers will be assigned.[quote][b]Ted Pin (1/29/2010)[/b][hr]Great article!! We use this exact technique in our ETL and it's great to see such solid confirmation that we have chosen a good strategy.Two comments.1. Why did you include ProductID in your ORDER BY clause?[code="other"]OVER(PARTITION BY ProductID                     ORDER BY ProductID,                        Version DESC,[/code]It doesn't seem necessary for determining the latest version?2. I have garnered about a 60% decrease in cost by sorting the index in the same direction as the ORDER BY clauses; in this case, DESC. Could you try that for us and report back on your results?Thanks a lot![/quote]</description><pubDate>Fri, 29 Jan 2010 14:30:07 GMT</pubDate><dc:creator>Francis Rodrigues-459442</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>First, please note the subtitle of the article, An Efficient Alternative to Subqueries.  There were no claims about it being the best, just an alternative to subqueries.  From the results you discovered, it looks like you have found an alternative to ROW_NUMBER().  The purpose of this article was to demonstrate that there are different ways to achieve the same end.--editAlso, the code was written on SQL Server 2005 on a laptop with 2gb of ram and Core 2 cpu.  Can you try your alternative on a 2005 instance and report the results?[quote][b]Matt Arpidone (1/29/2010)[/b][hr]DISCLAIMER:  I do not consider myself to be a "DBA", and I don't have any sort of certification.I'm going to have to disagree with most of you that this was a great article.  While ROW_NUMBER() certainly has its uses, the example given in the article seems contrived and is anything but efficient in my testing.First off, the first query provided was a joke.  If anyone wrote a monster like that I would be giving them that look.  You know what look I'm talking about, the "Seriously?  Who hired you?" look.  In fact, I can't believe the author took the time to write something like that for the article.  Essentially I'm saying that the comparison in the article is unfair, the author is comparing a crappy query to an even crappier query.I used a similar approach to generate a random ProductVersion table with 50 versions for each of 1,000,000 products (50 million rows).  I then compared his query to my own using SSMS's built-in "client statistics" using the average values obtained from running the query 3 times (I could have done more, but as you will see later, I was getting aggravated waiting for his query to finish).  I directed the output to a file each time, and when I was finished I compared the contents of the files using the commandline tools "sort" and "fc" to ensure they were giving the same results.Here is my version of the query:[code="sql"]SELECT	x.*FROM (	SELECT	DISTINCT ProductId	FROM	dbo.ProductVersion) pv	CROSS APPLY (		SELECT	TOP(1) *		FROM	dbo.ProductVersion		WHERE	ProductId = pv.ProductId		ORDER BY Version DESC, MinorVersion DESC, ReleaseVersion DESC	) x[/code]First off, I find this MUCH easier to read.  It's less than half the size and just as easy to modify, but the biggest reason I like this query more is because if you paraphrase what the query is doing, you sound like you are describing the problem.  "Grab the ProductId's out of the ProductVersion table, and for each ProductId get the row with the most recent version."  In the future, you could replace the query in the CROSS APPLY with a function like dbo.udf_latest_version_for_product(pv.ProductId) or something similar.  This drops the total lines down to 6 and gives you a reusable function to work with.And now, most importantly, here are the execution results I got running these queries on my 50 million row table.  This was run on my work laptop (a Core 2 Duo with your typical not-spectacular HDD and 4GB of ram) on a local SQL Server 2008 database.[code="plain"]The Author's ROW_NUMBER() query:                             Trial 3   Trial 2   Trial 1   AverageClient processing time       233120    241950    241094    238721.3000Total execution time         288418    359298    282119    309945.0000Wait time on server replies  55298     117348    41025     71223.6600My CROSS APPLY query:                             Trial 3   Trial 2   Trial 1   AverageClient processing time       2075      2246      2379      2233.3330Total execution time         8546      8796      14732     10691.3300Wait time on server replies  6471      6550      12353     8458.0000[/code]I do not know the units that SSMS uses for those measurements or what exactly each metric means.  I looked mostly at "Wait time on server replies" as it sounds like a good indicator of how long it took the database engine to process the query.  As far as results, I think its enough to notice that there's almost an order of magnitude difference.  I don't think I could possibly spin this in any way that would make the author's query seem "efficient".  On average I waited about 15 seconds for my query to complete, whereas I waited about 5 minutes or more for the author's query.  I did NOT try to tweak his query in any way.  This post is only meant to point out the weaknesses in the article and show people a valid alternative.Why did I use a 50 million row test set?  C'mon people, everyone knows that if you use a small enough test set everything seems efficient.  When I'm dealing with a table that has a couple thousand rows in it, and I know this table won't get much larger, I focus all my energy on writing clear, concise, and easy to maintain queries.  I only address efficiency if it somehow becomes a problem, although it rarely does.Again I will point out my disclaimer above and my tests were hardly scientific, but when there's such a large difference, it doesn't seem to matter much.[/quote]</description><pubDate>Fri, 29 Jan 2010 14:23:10 GMT</pubDate><dc:creator>Francis Rodrigues-459442</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>DISCLAIMER:  I do not consider myself to be a "DBA", and I don't have any sort of certification.I'm going to have to disagree with most of you that this was a great article.  While ROW_NUMBER() certainly has its uses, the example given in the article seems contrived and is anything but efficient in my testing.First off, the first query provided was a joke.  If anyone wrote a monster like that I would be giving them that look.  You know what look I'm talking about, the "Seriously?  Who hired you?" look.  In fact, I can't believe the author took the time to write something like that for the article.  Essentially I'm saying that the comparison in the article is unfair, the author is comparing a crappy query to an even crappier query.I used a similar approach to generate a random ProductVersion table with 50 versions for each of 1,000,000 products (50 million rows).  I then compared his query to my own using SSMS's built-in "client statistics" using the average values obtained from running the query 3 times (I could have done more, but as you will see later, I was getting aggravated waiting for his query to finish).  I directed the output to a file each time, and when I was finished I compared the contents of the files using the commandline tools "sort" and "fc" to ensure they were giving the same results.Here is my version of the query:[code="sql"]SELECT	x.*FROM (	SELECT	DISTINCT ProductId	FROM	dbo.ProductVersion) pv	CROSS APPLY (		SELECT	TOP(1) *		FROM	dbo.ProductVersion		WHERE	ProductId = pv.ProductId		ORDER BY Version DESC, MinorVersion DESC, ReleaseVersion DESC	) x[/code]First off, I find this MUCH easier to read.  It's less than half the size and just as easy to modify, but the biggest reason I like this query more is because if you paraphrase what the query is doing, you sound like you are describing the problem.  "Grab the ProductId's out of the ProductVersion table, and for each ProductId get the row with the most recent version."  In the future, you could replace the query in the CROSS APPLY with a function like dbo.udf_latest_version_for_product(pv.ProductId) or something similar.  This drops the total lines down to 6 and gives you a reusable function to work with.And now, most importantly, here are the execution results I got running these queries on my 50 million row table.  This was run on my work laptop (a Core 2 Duo with your typical not-spectacular HDD and 4GB of ram) on a local SQL Server 2008 database.[code="plain"]The Author's ROW_NUMBER() query:                             Trial 3   Trial 2   Trial 1   AverageClient processing time       233120    241950    241094    238721.3000Total execution time         288418    359298    282119    309945.0000Wait time on server replies  55298     117348    41025     71223.6600My CROSS APPLY query:                             Trial 3   Trial 2   Trial 1   AverageClient processing time       2075      2246      2379      2233.3330Total execution time         8546      8796      14732     10691.3300Wait time on server replies  6471      6550      12353     8458.0000[/code]I do not know the units that SSMS uses for those measurements or what exactly each metric means.  I looked mostly at "Wait time on server replies" as it sounds like a good indicator of how long it took the database engine to process the query.  As far as results, I think its enough to notice that there's almost an order of magnitude difference.  I don't think I could possibly spin this in any way that would make the author's query seem "efficient".  On average I waited about 15 seconds for my query to complete, whereas I waited about 5 minutes or more for the author's query.  I did NOT try to tweak his query in any way.  This post is only meant to point out the weaknesses in the article and show people a valid alternative.Why did I use a 50 million row test set?  C'mon people, everyone knows that if you use a small enough test set everything seems efficient.  When I'm dealing with a table that has a couple thousand rows in it, and I know this table won't get much larger, I focus all my energy on writing clear, concise, and easy to maintain queries.  I only address efficiency if it somehow becomes a problem, although it rarely does.Again I will point out my disclaimer above and my tests were hardly scientific, but when there's such a large difference, it doesn't seem to matter much.</description><pubDate>Fri, 29 Jan 2010 13:04:04 GMT</pubDate><dc:creator>Matt Arpidone</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Great article!! We use this exact technique in our ETL and it's great to see such solid confirmation that we have chosen a good strategy.Two comments.1. Why did you include ProductID in your ORDER BY clause?[code="other"]OVER(PARTITION BY ProductID                     ORDER BY ProductID,                        Version DESC,[/code]It doesn't seem necessary for determining the latest version?2. I have garnered about a 60% decrease in cost by sorting the index in the same direction as the ORDER BY clauses; in this case, DESC. Could you try that for us and report back on your results?Thanks a lot!</description><pubDate>Fri, 29 Jan 2010 05:45:52 GMT</pubDate><dc:creator>Ted Manasa</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]RBarryYoung (5/12/2009)[/b][hr][quote][b]tony rogerson (5/12/2009)[/b][hr]Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.[/quote]Afraid that I do not understand this.  AFAIK, CTEs are table expressions like non-indexed views.  If you write them non-deterministically (which is what you do in your blog article), then they return non-deterministic results.  And if you reference a non-deterministic table expression multiple times, you are not guaranteed to get the same results.  That's straight ANSI SQL to the best of my knowledge and I see no bug in Transact-SQL demonstrated here.And as far as ROW_NUMBER() it only returns non-deterministic results if the OVER clause parameters are non-deterministic.  If you use a deterministic ORDER BY (which also requires that the column set is uniquely orderable), then you will get deterministic results.  All this really demonstrates is that if you input non-deterministic expressions without instantiating them, then you will get non-deterministic results back.  Garbage In, Garbage Out.[/quote]Couldn't agree more.  A regular CTE is nothing more than a more flexible and better-laid-out derived table.  Just because you can reference a side-effecting function like NEWID() in a CTE, reference the CTE twice and get two different results from each reference doesn't make it a bug!  If you try to write the equivalent query using derived tables, you'll have to duplicate the code, making the coding error clear to see.It's no more surprising than the fact that the following poorly-written code (attempting to produce a random integer from 1 to 3) produces NULL quite frequently:[code]SELECT CASE FLOOR(RAND(CHECKSUM(NEWID())) * 3 + 1) WHEN 1 THEN 'One' WHEN 2 THEN 'Two' WHEN 3 THEN 'Three' ELSE NULL END;[/code]SQL Server does not guarantee that scalar functions will be called a certain number of times, or in any particular order.  It is up to the programmer to avoid potentially problematic use of such side-effecting functions.Paul</description><pubDate>Fri, 29 Jan 2010 05:40:14 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>A few alternatives (just for fun)...[code="sql"]--row_number() version  [generally the preferred approach, I would imagine]; with t1 as (    select *, row_number() over (partition by ProductID order by Version desc, MinorVersion desc, ReleaseVersion desc) as Row from Production.ProductVersion2)select * from t1 where Row = 1 order by ProductID--/--without row_number()  [assuming all version numbers &amp;lt; 10000000]; with  t1 as (select *, Version * 10000000000000 + MinorVersion * 1000000 + ReleaseVersion as FullVersion         from Production.ProductVersion2), t2 as (select ProductID, max(FullVersion) as MaxFullVersion from t1 group by ProductID)select * from t1 where exists (select * from t2 where FullVersion = MaxFullVersion)order by ProductID--/--Equivalent for SQL 2000  [assuming all version numbers &amp;lt; 10000000]select * from    (select *, Version * 10000000000000 + MinorVersion * 1000000 + ReleaseVersion as FullVersion     from Production.ProductVersion2) awhere exists (select * from (select ProductID, max(FullVersion) as MaxFullVersion                             from (select *, Version * 10000000000000 + MinorVersion * 1000000 + ReleaseVersion as FullVersion                                   from Production.ProductVersion2) b                             group by ProductID) c              where FullVersion = MaxFullVersion)order by ProductID--/--Alternative for SQL 2000  [assuming all version numbers &amp;lt; 10000000]select * from Production.ProductVersion2 awhere Version * 10000000000000 + MinorVersion * 1000000 + ReleaseVersion =     (select top 1 Version * 10000000000000 + MinorVersion * 1000000 + ReleaseVersion as FullVersion     from Production.ProductVersion2 where ProductID = a.ProductID     order by Version desc, MinorVersion desc, ReleaseVersion desc)order by ProductID--/[/code]</description><pubDate>Fri, 29 Jan 2010 03:47:40 GMT</pubDate><dc:creator>RyanRandall</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Thanks for comparing the window function to the subquery.  Nice article.</description><pubDate>Wed, 27 Jan 2010 01:27:51 GMT</pubDate><dc:creator>SQLRNNR</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Just another great article.  I've recently discovered Window functions in SQL Server 2005 and they certainly eliminate some of the overhead caused by subqueries and allow me to get data out easier and quicker. :-)</description><pubDate>Wed, 27 Jan 2010 01:13:45 GMT</pubDate><dc:creator>grxcls</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>I had come to this conclusion independently.  It is nice to see collaboration.One difference is that in my test case, the ROW_NUMBER was always faster even when properly indexed.  One other thing I noticed; my subquery returns ties, whereas ROW_NUMBER does not.  In order to return ties (two or more emp_id with the same comm), I used RANK() instead of ROW_NUMBER.The RANK() solution took about twice as long as the ROW_NUMBER() solution, but returned the same results as the correlated subquery.  It was still a faster solution, and the IO was much less -- create clustered index cicomm on comm(emp_id,comm desc)set statistics time onset statistics io on DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS   -- clear all data cacheDBCC FREEPROCCACHE WITH NO_INFOMSGS     -- clear stored procedure cacheprint '********************CORRELATED SUBQUERY'select * from comm awhere a.comm in (select top 3 comm from comm cq		where cq.emp_id = a.emp_id		order by comm desc ) order by emp_id, comm desc;DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS   -- clear all data cacheDBCC FREEPROCCACHE WITH NO_INFOMSGS     -- clear stored procedure cacheprint '********************ROW_NUMBER'select * from (select *, ROW_NUMBER() over(partition by emp_id order by emp_id, comm desc ) rankfrom comm) as a where rank &lt;= 3DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS   -- clear all data cacheDBCC FREEPROCCACHE WITH NO_INFOMSGS     -- clear stored procedure cacheprint '********************RANK'select * from (select *, RANK() over(partition by emp_id order by emp_id, comm desc ) rankfrom comm) as a where rank &lt;= 3Excerpt from messages:********************CORRELATED SUBQUERY(29999 row(s) affected)Table 'comm'. Scan count 430135, logical reads 1375435, physical reads 4, read-ahead reads 1119, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times:   CPU time = 983 ms,  elapsed time = 1748 ms.********************ROW_NUMBER(29997 row(s) affected)Table 'comm'. Scan count 1, logical reads 1123, physical reads 4, read-ahead reads 1119, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times:   CPU time = 156 ms,  elapsed time = 381 ms.********************RANK(29999 row(s) affected)Table 'comm'. Scan count 1, logical reads 1123, physical reads 3, read-ahead reads 1119, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.(1 row(s) affected) SQL Server Execution Times:   CPU time = 343 ms,  elapsed time = 567 ms.</description><pubDate>Wed, 24 Jun 2009 16:36:21 GMT</pubDate><dc:creator>bob amy</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Pritesh Keniya (5/21/2009)[/b][hr]Thanks for introducing the concept of Row_Number(). A good example to quickly understand the concept.Just out of curiosity I thought: "What if we use CTE?"------------------------------------------------------------------------------------------------With MaxVer as (	SELECT ProductID, MAX(Version) AS Version	from Production.ProductVersion WITH (NOLOCK)	Group by ProductID),MaxMinorVer as (	Select PV.ProductID, PV.Version, MAX(MinorVersion) as MinorVersion 	From Production.ProductVersion PV WITH (NOLOCK) 	Inner Join MaxVer MV on MV.ProductID = PV.ProductID and MV.Version = PV.Version	Group by PV.ProductID, PV.Version),MaxReleaseVersion as (	Select PV.ProductID, PV.Version, PV.MinorVersion, MAX(ReleaseVersion) as ReleaseVersion 	From Production.ProductVersion PV WITH (NOLOCK) 	Inner Join MaxMinorVer MV on MV.ProductID = PV.ProductID and MV.Version = PV.Version and MV.MinorVersion = PV.MinorVersion 	Group by PV.ProductID, PV.Version, PV.MinorVersion )Select PV.* from Production.ProductVersion PVInner Join MaxReleaseVersion MV WITH (NOLOCK) on MV.ProductID = PV.ProductID and MV.Version = PV.Version 													and MV.MinorVersion = PV.MinorVersion and MV.ReleaseVersion = PV.ReleaseVersionorder by PV.ProductID-------------------------------------------------------------------------------------------------And I found out: Following is the subtree cost (CTE vs Row_Number()) for 100000 rows:CTE:                0.670628Row_Number():  5.97198Just an other approach to the example.:)[/quote]This approach is fine.  The estimate even indicates that it is faster.  One of my main goals was to show that if changes needed to be made to the code, ROW_NUMBER() offers one place for the change to be made rather than cascading through different sections.  Ultimately it is up to the programmers to decide what suits their needs the best.</description><pubDate>Thu, 21 May 2009 08:11:38 GMT</pubDate><dc:creator>Francis Rodrigues-459442</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Thanks for introducing the concept of Row_Number(). A good example to quickly understand the concept.Just out of curiosity I thought: "What if we use CTE?"------------------------------------------------------------------------------------------------With MaxVer as (	SELECT ProductID, MAX(Version) AS Version	from Production.ProductVersion WITH (NOLOCK)	Group by ProductID),MaxMinorVer as (	Select PV.ProductID, PV.Version, MAX(MinorVersion) as MinorVersion 	From Production.ProductVersion PV WITH (NOLOCK) 	Inner Join MaxVer MV on MV.ProductID = PV.ProductID and MV.Version = PV.Version	Group by PV.ProductID, PV.Version),MaxReleaseVersion as (	Select PV.ProductID, PV.Version, PV.MinorVersion, MAX(ReleaseVersion) as ReleaseVersion 	From Production.ProductVersion PV WITH (NOLOCK) 	Inner Join MaxMinorVer MV on MV.ProductID = PV.ProductID and MV.Version = PV.Version and MV.MinorVersion = PV.MinorVersion 	Group by PV.ProductID, PV.Version, PV.MinorVersion )Select PV.* from Production.ProductVersion PVInner Join MaxReleaseVersion MV WITH (NOLOCK) on MV.ProductID = PV.ProductID and MV.Version = PV.Version 													and MV.MinorVersion = PV.MinorVersion and MV.ReleaseVersion = PV.ReleaseVersionorder by PV.ProductID-------------------------------------------------------------------------------------------------And I found out: Following is the subtree cost (CTE vs Row_Number()) for 100000 rows:CTE:                0.670628Row_Number():  5.97198Just an other approach to the example.:)</description><pubDate>Thu, 21 May 2009 06:19:16 GMT</pubDate><dc:creator>Pritesh Keniya</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]wbrianwhite (5/13/2009)[/b][hr][quote][b]Mike C (5/13/2009)[/b][hr]There's always more than one way to skin a cat.  The question is twofold:  (1) What's the performance of the alternative solution and (2) is the alternative solution easier to manage?  I think the demonstration here was that the performance and manageability are both improved with CTEs and ranking/windowing functions.[/quote]But the performance of the subquery approach was actually 100% - 500% BETTER than the performance of the ROW_NUMBER() approach, when the table in question was properly indexed (and tables should always be properly indexed).  From Figure 4:[quote]Rows     - Sub     -   Row_Number()100000  - 2.2258 -   5.971981000000 -14.3881 -  83.7228[/quote]Ignoring a 100% - 500% performance improvement to use a nifty new trick is what I would consider suicidal.[/quote]When you say "properly indexed" you're referring to the clustered index on the versioning columns, correct?  I'm trying to think of situations where I would use my single clustered index per table on the versioning columns.  I think a more realistic example would be no index on the versioning columns, or nonclustered index on them in which case you might have to contend with RID lookups which can be even more expensive (unless you create covering indexes of course).</description><pubDate>Wed, 13 May 2009 22:11:21 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]Mike C (5/13/2009)[/b][hr]There's always more than one way to skin a cat.  The question is twofold:  (1) What's the performance of the alternative solution and (2) is the alternative solution easier to manage?  I think the demonstration here was that the performance and manageability are both improved with CTEs and ranking/windowing functions.[/quote]But the performance of the subquery approach was actually 100% - 500% BETTER than the performance of the ROW_NUMBER() approach, when the table in question was properly indexed (and tables should always be properly indexed).  From Figure 4:[quote]Rows     - Sub     -   Row_Number()100000  - 2.2258 -   5.971981000000 -14.3881 -  83.7228[/quote]Ignoring a 100% - 500% performance improvement to use a nifty new trick is what I would consider suicidal.</description><pubDate>Wed, 13 May 2009 14:53:46 GMT</pubDate><dc:creator>wbrianwhite</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]wbrianwhite (5/12/2009)[/b][hr]In what way was simple set logic unable to perform this operation?  It looks like this could be solved with left outer joins and is null tests.  I don't have the test db mentioned, so my &gt; may be wrong, it could need to be a &lt;, I can never remember without testing it.   But something like this should do the trick[/quote]There's always more than one way to skin a cat.  The question is twofold:  (1) What's the performance of the alternative solution and (2) is the alternative solution easier to manage?  I think the demonstration here was that the performance and manageability are both improved with CTEs and ranking/windowing functions.</description><pubDate>Wed, 13 May 2009 12:50:47 GMT</pubDate><dc:creator>Mike C</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>Well, exactly. I wouldn't compare this to subqueries, I'd compare it to using top and a sort. Is it faster than that? I'd expect them to be about the same -- I don't see what Row_Number() gets us.</description><pubDate>Wed, 13 May 2009 09:30:56 GMT</pubDate><dc:creator>Ion Freeman</dc:creator></item><item><title>RE: ROW_NUMBER(): An Efficient Alternative to Subqueries</title><link>http://www.sqlservercentral.com/Forums/Topic714684-1545-1.aspx</link><description>[quote][b]wbrianwhite (5/12/2009)[/b][hr][quote][b]William Hutton (5/12/2009)[/b][hr]I think it's just a matter of ...elegance? To me using Row_Number() statements (and CTEs in general) are easier to write and easier to follow in specific situations (provided the one reading knows what they do). And I can tell you just from testing this morning that the Row_Number() query has a performance improvement over the left joins and isnull tests I had been using previously.[/quote]Do they?  How big is it?  Is that before or after you make a clustered index on the four significant columns? In figure 4 it looks like a well-indexed table is faster when queried by the sub-query approach.  And since this table is going to be used in lots and lots of joins besides the one in the article, it needs to be well-indexed anyway. The comparison of query plans in the article is skewed because the 14 nested subqueries can be replaced by a single select with 4 left outer joins, and it is that query plan comparison I'd be interested in seeing. Also, isn't that subquery getting the max release version and standard cost as well, while the Row_Number() version is not?[/quote]The MAX release version is inherently pulled by the ORDER BY ... DESC.  The standard cost that is selected is the one that is associated with that particular minor version and release version for a product.</description><pubDate>Tue, 12 May 2009 21:13:01 GMT</pubDate><dc:creator>Francis Rodrigues-459442</dc:creator></item></channel></rss>