What is faster, MAX() with GROUP BY, OR CTE with ROW_NUMBER()?

Question

What is faster, MAX() with GROUP BY, OR CTE with ROW_NUMBER()?

Pagan DBA

Right there with Babe

Points: 794
More actions
August 1, 2011 at 3:23 pm

#245287

I have to manage various kinds of "data" and maintain "versions" of that data. To simplify the problem I have created a dummy data table and a dummy version table that has foreign key to the data id. I could just type the rest in English but the SQL code speaks for itself. My questions is at the bottom in SQL comments. Need to figure out which is faster because I'm getting sporadic results. Sometimes one is faster and sometimes the other.
One can simply copy / paste the SQL and run it
Thanks in advance.
---------------------------------
CREATE TABLE SomeData (
SomeDataIdBIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
SomeTextVARCHAR(64),
CreateDtDATETIME NOT NULL DEFAULT (GETDATE()),
)
CREATE TABLE SomeDataVersion (
SomeDataVersionIdBIGINT IDENTITY(1, 1) NOT NULL,
SomeDataIdBIGINT NOT NULL REFERENCES SomeData(SomeDataId),
CreateDtDATETIME NOT NULL DEFAULT (GETDATE())
)
DECLARE @Id BIGINT
INSERT INTO SomeData (SomeText) VALUES ('DontCare'); SELECT @Id = SCOPE_IDENTITY();
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeData (SomeText) VALUES ('DontCare'); SELECT @Id = SCOPE_IDENTITY();
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeData (SomeText) VALUES ('DontCare'); SELECT @Id = SCOPE_IDENTITY();
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
INSERT INTO SomeDataVersion(SomeDataId) VALUES (@Id);
SELECT * FROM SomeData
SELECT * FROM SomeDataVersion
-- Which is FASTER?
-- THIS?
SELECT SomeDataId, MAX(SomeDataVersionId)
FROM SomeDataVersion t
GROUP BY SomeDataId
-- OR THIS?
;WITH cte AS (
SELECT
t.SomeDataVersionId,
t.SomeDataId,
ROW_NUMBER() OVER (PARTITION BY t.SomeDataId ORDER BY t.SomeDataVersionId DESC) AS RowNum
FROM SomeDataVersion t
)
SELECT cte.SomeDataId, cte.SomeDataVersionId FROM cte WHERE RowNum = 1

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

Row_Number() is rarely faster, and usually only due to bad indexing/heaps, or because the data is incredibly limited that it has to number before moving forward. On large data tests it has failed rather badly on a number of benchmarks I've built.

Use MAX() or CrossApply as your default go-tos for logging table pulls. Use MAX() as your primary if you can't add supporting indexes for the Cross Apply (in particular because you need one in a DESC order for the TOP 1 clause).

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

hwells Right there with Babe Points: 732 More actions · Answer 2

I think the experts would say "It Depends"[/url]. I found this article by The Dixie Flatline to be extremely informative.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 3

Pagan DBA (8/1/2011)
-- Which is FASTER?

Do what any of us would do... TEST IT and see! 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 4

You guys actually test your code? More power to you :hehe:.

cliffb SSCarpal Tunnel Points: 4597 More actions · Answer 5

Performance is dependant on the amount of data and indexing. Even with indexes on the columns you are using for the ROW_NUMBER(), you still need to pull all results and then rank them based on your partiioning and ordering. If you just need the top 1, as suggested before, I have found that a subquery with MAX and GROUP BY is much quicker. I have also used CROSS APPLY as someone else suggested and it was faster than the ROW_NUMBER() method. I would limit ROW_NUMBER() and it's siblings to when you actually need to rank something vs pulling top 1 type operations.

Pagan DBA Right there with Babe Points: 794 More actions · Answer 6

Seriously, I did say I was getting sporadic results. How would I know that if I hadn't tested?

The idea behind the question was to get some insight into SQL and SQL Server.

Thanks to the gents who actually gave useful answer regarding ROW_NUMBER() rarely being faster on larger datasets and setting me straight on MAX()/GROUP BY

As an aside, I'm sorry but BIT(1) = 1/0 is my preferred choice to VARCHAR(1) = Y/N. This is because in spite of my efforts, I have always found a bozo who will circumvent whatever protection I offer using database constraints to but thinks like B (both?!?!?!) or U (unknown?!?!?!?) and mess up my application. Can't put say 9 in a BIT(1) can we now?

I really don't want to have to TEST my code to find this out. Prevention is the best cure. Y/N is for report writers, and they can write the logic for display.

Apologies for hijacking my own thread.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

I assure you nobody wanted to insult you.

The bottom line is that each server / app is different and that I rarely if ever see something really standard accross the board.

The only rule I've seen stay true all those years is the less work the server does the better. As annoying as this explaination is it always hold true.

With that in mind, you can read this to understand a little better how the server works and how to see what's going on with each of your queries :

http://www.sqlservercentral.com/articles/books/65831/

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 8

Pagan DBA (8/2/2011)
...
As an aside, I'm sorry but BIT(1) = 1/0 is my preferred choice to VARCHAR(1) = Y/N. This is because in spite of my efforts, I have always found a bozo who will circumvent whatever protection I offer using database constraints to but thinks like B (both?!?!?!) or U (unknown?!?!?!?) and mess up my application. Can't put say 9 in a BIT(1) can we now?
...

First of all you can put 9 in a BIT(1) , it will just turn to 1 😀

The second: it's not only 0 and 1 can be there... There is a third possible value: NULL

So, you can always find a hardcore bozo who will circumvent whatever protection you can setup and mess up your application. You just need to look in right places :hehe:

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 9

Pagan DBA (8/2/2011)
As an aside, I'm sorry but BIT(1) = 1/0 is my preferred choice to VARCHAR(1) = Y/N. This is because in spite of my efforts, I have always found a bozo who will circumvent whatever protection I offer using database constraints to but thinks like B (both?!?!?!) or U (unknown?!?!?!?) and mess up my application. Can't put say 9 in a BIT(1) can we now?

If you're discussing this line from my signature:

Space is cheap! nVARCHAR(1) all your BITs today!

It's a joke. 😀 8 bits takes up one byte, 8 nVARCHAR(1)'s (Nullable) takes up 32 bytes. Per record. 😉

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 10

Pagan DBA (8/2/2011)
Seriously, I did say I was getting sporadic results. How would I know that if I hadn't tested?
The idea behind the question was to get some insight into SQL and SQL Server.

Seriously... the 9 rows you posted isn't a performance test. The reason why you're getting sporadic results is because both run so fast with that small bit of data that you tested with that even the heartbeat of SQL Server will interfer with such a paltry "test".

It turns out that the gents gave the correct answer [font="Arial Black"]but YOU still don't really know [/font]because YOU haven't learned how to test and they provided no evidence, not even a link, that they're correct in what they say. What would you have done, as so often happens on forums, if they gave the wrong answer? Would you still believe them just because there were two of them? Of course you would because you just did that very same thing. 😉

Since you still don't really know if they're right or not, do like I first suggested... test it.

Here's your first lesson on how to make enough test data for this particular comparison. 😉 I'll let you play with indexes.

USE TempDB

;

IF OBJECT_ID('TempDB.dbo.SomeDataVersion','U') IS NOT NULL

DROP TABLE TempDB.dbo.SomeDataVersion

;

SELECT TOP (1000000)

SomeDataVersionID = IDENTITY(INT,1,1),

SomeDataID = ABS(CHECKSUM(NEWID()))%250000+1,

CreateDt = (RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2010','2020'))+CAST('2010' AS DATETIME)

INTO dbo.SomeDataVersion

FROM sys.all_columns ac1,

sys.all_columns ac2

;

Protect yourself. Believe no one when it comes to questions like "Which is faster?"... not even me. 😉 Always do your own performance testing and learn what performance testing entails. I assure you that it doesn't have anything to do with 9 rows like you posted or even 10,000 rows.

Performance testing also has nothing to do with returning the data to the screen. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 11

Just as a minor point you should never use varchar(1) anyway - it takes up more storage than a char(1), regardless of if they both have a value in them or not.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Pagan DBA Right there with Babe Points: 794 More actions · Answer 12

Err...the code sample I provided was to serve to explain my question. The sporadic results were on my REAL problem where I have several pieces of data EACH versioned and unpredictable data distributions in terms of how many versions each piece of data has. I couldn't possibly post sample code for all scenarios here. As has been explained to me in multiple replies, my problem is specific and I get that. Unfortunately I guess my code sample was perceived in a way one would think I'm asking a general question, which I wasn't.

As to what I really don't know yet, it is true I'm not as educated as most people here as far as SQL is concerned. The reason I joined this forum - to learn.

Finally, one gent DID provide a link on how to write a test. "It Depends" - where performance of TOP(1), MAX()/GROUP BY and ROW_NUMBER() is being compared. The single most succinct, useful and unpatronizing reply to my original question and which is what I plan to use by experimenting with various indexes on my tables.

Thanks all.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 13

Pagan DBA (8/3/2011)
The reason I joined this forum - to learn.

Not sure it will help at this point, but I recognized that. Looking back at what I posted, I can see why you might have thought so, but I wasn't trying to be patronizing.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 14

I published a series of tests on this subject for retrieving versioned data over here at Simple-Talk[/url]. It might prove useful. The answer, in a nutshell, it depends.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning