UNIQUEIDENTIFIER vs BIGINT

  • First things first: estimated costs are not measures you can compare between queries. They're subject to too much odd variability, so don't treat the cost estimates within execution plans as a measure of performance. Also, you clear the cache but then only measure reloading it for the first query, not for the second, and that's going to affect the behavior of the queries too. Instead, to get a clean measure, you need to execute them separately.

    On my machine, they both executed in 0 ms. The difference is in the reads: the first query did 5, the second did 2. That likely accounts for the difference between them. But because you're forcing the index on the optimizer, it's hard to know whether you'd see the same behavior if that index were actually a good enough choice for the optimizer to select it naturally.
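
    For example, something along these lines keeps the two measurements separate (the query shape here is only a guess reconstructed from the predicates shown below, and the cache-clearing command belongs on a test instance only):

    -- Measure each query in isolation, from a cold buffer pool.
    DBCC DROPCLEANBUFFERS;  -- never run this on a production server
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT Col1, Col2, Col3, Col4
    FROM dbo.idxTest
    WHERE Col1 >= 10
      AND Col1 <= 200
      AND Col4 > 60;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;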

    You can see how the information is retrieved. First you have a seek predicate:

    Seek Keys[1]: Start: [Testing].[dbo].[idxTest].Col1 >= Scalar Operator((10)), End: [Testing].[dbo].[idxTest].Col1 <= Scalar Operator((200))

    Then you have a Predicate:

    [Testing].[dbo].[idxTest].[Col4]>(60)

    And that's for both seek operations. But the main difference is that the second query doesn't need to go through as many pages to find the values since it's only dealing with the two columns.

    If you modify the scripts to test each query naturally, letting the optimizer pick the indexes, you get basically the same behavior. The number of reads varies, and that's the sticking point. So you've proven that the key order matters. What if you rearrange the compound key to put Col4 in the second position in the first index?
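
    In other words, something like this with the hint removed (a hypothetical reconstruction of the test query, using the predicate values from the plan output above):

    -- No index hint: the optimizer chooses between idxTest_01 and idxTest_02.
    SELECT Col1, Col4
    FROM dbo.idxTest
    WHERE Col1 BETWEEN 10 AND 200
      AND Col4 > 60;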

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I've created the index CREATE INDEX idxTest_03 ON idxTest (Col1, Col4, Col3, Col2), and the execution plan behavior is exactly the same as with idxTest_02. But again, the reads are the same as with idxTest_01.

    So idxTest_03 and idxTest_01 have different column orders and generate different execution plans, but they have the same number of reads, take the same time to execute, and both are Seeks.

    I can almost assume the only column position that matters is the first one...

    In our app I don't know exactly how the data will be filtered, so I create 4 indexes: (Col1, Col2, Col3, Col4); (Col2, Col1, Col3, Col4); (Col3, Col1, Col2, Col4); and (Col4, Col1, Col2, Col3). The only thing I know is that it can only be filtered on those 4 columns. Or is this just over-engineering?! Should I just create one index and monitor for other indexes that might be needed, as in the sketch below?
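
    If I go the monitoring route, I'm guessing something like this would show whether each index earns its keep (a sketch using sys.dm_db_index_usage_stats; the counters reset when the instance restarts):

    -- Compare how often each index is read vs. how often it must be maintained.
    SELECT i.name,
           s.user_seeks, s.user_scans, s.user_lookups,  -- reads that used the index
           s.user_updates                               -- maintenance cost on writes
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
           ON s.object_id = i.object_id
          AND s.index_id = i.index_id
          AND s.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID('dbo.idxTest');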

    The index maintenance whenever one of those columns is updated could be painful...

    Thanks,

    Pedro



    If you need to work better, try working less...

  • I'd be nervous about creating that many indexes, especially if you're only saving a few pages read from memory. If the query was called thousands of times a minute, heck yes, it's worth it. If it's only called a thousand times a day... probably not.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (10/11/2012)


    I'd be nervous about creating that many indexes, especially if you're only saving a few pages read from memory. If the query was called thousands of times a minute, heck yes, it's worth it. If it's only called a thousand times a day... probably not.

    Thanks for all the help Grant 🙂

    Just beginning to read your book SQL Server 2012 Query Performance Tuning and hoping to get more pointers...

    Thanks,

    Pedro



    If you need to work better, try working less...

  • If you're going to search by Col1 and Col4 then, in general, put them first in the index, to narrow the row range as much as possible right away.

    I didn't say to automatically exclude the nvarchar(20) column from the index. You need to re-read my comments. I said "it depends", although in general, yes, you would only want a 40-byte column in the clustered index key if it was absolutely needed.

    I wasn't aware, and don't see why, stats on a char(2) would be inherently worse than on an int. I would think SQL would be able to create a map of value ranges for a char(2) as well as for an int.
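
    The map it builds is easy enough to look at directly, for instance (the table and statistics names here are hypothetical):

    -- Show the histogram (value-range map) SQL Server built for each column.
    DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_MyTable_Char2Col') WITH HISTOGRAM;
    DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_MyTable_IntCol') WITH HISTOGRAM;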

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • I know you want some specifics that you can apply to creating all indexes on all tables, but there really aren't any.

    There are some general rules for creating proper indexes, but the specifics (exactly what columns, and in what order) will still depend on each table's individual requirements.

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • Grant Fritchey (10/11/2012)

    As an aside, the CHAR(2) would be a really poor choice for the first column in a compound key. You'd want to use either the INT or the NVARCHAR(20) because they will generate more efficient statistics.

    Why? I wasn't aware, and don't see why, stats on a char(2) would be inherently worse than on an int.

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (10/11/2012)


    Grant Fritchey (10/11/2012)

    As an aside, the CHAR(2) would be a really poor choice for the first column in a compound key. You'd want to use either the INT or the NVARCHAR(20) because they will generate more efficient statistics.

    Why? I wasn't aware, and don't see why, stats on a char(2) would be inherently worse than on an int.

    It depends on the data. If you have a perfectly random distribution of ANSI characters, then you're looking at about 65,000 combinations, which could be reasonably well distributed for an index, depending on the number of rows it was supporting. But since an INT can hold 4,000,000,000 distinct values, the chances are you'll get a better distribution. You can set up anything so that it's the opposite; I was just playing off the standard uses of such types. An int is more likely to be a widely distributed value, while a char(2) is more likely to be something like a state abbreviation. In that case, as more and more data gets added, the statistics for the char(2) get more and more useless, since only 50 possible buckets (out of the 200 available in the histogram) would be filled, with hundreds and thousands of rows reflecting the same data over and over.

    In short, I made a generalization.
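
    If you want to see that effect for yourself, here's a quick illustrative sketch (all object names are made up):

    -- 100,000 rows: a unique int key vs. a low-cardinality char(2) code.
    CREATE TABLE dbo.StatsDemo (
        ID        int IDENTITY(1,1) NOT NULL,
        StateCode char(2) NOT NULL
    );

    INSERT INTO dbo.StatsDemo (StateCode)
    SELECT TOP (100000)
           CHAR(65 + ABS(CHECKSUM(NEWID())) % 10) + 'A'  -- only 10 distinct codes
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b;

    CREATE STATISTICS st_ID ON dbo.StatsDemo (ID);
    CREATE STATISTICS st_State ON dbo.StatsDemo (StateCode);

    -- The int spreads across up to 200 histogram steps; the char(2) fills
    -- only 10 steps, each lumping thousands of identical values together.
    DBCC SHOW_STATISTICS ('dbo.StatsDemo', 'st_ID') WITH HISTOGRAM;
    DBCC SHOW_STATISTICS ('dbo.StatsDemo', 'st_State') WITH HISTOGRAM;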

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (10/11/2012)

    It depends on the data. [...] In short, I made a generalization.

    But since it's the leading column in a clustered index, it's still valuable for reducing the number of rows that have to be processed.

    Seems to me that what you're talking about applies far more to a nonclus index.

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (10/11/2012)

    But since it's the leading column in a clustered index, it's still valuable for reducing the number of rows that have to be processed.

    Seems to me that what you're talking about applies far more to a nonclus index.

    Working off the generalization, not really. The first column is the only one with detailed statistics (the histogram). The compound key adds to the overall selectivity of the index, which is part of how the optimizer picks it, but it's the distribution of the data within the stats that most drives the optimizer toward choosing an index. Again, distribute the data in just the right way and this point changes, but then we're discussing data distribution, not the usefulness of statistics.
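
    You can see that split directly in the stats output for one of the test indexes, something like:

    -- Only the leading column gets step-by-step histogram detail; the rest of
    -- the compound key shows up only in the density vector.
    DBCC SHOW_STATISTICS ('dbo.idxTest', 'idxTest_01') WITH DENSITY_VECTOR;
    DBCC SHOW_STATISTICS ('dbo.idxTest', 'idxTest_01') WITH HISTOGRAM;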

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (10/11/2012)

    Working off the generalization, not really. The first column is the only one with detailed statistics (the histogram). [...]

    I still think that applies only to nonclus indexes. The selectivity can be 95% and a clus index seek will still be used if it is applicable.

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.
