Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««123»»

Variant Order 3 Expand / Collapse
Author
Message
Posted Monday, December 12, 2011 7:26 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 5:21 AM
Points: 6,098, Visits: 8,367
Christian Buettner-167247 (12/12/2011)
Maybe I have missed it, but is there any plausible reason to order by a SQL_VARIANT column, given the unintuitive sorting semantics? What problem can I solve with that order?

A well-defined sort order for SQL_VARIANT is a requirement to be able to build indexes on it.
I don't think there is any other reason to use comparisons (other than equal to or unequal to) on SQL_VARIANT data.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1220249
Posted Monday, December 12, 2011 8:17 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, October 20, 2014 11:26 AM
Points: 44, Visits: 133
There is something that I don't understand about this one.

The article that is referenced in the solution says:
"When sql_variant values of the char, varchar, nchar, or nvarchar data types are compared, their collations are first compared based on the following criteria: LCID, LCID version, comparison flags, and sort ID. Each of these criteria are COMPARED AS INTEGER VALUES, and in the order listed"... (capitalization is mine, not in BOL)

I looked at the ComparisionStyle property for the collations used in the question (using SELECT COLLATIONPROPERTY('[collation_name], 'ComparisonStyle')) and this is what I got
- latin1_general_bin = 0
- latin1_general_cs_as = 196608
- latin1_general_ci_as = 196609
- latin1_general_cs_ai = 196610
- latin1_general_ci_ai = 196611

Based on these results... I expected the order to be E, C, A, D, B...

I am obviously missing something, what is that?

Thanks!
Post #1220276
Posted Monday, December 12, 2011 8:57 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:35 AM
Points: 7,874, Visits: 9,613
Christian Buettner-167247 (12/12/2011)
I liked the first question in the series, but I did not really need the follow-ups.
The first question already taught me to never ever consider order by SQL_VARIANT columns.

Maybe I have missed it, but is there any plausible reason to order by a SQL_VARIANT column, given the unintuitive sorting semantics? What problem can I solve with that order?

I think the only reason for having an order on the SQL_VARIANT data type is to allow it to occur in indexes (including indexes required to support UNIQUE and PRIMARY KEY constraints). I don't think it's generally useful to know much about the order, and I don't imagine that knowing the order could help in writing more efficient queries. Obviously knowing the order is useful to those who have to write a data engine that supports it, so it's of interest to those who write data engines. I don't think it's of much interest to anyone else, not event to practitioners or EAV models (which are where most SQL variants turn up, of course: the alternative of using NVARCHAR(4000) to contain text which can be read as SQL speciying the underlying type, the collation if relevant, and the base value is even more dreadful than using SQL_VARIANT; of course there's little point in using an RDBMS if one's aim is to provide an EAV model, but....).

I have seen a case where the primary key of a table was an SQL_VARIANT column; it's usefulness was not as a determiner of row proximity in physical store or as an efficient way of looking at small key-ranges but as a something that was unique. For the thing to be used as a primary key, there had to be an order so that the index supporting the primary key could be constructed. There was no combination of other columns in the table that would be unique on its own, so the table had to include that column in the primary key. It wasn't my design (I'd never have done it that way) but it seemed reasonable. So although it's rare, using the order happens - but as far as I know, knowing what the order is as opposed to knowing that there is an order is useful only to those who have to implement the data engine.

What's the most important use of SQL_VARIANT order? Well, in my experience its main use is in generating T-SQL trivia questions for use as QotD - we've had 5 since 1st November (1 from Paul, 1 from bitbucket, and 3 from me).


Tom
Post #1220317
Posted Monday, December 12, 2011 10:00 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, January 31, 2013 8:53 AM
Points: 1,176, Visits: 778
Tks Tom.
Post #1220367
Posted Monday, December 12, 2011 10:18 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:35 AM
Points: 7,874, Visits: 9,613
msurasky-905715 (12/12/2011)
There is something that I don't understand about this one.

The article that is referenced in the solution says:
"When sql_variant values of the char, varchar, nchar, or nvarchar data types are compared, their collations are first compared based on the following criteria: LCID, LCID version, comparison flags, and sort ID. Each of these criteria are COMPARED AS INTEGER VALUES, and in the order listed"... (capitalization is mine, not in BOL)

I looked at the ComparisionStyle property for the collations used in the question (using SELECT COLLATIONPROPERTY('[collation_name], 'ComparisonStyle')) and this is what I got
- latin1_general_bin = 0
- latin1_general_cs_as = 196608
- latin1_general_ci_as = 196609
- latin1_general_cs_ai = 196610
- latin1_general_ci_ai = 196611

Based on these results... I expected the order to be E, C, A, D, B...

I am obviously missing something, what is that?

Thanks!

I should have included a warning about that one in my explanation for this question - the problem is that the comparison style returned by COLLATIONPROPERTY for a collation is not the Comparison Style used in comparing collations to order SQL_VARIANT values; in particular, it delivers 0 both for _BIN and for _BIN2 for all locales (despite these being generally quite distinct comparison styles), whereas variant puts these two just about at opposite ends of the scale (at second and last out of 18). Some of the other numbers it returns also fail to match up with the style number used in variant sorting (eg it returns 0 for _CS_AS_KS{_WS too). And I don't know whether this is a bug in the COLLATIONPROPERTY function; but as returning the same value - 0 - for two different styles is actually documented, I guess it's not a bug unless the documentation has a partially matching bug. Most probably it is just another failure of BoL to tell us what is actually going on. I also don't know what the integer comparison styles used by the SQL_VARIANT order are; just that their order matches the list I provided.

The Collation version returned by COLLATIONPROPERTY also differs from the LCID version used in sorting variants, as noted in the explanation of the previous question in this series, but that of course is fair enough - it's a collation version, not a locale version. The issue you have hit on is rather nastier.

Incidentally, sort ID is something that old-fashioned SQL collations (collation name beginning SQL_) had, and can be derived somehow for Windows collations (Windows collations may once have had it; I don't think modern ones do) to help people understand how to maintain compatability with databases which used older collations. I don't think it comes into the ordering of windows collations used in SQL_VARIANT at all (but I may be wrong - the documentation says it does, as you point out - it says sort ID is compared, not sort ID is compared if it exists).


Tom
Post #1220374
Posted Monday, December 12, 2011 10:22 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:35 AM
Points: 7,874, Visits: 9,613
jts_2003 (12/12/2011)
Can anyone explain the answer, as I didn't understand the explanation at all

I guess the thing to do is to read teh BoL page referenced, plus read the explanation for Paul's 1Nov QotD, the explanation for my 30 Nov QotD, and the explanation for my 5 Dec QotD before attempting to understand this one. Ordering of SQL_VARIANT is pretty complex, and it seems that a lot of people find the very brief explanation in BoL unhelpful (hence my rather verbose explanation for the 5 Dec QotD).


Tom
Post #1220375
Posted Monday, December 12, 2011 11:35 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, August 28, 2014 8:53 PM
Points: 1,388, Visits: 3,039
So nearly two-thirds of respondants so far are smarter than I. They wouldn't have just run the code and clicked the corresponding option now, would they?

Anyway, I ended up guessing wrong, partially based on a table of SQL Collation Names with sort order values found here: http://msdn.microsoft.com/en-US/library/ms180175(v=SQL.105).aspx

Despite Tom's explanations, which so far make sense to me, I don't understand how this table would have value if it doesn't bear out the actual order. Perhaps he or one of the 63% who got the correct answer could clarify this.
Post #1220402
Posted Monday, December 12, 2011 1:11 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, October 20, 2014 11:26 AM
Points: 44, Visits: 133
Thanks for taking the time to reply Tom...
Post #1220444
Posted Monday, December 12, 2011 1:20 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:35 AM
Points: 7,874, Visits: 9,613
john.arnott (12/12/2011)
So nearly two-thirds of respondants so far are smarter than I. They wouldn't have just run the code and clicked the corresponding option now, would they?

I suspect some - perhaps most - did just that. It's very difficult to do any other way if you don't just happen to know that _cs is lower than _ci, _as is lower than _ai, _ks is lower than blank, _ws is lower than blank, they affect the order in the order I've listed them here, and that BIN comes last, and then remember the bizarre positioning of BIN2. I included a list of ComparisonStyles and their order in the explanation without any reference in the explanation because I haven't a clue where I got in from (I verified that it still worked when I found my rather old (although I must have updated them when 2005 came along) notes on it).

Anyway, I ended up guessing wrong, partially based on a table of SQL Collation Names with sort order values found here: http://msdn.microsoft.com/en-US/library/ms180175(v=SQL.105).aspx

Despite Tom's explanations, which so far make sense to me, I don't understand how this table would have value if it doesn't bear out the actual order. Perhaps he or one of the 63% who got the correct answer could clarify this.

That's actually quite straightforward. The "sort order id" of this column is the "sort id" of page about SQL_VARIANT; that means that if two collations have the same Locale ID, the same Locale Version, and the same ComparisonStyle then (and only then) is this "sort order id" used to determine the order - and for the collations in the question, it's already been determined by Locale ID and version and ComparisonStyle, so no account is taken of the "order id". (I'm using both names, so as to match BoL on this, but using quote marks to make it obvious that the naming is adrift).

I should probably have included the above in the explanation, but I didn't even think of it at the time. I suppose I can point out in mitigation that there are a couple of things about that page and the table it contains that might have acted as warning flags. The table on that page lists only 200 collations out of the 2397 that my copy of SQL 2008 R2 has, so it isn't going to be the whole story, not even when you allow for the fact that sort order id 80 may bring that total up to 250 or maybe more by allowing many different collations that use binary order and code page 1250. The fact that this page shows that collations for different locales can have the same sort order id demonstrates that this sort order id can't be the whole of the ordering of collations for sql_variant, because the locale id is the most significant of the attributes that determine that order. The table only gives a sort order id for three of the collations used in the question, the other two are missing.

Only a small portion of the windows collations introduced for sql use in or before SQL 2000 appear to have been given sort ids. As I understand it, if two windows collations have the same locale id, locale version (I believe implies that they have the same collation version and I know it doesn't follow from their having the same collation version), and the same ComparisonStyle they will also have the same "sort order id" and hence will compare equal as collations on SQL_VARIANT values. But I could be wrong - I can't find any documentation to back that last statement up, it's just something IO seem to remember out of the misty past, and not having found any counterexamples is no guarantee (especially as I haven't looked very hard).


Tom
Post #1220450
Posted Monday, December 12, 2011 3:57 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, November 25, 2014 9:51 AM
Points: 5,446, Visits: 7,616
It's official. I hate Variant.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1220528
« Prev Topic | Next Topic »

Add to briefcase ««123»»

Permissions Expand / Collapse