Collation and SQL_VARIANT

  • Great question! I also learned something....

    Rob Schripsema
    Propack, Inc.

  • Great -- it forced me to re-read collation rules.

  • i ran this query to figure out how this works.. great question:-)

    ===========================================
    Better try and fail than not to try at all...

    Database Best Practices[/url]

    SQL Server Best Practices[/url]

  • Rich Weissler (11/1/2011)


    cfradenburg (11/1/2011)That's row number 2, not two rows.

    Doh! That's clear as day now. Thank you! (Both for the answer, and the patience.) (I learned two things today! First, a little more about SQL_Variant. Second, don't try to comprehend the QotD answer before coffee.)

    Particularly Paul's... 😉


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Nice question Kiwi!

  • SQL Kiwi (11/1/2011)


    Hah! Well I came across this collation behaviour with SQL_VARIANT when I was thinking one day about how EAV systems would work when confronted with string data with different comparison semantics that needed to be stored in the same (value) column. For sure, SQL_VARIANT is a niche market - but I have found it useful with UNPIVOT (or the equivalent APPLY syntax) from time to time. Anyway, I'm glad you enjoyed the question.

    Hm, do you have an explanation why the following code will return the exact same result? :crazy:

    DECLARE @Example TABLE

    (

    id TINYINT IDENTITY PRIMARY KEY,

    string SQL_VARIANT UNIQUE CLUSTERED

    )

    INSERT @Example (string) VALUES ('B' COLLATE Latin1_General_CI_AI)

    INSERT @Example (string) VALUES ('B' COLLATE Latin1_General_CI_AS)

    INSERT @Example (string) VALUES ('b' COLLATE Latin1_General_CS_AI)

    INSERT @Example (string) VALUES ('b' COLLATE Latin1_General_CS_AS)

    SELECT id, string, sql_variant_property(string, 'basetype') as bt, sql_variant_property(string, 'collation') as collat

    FROM @Example

    WHERE string = 'b' -- COLLATE Latin1_General_CI_AS

    Only if the WHERE clause is ommitted completely, all 4 rows are returned as expected.

    Played around with your code on SQL Server 2008 R2 (after having answered the QotD) and found out about this peculiar behavior.

    And thanks for a very interesting question!

    Regards,

    Michael

    PS: Apologies for not capitalizing key words...

  • michael.kaufmann (11/2/2011)


    Hm, do you have an explanation why the following code will return the exact same result? :crazy:

    My guess would be that your server has a default collation of Latin1_General_CI_AS. My server has Latin1_General_CS_AS as the default, and when I run your code, I get the row with id=4 returned.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (11/2/2011)


    michael.kaufmann (11/2/2011)


    Hm, do you have an explanation why the following code will return the exact same result? :crazy:

    My guess would be that your server has a default collation of Latin1_General_CI_AS. My server has Latin1_General_CS_AS as the default, and when I run your code, I get the row with id=4 returned.

    Thanks for your quick reply, Hugo--tend to keep forgetting about the default server collation... :w00t: (I ran the test in tempdb, not in a user database with a known collation).

  • Hugo Kornelis (11/2/2011)


    michael.kaufmann (11/2/2011)


    Hm, do you have an explanation why the following code will return the exact same result? :crazy:

    My guess would be that your server has a default collation of Latin1_General_CI_AS. My server has Latin1_General_CS_AS as the default, and when I run your code, I get the row with id=4 returned.

    Yes that's it. When I first wrote the demo code for this question I created a new database with an explicit collation, and ran the SELECT query without an explicit COLLATE clause. After some thought, I decided that this made the code a bit too long and made the question sneakier than I would like, so I went with the shorter version with the explicit COLLATE. Perhaps I made the wrong decision there; the dependency on default collation is a good learning point.

  • SQL Kiwi (11/1/2011)


    call.copse (11/1/2011)


    I occassionally use an EAV type thing for very limited cases (storing central control variables that may vary by territory say) - is SQL_VARIANT a good move for storing these, where it may be text numbers or dates? I've tended to just use nvarchar(MAX) before and convert as needed.

    For the UNPIVOT this would be for normalising some data perhaps for import purposes? I'm just struggling to come up with a use for this SQL_VARIANT chappy.

    SQL_VARIANT might be a good choice; after all it was added as a popular user-requested feature for EAV scenarios. It can make some EAV things a lot neater and easier, and there are other language features like SQL_VARIANT_PROPERTY that support this sort of use. Many people find using NVARCHAR(MAX) feels clunky in comparison (and downright awkward if you need to store different collations as in the example given in the question). The various pros and cons are way too much to get into detail with, but it's definitely something to familiarize yourself with, like other new-ish things in the same area like SPARSE columns.

    Thanks Paul, I'll definitely keep that in mind - it sounds like it is at least halfway thought through.

  • To me the answer should not depend on data type.

    C - case

    I - Insensitive

    A - Accent

    S - Sesitive

    b and B cannot be Accent differentiated

    b and B cannot be Case differentiated

  • jswong05 (11/15/2011)


    To me the answer should not depend on data type.

    C - case

    I - Insensitive

    A - Accent

    S - Sesitive

    b and B cannot be Accent differentiated

    b and B cannot be Case differentiated

    The point is the values in the column have different collations, and SQL Server cannot directly compare strings from different collations. For SQL_VARIANT data containing strings to compare equal, the collation must be the same first.

    Not sure what you mean by 'b' AND 'B' cannot be 'case differentiated' - they have different casing!

Viewing 12 posts - 31 through 41 (of 41 total)

You must be logged in to reply to this topic. Login to reply