Sensitive Context

  • Comments posted to this topic are about the item Sensitive Context

  • Good question, thanks Steve.
    Really enjoyed this one...

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • This applies only to uniqueidentifier. For variables of type char, varchar, etc. not.
    I created the table 'aaa' in tempdb which has a binary collation.
    I created a db with case insensitive colletion "db_CI_AI" and I run the following batch from "db_CI_AI" db context.

    Use tempdb -- case sensitive db
    create table aaa(rowguid uniqueidentifier,name sysname)
    insert into aaa select NEWID(),name from sys.objects
    use db_CI_AI -- case insensitive db
    SELECT * FROM tempdb.dbo.aaa
    WHERE name like '%SYS%' -- zero rows returned
    SELECT * FROM tempdb.dbo.aaa
    WHERE rowguid like '%def%' -- returned some rows. Note, change '%def%' with a string that is present in the column.

  • Wow does that really happen?
     In my eyes that is a bug; the DbName.Table.Field syntax means the query can only possibly run against that 1 database and thus only settings for that DB should be used.

  • An interesting question, thanks for the perfect explanation, I learned something new.🙂

  • peter.row - Thursday, April 5, 2018 6:05 AM

    Wow does that really happen?
     In my eyes that is a bug; the DbName.Table.Field syntax means the query can only possibly run against that 1 database and thus only settings for that DB should be used.

    Hi Peter. No, this is not a bug. When executing a query, the Collation used for string literals and variables and implicit conversions from non-string types is the current Database's default Collation. Yes, the Table might be in another DB, but that doesn't control anything here. Now, if the column in that Table was a non-XML string type, then that column's Collation would be used. But neither XML nor non-string types have a Collation, so they get coerced into the Collation of either a real column being used in the same predicate or concatenation, or the Collation specified by the COLLATE keyword (if present), or the Collation of the current DB if neither a real column nor the COLLATE keyword are being used.

    Take care,
    Solomon....

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Carlo Romagnano - Thursday, April 5, 2018 2:39 AM

    This applies only to uniqueidentifier. For variables of type char, varchar, etc. not.
    I created the table 'aaa' in tempdb which has a binary collation.
    I created a db with case insensitive colletion "db_CI_AI" and I run the following batch from "db_CI_AI" db context.

    Hi Carlo. Your statement of "this applies only to UNIQUEIDENTIFIER" is incorrect. The underlying issue here is that while string columns have a Collation (whether or not it matches the default Collation of the DB that the Table is in), non-string types (and this would include XML) that allow for implicit conversion to string types do not have a Collation. So, these non-string types are considered "coercible" and will use the Collation of either: the COLLATE keyword (if present), or a string column used in the same predicate / concatenation / etc, or the current Database if used along with a string literal or variable. UNIQUEIDENTIFIER is one of the few non-string types that can implicitly convert, but is not the only type. DATETIME is another type that allows for this behavior for the same reason. For example:

    SELECT CONVERT(VARCHAR(50), so.[create_date]), so.*
    FROM tempdb.sys.objects so
    WHERE so.[create_date] LIKE '%a%'
    -- Matches the "a" in "Mar"
    -- SQL Server 2017: 6 rows in case-sensitive or binary DB, 100 rows in case-insensitive DB

    SELECT CONVERT(VARCHAR(50), so.[create_date]), so.*
    FROM tempdb.sys.objects so
    WHERE so.[create_date] LIKE '%A%'
    -- Matches the "A" in: "Apr", "Aug", and "AM"
    -- SQL Server 2017: 98 rows in case-sensitive or binary DB, 100 rows in case-insensitive DB

    Take care,
    Solomon....

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Hi Steve. Great question. I would, however, suggestion updating the explanation as it is incomplete and a little misleading. Currently it states:

    The object that contains the string is declared inside of the master database

    That is a mis-read of the could-be-worded-better documentation. The object here is the string, or more specifically, the string constant (or literal). Otherwise, the "object" could be a parameter or variable. The explanation should be:

    The column being used in the predicate, rowguid, is a non-string type that allows for implicit conversion to a string type. Since the column is not a string type, it does not have a defined collation. In these cases, the resulting string is considered a coercible-default. According to the rules of collation precedence, the collation used for the expression or predicate will be, in order of priority: what is specified by the COLLATE keyword, what is specified by a string column, or the current database's default collation when using string constants or variables or other implicit conversions. In this particular case, the COLLATE keyword is not present, and the column is a non-string type and so does not have a specified collation. Hence, the collation of the string constant, N'%abc%', which uses the collation of the current database, is used.

    Take care,
    Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky - Thursday, April 5, 2018 10:24 AM

    peter.row - Thursday, April 5, 2018 6:05 AM

    Wow does that really happen?
     In my eyes that is a bug; the DbName.Table.Field syntax means the query can only possibly run against that 1 database and thus only settings for that DB should be used.

    Hi Peter. No, this is not a bug. When executing a query, the Collation used for string literals and variables and implicit conversions from non-string types is the current Database's default Collation. Yes, the Table might be in another DB, but that doesn't control anything here. Now, if the column in that Table was a non-XML string type, then that column's Collation would be used. But neither XML nor non-string types have a Collation, so they get coerced into the Collation of either a real column being used in the same predicate or concatenation, or the Collation specified by the COLLATE keyword (if present), or the Collation of the current DB if neither a real column nor the COLLATE keyword are being used.

    Take care,
    Solomon....

    Ok - technically it is not a bug - but given the example - i.e. the explicit-ness, the SQL engine *should* use the collation of the database or the database column being queried.
    Give me 1 example where the current behaviour is ever what you would actually want to happen, because I can't think of a single case. All it does currently is trip you up if you happen to be using SSMS in this way.

     I wonder does this same behaviour apply if for example, you connected to a DB from C# with a user whose default DB was master but the DB in your connection string was your specific DB and you executed a dynamic SQL.

  • peter.row - Friday, April 6, 2018 1:06 AM

    Solomon Rutzky - Thursday, April 5, 2018 10:24 AM

    peter.row - Thursday, April 5, 2018 6:05 AM

    Wow does that really happen?
     In my eyes that is a bug; the DbName.Table.Field syntax means the query can only possibly run against that 1 database and thus only settings for that DB should be used.

    Hi Peter. No, this is not a bug. When executing a query, the Collation used for string literals and variables and implicit conversions from non-string types is the current Database's default Collation. Yes, the Table might be in another DB, but that doesn't control anything here. Now, if the column in that Table was a non-XML string type, then that column's Collation would be used. But neither XML nor non-string types have a Collation, so they get coerced into the Collation of either a real column being used in the same predicate or concatenation, or the Collation specified by the COLLATE keyword (if present), or the Collation of the current DB if neither a real column nor the COLLATE keyword are being used.

    Take care,
    Solomon....

    Ok - technically it is not a bug - but given the example - i.e. the explicit-ness, the SQL engine *should* use the collation of the database or the database column being queried.
    Give me 1 example where the current behaviour is ever what you would actually want to happen, because I can't think of a single case. All it does currently is trip you up if you happen to be using SSMS in this way.

     I wonder does this same behaviour apply if for example, you connected to a DB from C# with a user whose default DB was master but the DB in your connection string was your specific DB and you executed a dynamic SQL.

    I can understand why you (and probably others) feel this way, but in the end that would not work. Remember, the only reason that the current DB's Collation is being used is because there is no Collation specified for the column in the other DB since that column is not a string type. If a string column was being used in the predicate, then that would provide the Collation to use. For example:

    SELECT [create_date], [name],
       SUBSTRING([name], 4, 1) AS [Char], CONVERT(VARCHAR(50), [create_date]) AS [StringDate]
    FROM master.sys.objects obj
    WHERE [create_date] LIKE N'%' + SUBSTRING([name], 4, 1) + N'%' -- COLLATE Latin1_General_100_CI_AS
    -- On SQL Server 2017 with "_BIN2" collation: 32 rows with COLLATE, 14 with it commented out

    Unlike the previous queries, this one does not change behavior based on the current DB's Collation. The results are partially shown below:

    create_date                name          Char    StringDate
    2009-04-13 12:59:05.513    sysrowsets    r       Apr 13 2009 12:59PM
    2009-04-13 12:59:05.450    sysprivs      p       Apr 13 2009 12:59PM

    This query is using the "name" column from the "master.sys.objects" system catalog view on an instance having a binary Collation, hence the "name" column is using a binary Collation. Now the implicit conversion is at the whim of the Collation of that column. If I use another column with a different Collation, or the same column from the same catalog view in a DB that has a non-binary, non-case-sensitive default Collation, then the behavior would be different as it would be subject to another Collation. In either case, the Collation being used for the operation does not need to match the default Collation of the database that the date column is in.

    Or, what about a case where we are comparing two date columns, converted to VARCHAR, so neither one has a Collation specified, yet the two columns are from two separate tables in two different databases? There is no basis for preferring one DB's Collation over the other, simply based on where the Table exists. Or, should there be a Collation conflict error raised in these cases? And there can be several of these in a query. What is common here is the DB in which the query is being executed. So it actually does make sense that if no Collation is provided, then the default is to use the Collation of the current DB. That provides consistency between string constants / literals, variables, and expressions that do not naturally have a Collation. Assuming the Collation to be that of the DB where the non-string column exists would be just as confusing, just in a different way.

    Also, this issue has absolutely nothing to do with SSMS, or any client. So yes, if you connect to SQL Server via SqlClient in .NET to DatabaseA with a Login that has a default DB of DatabaseB, then an ad hoc query will use the Collation of DatabaseA since that is the current DB. The Login's default DB does not matter. But again, this is only for string constants, variables, and expressions having no Collation.

    Take care,
    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky - Friday, April 6, 2018 3:23 PM

    peter.row - Friday, April 6, 2018 1:06 AM

    Solomon Rutzky - Thursday, April 5, 2018 10:24 AM

    peter.row - Thursday, April 5, 2018 6:05 AM

    Wow does that really happen?
     In my eyes that is a bug; the DbName.Table.Field syntax means the query can only possibly run against that 1 database and thus only settings for that DB should be used.

    Hi Peter. No, this is not a bug. When executing a query, the Collation used for string literals and variables and implicit conversions from non-string types is the current Database's default Collation. Yes, the Table might be in another DB, but that doesn't control anything here. Now, if the column in that Table was a non-XML string type, then that column's Collation would be used. But neither XML nor non-string types have a Collation, so they get coerced into the Collation of either a real column being used in the same predicate or concatenation, or the Collation specified by the COLLATE keyword (if present), or the Collation of the current DB if neither a real column nor the COLLATE keyword are being used.

    Take care,
    Solomon....

    Ok - technically it is not a bug - but given the example - i.e. the explicit-ness, the SQL engine *should* use the collation of the database or the database column being queried.
    Give me 1 example where the current behaviour is ever what you would actually want to happen, because I can't think of a single case. All it does currently is trip you up if you happen to be using SSMS in this way.

     I wonder does this same behaviour apply if for example, you connected to a DB from C# with a user whose default DB was master but the DB in your connection string was your specific DB and you executed a dynamic SQL.

    I can understand why you (and probably others) feel this way, but in the end that would not work. Remember, the only reason that the current DB's Collation is being used is because there is no Collation specified for the column in the other DB since that column is not a string type. If a string column was being used in the predicate, then that would provide the Collation to use. For example:

    SELECT [create_date], [name],
       SUBSTRING([name], 4, 1) AS [Char], CONVERT(VARCHAR(50), [create_date]) AS [StringDate]
    FROM master.sys.objects obj
    WHERE [create_date] LIKE N'%' + SUBSTRING([name], 4, 1) + N'%' -- COLLATE Latin1_General_100_CI_AS
    -- On SQL Server 2017 with "_BIN2" collation: 32 rows with COLLATE, 14 with it commented out

    Unlike the previous queries, this one does not change behavior based on the current DB's Collation. The results are partially shown below:

    create_date                name          Char    StringDate
    2009-04-13 12:59:05.513    sysrowsets    r       Apr 13 2009 12:59PM
    2009-04-13 12:59:05.450    sysprivs      p       Apr 13 2009 12:59PM

    This query is using the "name" column from the "master.sys.objects" system catalog view on an instance having a binary Collation, hence the "name" column is using a binary Collation. Now the implicit conversion is at the whim of the Collation of that column. If I use another column with a different Collation, or the same column from the same catalog view in a DB that has a non-binary, non-case-sensitive default Collation, then the behavior would be different as it would be subject to another Collation. In either case, the Collation being used for the operation does not need to match the default Collation of the database that the date column is in.

    Or, what about a case where we are comparing two date columns, converted to VARCHAR, so neither one has a Collation specified, yet the two columns are from two separate tables in two different databases? There is no basis for preferring one DB's Collation over the other, simply based on where the Table exists. Or, should there be a Collation conflict error raised in these cases? And there can be several of these in a query. What is common here is the DB in which the query is being executed. So it actually does make sense that if no Collation is provided, then the default is to use the Collation of the current DB. That provides consistency between string constants / literals, variables, and expressions that do not naturally have a Collation. Assuming the Collation to be that of the DB where the non-string column exists would be just as confusing, just in a different way.

    Also, this issue has absolutely nothing to do with SSMS, or any client. So yes, if you connect to SQL Server via SqlClient in .NET to DatabaseA with a Login that has a default DB of DatabaseB, then an ad hoc query will use the Collation of DatabaseA since that is the current DB. The Login's default DB does not matter. But again, this is only for string constants, variables, and expressions having no Collation.

    Take care,
    Solomon...

    I appreciate the response, but I still believe that if the query goes to the effort of explicitly mentioning the DB name as well as table and column name then the SQL engine should use the default collation of the database just like it would if you queried the table from a stored procedure.

    Ultimately it's the same issue as when you create a #temp table with a string type field, i.e. you should always define the table with collate database default to ensure your string comparisons work as expected.
    However I believe that computers should make things easier for us whenever possible. In both this article case and the #temp table example I just gave the vastly more sensible decision for the SQL engine to take is to use the default collation of the database being used to run the query. The fact that it doesn't just makes you have to jump through hoops every single time.

    Since you couldn't think of an example where the current behaviour is useful either that just proves my point that the current behaviour is nothing more than a pain in the arse.

  • peter.row - Monday, April 9, 2018 1:25 AM

    I appreciate the response, but I still believe that if the query goes to the effort of explicitly mentioning the DB name as well as table and column name then the SQL engine should use the default collation of the database just like it would if you queried the table from a stored procedure.

    Ultimately it's the same issue as when you create a #temp table with a string type field, i.e. you should always define the table with collate database default to ensure your string comparisons work as expected.
    However I believe that computers should make things easier for us whenever possible. In both this article case and the #temp table example I just gave the vastly more sensible decision for the SQL engine to take is to use the default collation of the database being used to run the query. The fact that it doesn't just makes you have to jump through hoops every single time.

    Since you couldn't think of an example where the current behaviour is useful either that just proves my point that the current behaviour is nothing more than a pain in the arse.

    Hi Peter. Well, "useful" is a bit subjective ;). I'm not sure that I agree with you regarding SQL Server assuming the Table's DB's Collation for values that do not have a Collation. That is actually contrary to how the rest of the query would work. String literals / constants and variables / parameters already take on the current DB's Collation if that is not overridden / coerced by a column's Collation or the COLLATE keyword. In fact, it would seem inconsistent to use the current DB's Collation for some collationless values while using a remote DB's Collation for others. Just because the DATETIME / UNIQUEIDENTFIER column exists in another DB does not imply any desired Collation for it since it isn't text data. Yes, a DB's Collation is the default used for new columns if the COLLATE keyword isn't specified, but you could have a table where all of its text columns use a Collation that is different from the default Collation of the DB where that Table exists, whether they are all different Collations or even all the same. I mean, if you have a DB that has a binary default location and are using that for all columns in all user Tables, do you really care if someone does a case-insensitive comparison on a GUID or date value? Why is anyone even converting those to strings in the first place to filter on?

    And within a Stored Procedure is no different. The code in a Stored Procedure executes within the DB that the proc exists in. And if that proc has a query that grabs a DATETIME column from another DB and converts it to VARCHAR, it will use the Collation of the DB that contains the proc, because that is the "current" DB. Either way, if a particular Collation is that important, then that is what the COLLATE keyword helps with.

    The same is true for temp tables. They exist in TempDB and so, by default, use the Collation of the DB in which they exist as their default. However, I agree that on a practical level it can be a bit annoying, but aren't most DBs using the same Collation as the instance, and hence the same as TempDB? Either way, you shouldn't always use COLLATE DATABASE_DEFAULT. There are times when that is quite helpful, but you should use the Collation that provides the desired functionality. If the code can be deployed to multiple DBs having non-uniform Collations between them, then DATABASE_DEFAULT is handy. Or if the DB's Collation can change (but that is rare, I would hope). For partially contained DB's you would instead use CATALOG_DEFAULT. But if the code is just going to be in one DB, then best to specify the Collation that you want. Still, I have written a lot of stored procedures and use a fair number of temp tables, and I think I have only ever run into this annoyance once, possibly twice.

    Take care,
    Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky - Wednesday, April 11, 2018 11:31 AM

    peter.row - Monday, April 9, 2018 1:25 AM

    I appreciate the response, but I still believe that if the query goes to the effort of explicitly mentioning the DB name as well as table and column name then the SQL engine should use the default collation of the database just like it would if you queried the table from a stored procedure.

    Ultimately it's the same issue as when you create a #temp table with a string type field, i.e. you should always define the table with collate database default to ensure your string comparisons work as expected.
    However I believe that computers should make things easier for us whenever possible. In both this article case and the #temp table example I just gave the vastly more sensible decision for the SQL engine to take is to use the default collation of the database being used to run the query. The fact that it doesn't just makes you have to jump through hoops every single time.

    Since you couldn't think of an example where the current behaviour is useful either that just proves my point that the current behaviour is nothing more than a pain in the arse.

    Hi Peter. Well, "useful" is a bit subjective ;). I'm not sure that I agree with you regarding SQL Server assuming the Table's DB's Collation for values that do not have a Collation. That is actually contrary to how the rest of the query would work. String literals / constants and variables / parameters already take on the current DB's Collation if that is not overridden / coerced by a column's Collation or the COLLATE keyword. In fact, it would seem inconsistent to use the current DB's Collation for some collationless values while using a remote DB's Collation for others. Just because the DATETIME / UNIQUEIDENTFIER column exists in another DB does not imply any desired Collation for it since it isn't text data. Yes, a DB's Collation is the default used for new columns if the COLLATE keyword isn't specified, but you could have a table where all of its text columns use a Collation that is different from the default Collation of the DB where that Table exists, whether they are all different Collations or even all the same. I mean, if you have a DB that has a binary default location and are using that for all columns in all user Tables, do you really care if someone does a case-insensitive comparison on a GUID or date value? Why is anyone even converting those to strings in the first place to filter on?

    And within a Stored Procedure is no different. The code in a Stored Procedure executes within the DB that the proc exists in. And if that proc has a query that grabs a DATETIME column from another DB and converts it to VARCHAR, it will use the Collation of the DB that contains the proc, because that is the "current" DB. Either way, if a particular Collation is that important, then that is what the COLLATE keyword helps with.

    The same is true for temp tables. They exist in TempDB and so, by default, use the Collation of the DB in which they exist as their default. However, I agree that on a practical level it can be a bit annoying, but aren't most DBs using the same Collation as the instance, and hence the same as TempDB? Either way, you shouldn't always use COLLATE DATABASE_DEFAULT. There are times when that is quite helpful, but you should use the Collation that provides the desired functionality. If the code can be deployed to multiple DBs having non-uniform Collations between them, then DATABASE_DEFAULT is handy. Or if the DB's Collation can change (but that is rare, I would hope). For partially contained DB's you would instead use CATALOG_DEFAULT. But if the code is just going to be in one DB, then best to specify the Collation that you want. Still, I have written a lot of stored procedures and use a fair number of temp tables, and I think I have only ever run into this annoyance once, possibly twice.

    Take care,
    Solomon..

    I hadn't thought of using a table with multiple text columns each with a different collation so that is a case where what I proposed would be a problem.
     I always use collate database default on my #temp tables because that is always what I want. I work on a software product where it could be installed on a customers SQL Server over which I have no control of the tempDB and thus collating to database default when creating #temp tables is the only way to go because that is only way to be sure you won't be screwed over by the config of the SQL Server.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply