Intermittent sql_variant problem on COMPATIBILITY_LEVEL >= 120

  • I have encountered a reproducible but intermittent problem when working with sql_variant values in SQL 2016.  The problem does not exist if COMPATIBILITY_LEVEL < 120 (i.e. the problem seems to have been introduced with SQL 2014).

    I am providing a simple test below, but by the way of overview:

    I have a stored procedure that accepts a sql_variant parameter.  This parameter is an ID value.  The system has both int ID's and uniqueidentifier ID's, and this procedure is to accept either style of ID and perform some action.

    Sometimes when the stored procedure is called, the parameter will contain an int, sometimes a uniqueidentifer.  If it contains a uniqueidentifier the procedure needs to look up the corresponding int value from a table.  Regardless, the int value then needs to be inserted into a table.

    The insert statement uses a CASE statement to cast the sql_variant parameter to the correct native type.  It does this by looking at the SQL_VARIANT_PROPERTY(@MyVar, 'BaseType'), and then performing the appropriate CAST.

    This all works fine in earlier versions of SQL.  But in current versions, the procedure intermittently fails with an error:  "Explicit conversion from data type int to uniqueidentifier is not allowed." ... even though no such conversion is actually being performed.

    A strange aspect of the problem is this:  If I call the procedure and pass in either an int or a uniqueidentifier value, it fails.  If I then call the procedure again and pass in a string containing an int value, it succeeds.  Then after this success, if I once again call the procedure with an int or an uniqueidentifier value, the procedure succeeds.

    In the test below, a temporary stored procedure is created that just inserts a row into a temporary table.  Calling this stored procedure with different data types demonstrates this behavior.  Changing the COMPATIBILITY_LEVEL influences the results:  current versions fail, older versions succeed.

    Does anyone have any insight into what is going on?  Is this a bug?

    FYI, I would not use the sql_variant type for storing data in a table.  But it seems like an appropriate use of the sql_variant type if I want a parameter value that can sometimes be an int and sometimes be a uniqueidentifier.  (In other words, I don't need a lecture about avoiding use of sql_variant.)  I understand a number of work-arounds, including CASTing to a varchar instead of a uniqueidentifier.  I am just trying to understand why there is a change in behavior between different versions of SQL, and to understand if this is a bug in SQL.

    CREATE PROCEDURE #DemoBadVariant
    @MyVar sql_variant
    AS
    BEGIN
    SET NOCOUNT ON

    CREATE TABLE #Test1 (MyCol int)

    --If @MyVar is an int, we want to insert that value
    --If @MyVar is a GUID, we want to look up the corresponding int ID (simulating by using a CASE)
    --If @MyVar is anything else, cast to a string, and if ISNUMERIC assume it is an int, and insert that value
    --Else, insert a NULL

    INSERT INTO #Test1
    VALUES (
      CASE SQL_VARIANT_PROPERTY(@MyVar, 'BaseType')
      WHEN 'int' THEN CAST(@MyVar AS int)   
      WHEN 'uniqueidentifier' THEN CASE WHEN CAST(@MyVar AS uniqueidentifier) = NEWID() THEN 777 ELSE 888 END
      ELSE CASE WHEN ISNUMERIC(CAST(@MyVar AS varchar(8000))) = 1 THEN CAST(@MyVar AS int) ELSE NULL END
      END
    )

    PRINT 'SUCCESS!'
    PRINT ''
    END

    GO

    /*
    Run the stored proc, passing in different data types.

    Note that compatibility level > 110 fails on uniqueidentifier and int.
    But curiously, on compatibility level > 110 these types succeed after
    a call with a char.

    */

    --ALTER DATABASE osTest1 SET COMPATIBILITY_LEVEL = 100 --SQL 2008
    --ALTER DATABASE osTest1 SET COMPATIBILITY_LEVEL = 110 --SQL 2012
    --ALTER DATABASE osTest1 SET COMPATIBILITY_LEVEL = 120 --SQL 2014
    ALTER DATABASE osTest1 SET COMPATIBILITY_LEVEL = 130 --SQL 2016

    GO

    PRINT 'Test uniqueidentifier'
    DECLARE @MyVal uniqueidentifier
    SET @MyVal = NEWID()
    EXEC #DemoBadVariant @MyVar = @MyVal

    PRINT 'Test int'
    DECLARE @MyVal2 int
    SET @MyVal2 = 123
    EXEC #DemoBadVariant @MyVar = @MyVal2

    PRINT 'Test char'
    DECLARE @MyVal3 varchar(100)
    SET @MyVal3 = '123'
    EXEC #DemoBadVariant @MyVar = @MyVal3

    PRINT 'Test uniqueidentifier again, after char test'
    DECLARE @MyVal4 uniqueidentifier
    SET @MyVal4 = NEWID()
    EXEC #DemoBadVariant @MyVar = @MyVal4

    PRINT 'Test int again, after char test'
    DECLARE @MyVal5 int
    SET @MyVal5= 123
    EXEC #DemoBadVariant @MyVar = @MyVal5

  • This is interesting, only the first execution of the test code fails after changing the COMPATIBILITY_LEVEL, consequent executions work fine.
    😎  

    The errors produced are:

    Test uniqueidentifier
    Msg 529, Level 16, State 3, Procedure #DemoBadVariant, Line 14 [Batch Start Line 44]
    Explicit conversion from data type uniqueidentifier to int is not allowed.
    Test int
    Msg 529, Level 16, State 3, Procedure #DemoBadVariant, Line 14 [Batch Start Line 44]
    Explicit conversion from data type int to uniqueidentifier is not allowed.

    More interesting is the fact that when adding WITH RECOMPILE, all executions fail on the higher COMPATIBILITY_LEVELs.

  • Ah, thanks.  That helps,  So to refine the description of the problem:

    The compile of the stored procedure fails if an int or uniqueidentifier is passed in.  The compile succeeds if a string or NULL is passed in.  Once compiled, the execution of the procedure succeeds even with an int or uniqueidentifier.

    So...is this expected behavior, or a bug in SQL?

    Doing some more testing, adding OPTION (OPTIMIZE FOR (@MyVar = '')) or OPTION (OPTIMIZE FOR (@MyVar = NULL)) to the INSERT statement works around the issue:  this allows the query optimizer to compile the query for one of the working data types (even if WITH RECOMPILE is used).

    Turning off parameter sniffing does not seem to help, nor does setting Query Optimizer Fixes on.

    But ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON does work around the problem:  no compile errors, regardless of types passed in. 

    This is not the first time I have encountered the query optimizer and/or the cardinality estimator doing a poorer job in SQL2016.  This may be a different discussion, but I am concerned that SQL 2016 is causing more problems with my application. :angry:

  • Yeah, this definitely seems like a bug (and still exists in SQL 2017 CU9). Using OPTIMIZE FOR UNKNOWN on the query seems to resolve the issue (and probably makes sense given the query in question) but it's probably worth filing it as a bug in the new optimizer.

  • David Rueter - Saturday, July 21, 2018 11:44 AM

    Ah, thanks.  That helps,  So to refine the description of the problem:

    The compile of the stored procedure fails if an int or uniqueidentifier is passed in.  The compile succeeds if a string or NULL is passed in.  Once compiled, the execution of the procedure succeeds even with an int or uniqueidentifier.

    So...is this expected behavior, or a bug in SQL?

    Doing some more testing, adding OPTION (OPTIMIZE FOR (@MyVar = '')) or OPTION (OPTIMIZE FOR (@MyVar = NULL)) to the INSERT statement works around the issue:  this allows the query optimizer to compile the query for one of the working data types (even if WITH RECOMPILE is used).

    Turning off parameter sniffing does not seem to help, nor does setting Query Optimizer Fixes on.

    But ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON does work around the problem:  no compile errors, regardless of types passed in. 

    This is not the first time I have encountered the query optimizer and/or the cardinality estimator doing a poorer job in SQL2016.  This may be a different discussion, but I am concerned that SQL 2016 is causing more problems with my application. :angry:

    Just thinking that maybe a better solution is to get away from the SQL Variant data type by making a change to the stored procedure so that it can accept two parameters, and it defaults to NULL values for both.   One is an int, and the other a unique identifier, and the logic in the stored procedure determines which one has been supplied, and fails itself if both are NULL, and possibly also if both are supplied (or prioritizes one over the other), and then proceeds accordingly.   It gets you away from the SQL Variant data type, and I can't imagine that messes much with the performance.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks, Steve.  In this case, the scenario that I described is a bit of an over-simplification.  The actual use is deep inside a user-configurable enterprise-class application with automatically-generated CRUD procedures and views...and I think the single variant field approach makes sense.

    One feature of this system at runtime is being able to specify whether in a resultset you want reference fields (i.e. lookups / foreign key references) returned as native int IDs or as GUID IDs ...and have the resultset remain identical, except for the value of these foreign key references (as they might be either int or uniqueidentifier depending on what the caller requested).  This turns out to be handy, as it promotes code reuse:  internal calls to these procedures always use native int IDs, but calls from user code that exposes data in a browser (for example) can use the same routines but with the int foreign key references swapped out for uniqueidentifiers to help secure things (as guids are not guessable, whereas sequential int IDs are much more guessable).

    It really wouldn't be good for the system to have to return both an int ID and a GUID ID for every lookup column in every resultset:  only one style of ID is ever needed by the caller.  So the generated GET procedures have a parameter @UseReferenceGUIDS  that can be set to 1, or left to the default of 0 to receive int IDs in the reference field columns.  The generated UPDATE procedures automatically sniff out the style of ID from the reference column...performing a lookup of the int ID from the GUID via a UDF in a CASE statement if needed.

    In other words, a design decision calls for a single column be returned for each reference field, but requires that the column may contain either int or uniqueidentifier values as specified by the caller.  That being the case, updates require sniffing out the reference field value's base type and looking up and casting on the fly as needed is required.

    (There are additional considerations that went into this design as well, such as the need to use synthetic keys due to the dynamically user-configured SQL objects, along with the desire for this key to be used for the clustered index, and uniqueidentifiers being a poor choice for clustered indexes.  It's all pretty well thought out...but does end up needing to make use of some "edgy" features of SQL such as the sql_variant type to gracefully implement everything.)

    But thanks for the feedback.  It makes me smile:  "Doc, it hurts when I move my arm like this."  "Oh, OK.  Don't move your arm like that."  🙂

  • David Rueter - Monday, July 23, 2018 1:40 PM

    Thanks, Steve.  In this case, the scenario that I described is a bit of an over-simplification.  The actual use is deep inside a user-configurable enterprise-class application with automatically-generated CRUD procedures and views...and I think the single variant field approach makes sense.

    One feature of this system at runtime is being able to specify whether in a resultset you want reference fields (i.e. lookups / foreign key references) returned as native int IDs or as GUID IDs ...and have the resultset remain identical, except for the value of these foreign key references (as they might be either int or uniqueidentifier depending on what the caller requested).  This turns out to be handy, as it promotes code reuse:  internal calls to these procedures always use native int IDs, but calls from user code that exposes data in a browser (for example) can use the same routines but with the int foreign key references swapped out for uniqueidentifiers to help secure things (as guids are not guessable, whereas sequential int IDs are much more guessable).

    It really wouldn't be good for the system to have to return both an int ID and a GUID ID for every lookup column in every resultset:  only one style of ID is ever needed by the caller.  So the generated GET procedures have a parameter @UseReferenceGUIDS  that can be set to 1, or left to the default of 0 to receive int IDs in the reference field columns.  The generated UPDATE procedures automatically sniff out the style of ID from the reference column...performing a lookup of the int ID from the GUID via a UDF in a CASE statement if needed.

    In other words, a design decision calls for a single column be returned for each reference field, but requires that the column may contain either int or uniqueidentifier values as specified by the caller.  That being the case, updates require sniffing out the reference field value's base type and looking up and casting on the fly as needed is required.

    (There are additional considerations that went into this design as well, such as the need to use synthetic keys due to the dynamically user-configured SQL objects, along with the desire for this key to be used for the clustered index, and uniqueidentifiers being a poor choice for clustered indexes.  It's all pretty well thought out...but does end up needing to make use of some "edgy" features of SQL such as the sql_variant type to gracefully implement everything.)

    But thanks for the feedback.  It makes me smile:  "Doc, it hurts when I move my arm like this."  "Oh, OK.  Don't move your arm like that."  🙂

    Interesting design choice.   Seems to me that someone somewhere insisted on cutting corners because they were unwilling to do the more difficult piece of work of thinking a tad more and making better choices for how to keep things as seamless as is practical.   Because such things can NOT always be seamless, there are ALWAYS going to be times when the coding is more challenging, but just using code re-use as a justification for a bad idea just smacks of either lazy coding or unrealistic deadlines, or both.   Commercial software companies are almost always guilty of one or the other, and sometimes both.

    I got a good laugh out of your doctor comment... it's so applicable here....

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • You probably fixed it already but if not, i'd probably go for the "delayed" compilation by hardcoding the dynamic sql depending on variant_property. Worked on in 2016 machine, but has downsides of slowness if procedure is called many times

    CREATE PROCEDURE dbo.#DemoBadVariant
    @pVariant SQL_VARIANT
    AS
    BEGIN

        DECLARE @x UNIQUEIDENTIFIER
        ,    @y INT
        ,    @SQL NVARCHAR(MAX)

        IF SQL_VARIANT_PROPERTY(@pVariant,'BaseType') = 'int'
        BEGIN
            SET @SQL = 'SELECT @y = CAST(@pVariant AS INT)'
            EXEC SP_EXECUTESQL @SQL, N'@y INT OUTPUT, @pVariant SQL_VARIANT', @y = @y OUTPUT, @pVariant = @pVariant
        END
        ELSE IF SQL_VARIANT_PROPERTY(@pVariant,'BaseType') = 'uniqueidentifier'
        BEGIN
            SET @SQL = 'SELECT @x = CAST(@pVariant AS UNIQUEIDENTIFIER)'
            EXEC SP_EXECUTESQL @SQL, N'@x UNIQUEIDENTIFIER OUTPUT, @pVariant SQL_VARIANT', @x = @x OUTPUT, @pVariant = @pVariant
        END
        ELSE
        BEGIN
            SET @SQL = 'SELECT @y = CASE WHEN ISNUMERIC(CAST(@pVariant AS varchar(8000))) = 1 THEN CAST(@pVariant AS int) ELSE NULL END'
            EXEC SP_EXECUTESQL @SQL, N'@y INT OUTPUT, @pVariant SQL_VARIANT', @y = @y OUTPUT, @pVariant = @pVariant
            
        END

        select @x, @y
        -- The actual logic here...
    END

    GO

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply