Query works static, but does not dynamically

  • I cannot seem to figure this one out, so I have taken to a forum for assistance.

    I have an SSIS package that puts together a series of queries to check all text fields in all tables of a database for a set of "invalid" characters. I know the root query works as we have been running it manually for quite a while. However, now that I have changed it to build dynamically, I cannot figure out why it does not work.

    Here is the code. The EXEC (@sSQL) returns 3 records. It just so happens that these 3 records are the only ones in the database with a '?' in the field. The very bottom statement returns 0 records. I cannot figure out what the difference is. When I do PRINT (@sSQL), it looks fine to me, except that @Pattern is fully printed.

    Any ideas?

    DECLARE @Pattern NVARCHAR(MAX)

    DECLARE @sSQL NVARCHAR(MAX)

    DECLARE @1 nvarchar(max)

    DECLARE @2 nvarchar(max)

    DECLARE @loop int

    SET @Pattern = '%['

    SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)

    SET @loop = 55296

    WHILE @loop < 57344

    BEGIN

    SET @Pattern = @Pattern + NCHAR (@loop)

    SET @loop = @loop + 1

    END

    SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535)

    SET @Pattern = @Pattern + ']%'

    set @1 = 'SELECT * FROM [datacreator].[M_EventPOI] WHERE PATINDEX('

    set @2 = ', CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name))) COLLATE Latin1_General_BIN) <> 0'

    SET @sSQL = @1 + '''' + @Pattern + '''' + @2

    --3 Rows Returns; all 3 have '?'

    EXEC (@ssql)

    --FOR COMPARISON. 0 rows return (which is correct)

    select * FROM [datacreator].[M_EventPOI] WHERE PATINDEX(@Pattern, CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name))) COLLATE Latin1_General_BIN) <> 0

  • Basically, I think what is happeneing is that the dynamic SQL is evaluating '?' as one of my invalid characters, even though it's not part of my list. '?' is CHAR(63), which I am not including in my @Pattern. What I can't figure out is why/how it is getting in there for the dynamic version but not for the static.

  • I'm guessing what is happening is that Latin1 has only ascii 0 to 255, not extended characters. So, the double-bytes are being interpreted as single bytes, one of which is a "?".

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The script works fine and properly if I do not build a dynamic SQL input, but it also doesn't actually parse out the @Pattern prior to the PATINDEX in the static query like it does in the dynamic SQL.

    I did go ahead and try other collations, though, with the same result. Any idea what collation would accept the unicode extended characters?

  • Sorry, I have no idea; that's not my area of expertise.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Two questions and a clarification:

    Clarification: What are you using to build the @sql string? Is this completely encapsulated in an OLEDB or Exectute SQL task, or are you assembling a local variable in a script component first?

    Question 1: Have you performed a profiler to confirm the query being sent to the SQL Server is the one you expect?

    Question 2: Have you directly confirmed that your character is definately a CHR(63) via ASCII(SUBSTRING())?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • The code fails because the first character in the Pattern sequence is NCHAR(0). Any usage of the string will recognise this as the end of the string, truncating the string at this point. The solution is to start at NCHAR(1) instead.

    😎

    For demonstration, run first this code

    DECLARE @Pattern NVARCHAR(MAX)

    DECLARE @sSQL NVARCHAR(MAX)

    SET @Pattern = '%['

    SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)

    SELECT @Pattern

    Return value

    %[

    And then this

    DECLARE @Pattern NVARCHAR(MAX)

    DECLARE @sSQL NVARCHAR(MAX)

    SET @Pattern = '%['

    SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)

    SELECT CONVERT(VARBINARY(MAX),@Pattern,3)

    Return value

    0x25005B000000010002000300040005000600070008000B000C000E000F0010001100120013001400150016001700180019001A001B001C001D001E001F00

    This is probably way too obvious for one to notice at the first glance;-)

  • Unfortunately, converting to varbinary didn't work and neither did removing the nchar(0) start.

    Also unfortunately, I do not have permission to run profiler in this particular case. It's a severely locked down server.

    I have confirmed that the '?' I am seeing is truly CHAR(63) by checking the ASCII value of the characters in the field.

    Now, to explain further, I have this code in a Function on another server. It works flawlessly there. I pass in a NVARCHAR(MAX) string and it runs through all the chars and uses STUFF to take them out. It spits out the cleaned string.

    I can't simply put the working function on this server and handle it the same way. I am not allowed to create anything more than temp tables. As a result, I am using SSIS. Here is the run-down on what I am doing:

    First thing I do is identify ALL char, varchar, nchar, and nvarchar columns in the TABLE_SCHEMA.

    I have a ForEach loop to iterate over all of these identified tables/columns. In the loop I do a couple of things:

    First, I create a SQL statement using VBScript. It outputs two variables; SQL1 and SQL2. They are:

    --sTable and sColumn are pulled in from the ForEach loop

    SQL1 = "SELECT * FROM " & sTable & " WHERE PATINDEX("

    SQL2 = ", (" & sColumn & ") COLLATE Latin1_General_BIN) <> 0"

    Next, I run the SQL code from my first post. Here is the unaltered code in that SQL Task:

    DECLARE @Counter INTEGER

    DECLARE @Pattern NVARCHAR(MAX)

    DECLARE @loop int

    DECLARE @sSQL NVARCHAR(MAX)

    SET @Pattern = '%['

    SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)

    SET @loop = 55296

    WHILE @loop < 57344

    BEGIN

    SET @Pattern = @Pattern + NCHAR (@loop)

    SET @loop = @loop + 1

    END

    SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535)

    SET @Pattern = @Pattern + ']%'

    SET @sSQL = ? + '''' + @Pattern + '''' + ?

    EXEC (@sSQL)

    After this step, I do some logging and such that isn't important to this question, so I'll stop it there.

    Now, the reason why I am using the VBScript to create SQL1 and SQL2 is because I cannot include @Pattern in dynamic SQL. (Must declare the scalar variable "@Pattern".) I have to use dynamic SQL because I am changing the table and/or column each time as I iterate over my list.

    I am open to any other suggestions or different approaches. I've been scratching my head over this for a couple of weeks now.

  • Have you tried using an SC collation? There's info here. Your ? might be the first of a surrogate pair.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/29/2014)


    Have you tried using an SC collation? There's info here. Your ? might be the first of a surrogate pair.

    I thought supplementary char collations were not in 2k8? When I check fn_helpcollations, I don't see any with _SC. Am I missing something?

  • A little more testing and it looks like it might really be a collation issue. If I comment out the following, the return comes back as I expect.

    SET @loop = 55296

    WHILE @loop < 57344

    --WHILE @loop < 55296

    BEGIN

    SET @Pattern = @Pattern + NCHAR (@loop)

    SET @loop = @loop + 1

    END

    SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535)

    I am guessing that the database is not set up to handle SC or these higher valued chars in the first place. If this is the case, then I shouldn't need to scan for them because they can't exist in the first place. May end up being a non-issue.

    Any additional input?

  • Solved:

    DECLARE @Pattern NVARCHAR(MAX)

    DECLARE @sSQL NVARCHAR(MAX)

    DECLARE @1 nvarchar(max)

    DECLARE @2 nvarchar(max)

    DECLARE @loop int

    SET @Pattern = '%['

    SET @Pattern = @Pattern + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR (5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31)

    SET @loop = 55296

    WHILE @loop < 57344

    BEGIN

    SET @Pattern = @Pattern + NCHAR(@loop)

    SET @loop = @loop + 1

    END

    SET @Pattern = @Pattern + NCHAR(65534) + NCHAR(65535) + ']%'

    ------------------------------------------------------------------------------------------------------------------------------------

    -- Fails: note PATINDEX('

    SET @sSQL = '

    SELECT * FROM (SELECT Alternate_Name = N''Sample?String'') d

    CROSS APPLY (SELECT pos = PATINDEX('+ '''' + @Pattern + '''' + ', CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name)))

    COLLATE Latin1_General_BIN)) x

    WHERE pos <> 0'

    EXEC (@ssql)

    -----------------------------------------------------------------------------------------------------------------------------------

    -- Works: note PATINDEX(N'

    SET @sSQL = '

    SELECT * FROM (SELECT Alternate_Name = N''Sample?String'') d

    CROSS APPLY (SELECT pos = PATINDEX(N'+ '''' + @Pattern + '''' + ', CONVERT(NVARCHAR(MAX), LTRIM(RTRIM(Alternate_Name)))

    COLLATE Latin1_General_BIN)) x

    WHERE pos <> 0'

    EXEC (@ssql)

    ----------------------------------------------------------------------------------------------------------------------------------

    -- funny - they look the same

    ;WITH

    E1 AS (SELECT * FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)),

    E2 (n) AS (SELECT 0 FROM E1 a, E1 b),

    iTally (n) AS (SELECT TOP (LEN(@ssql)) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM E2 a, E2 b)

    SELECT

    [Char] = SUBSTRING(@ssql,n,1),

    [UNICODECode] = UNICODE(SUBSTRING(@ssql,n,1)),

    [ASCIICode] = ASCII(SUBSTRING(@ssql,n,1))

    FROM iTally

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Wow, thanks Chris! I can't believe I forgot to N' the string. That single oversight fixed the entire routine.

    Good information to know on the CHAR/Unicode/ASCII differences. I've never had to work much above 255's until now, so much appreciated and a good learning experience.

  • joseph.robinson83 (8/29/2014)


    Wow, thanks Chris! I can't believe I forgot to N' the string. That single oversight fixed the entire routine.

    Good information to know on the CHAR/Unicode/ASCII differences. I've never had to work much above 255's until now, so much appreciated and a good learning experience.

    Thanks Joseph - it was a learning experience for me too!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply