Identify Hidden Characters

  • I have a 10 million line report that the key field is in a row above all fields below; the number of fields below varies.  I use the stuff command to populate the field "MyFilledKey".  The key field varies in length from 8 to 20 characters.  In example 1, the output is length 20 and I cannot concatenate strings using the new value.  In example 2 the output is as expected but I had to manipulate the original value with spaces (which I'd prefer not to do).

    Question:  I'm assuming some type of hidden character is present in example 1 but cant figure it out; in example 3 I search for characters used.  Assume the issue has something to do with binary?

     

    --EXAMPLE 1 - Output has hidden spaces

    --data for example
    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test
    CREATE TABLE #test (RecordId int, MyKey varchar(20))
    INSERT INTO #test
    SELECT 10000001, '1234567890' UNION ALL
    SELECT 10000002, '' UNION ALL
    SELECT 10000003, '' UNION ALL
    SELECT 10000004, '' UNION ALL
    SELECT 10000005, '12345678' UNION ALL
    SELECT 10000006, '' UNION ALL
    SELECT 10000007, '' UNION ALL
    SELECT 10000008, ''

    --length is not 20
    SELECT LEN(MyKey) FROM #test

    --associate key to all respective fields
    IF OBJECT_ID('tempdb..#testmore') IS NOT NULL DROP TABLE #testmore
    SELECT RecordId, MyKey,
    STUFF(MAX(CAST(RecordId AS binary(5)) + CAST(NULLIF(CAST(MyKey as varchar(199)), '') as binary(20))) OVER(ORDER BY RecordId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ), 1, 5, NULL) as MyFilledKey
    INTO #testmore FROM #test

    SELECT MyFilledKey, LEN(MyFilledKey), MyFilledKey + 'xxx' as Example FROM #testmore

    --EXAMPLE 2 - Output is OK

    --data for example
    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test
    CREATE TABLE #test (RecordId int, MyKey varchar(20))
    INSERT INTO #test
    SELECT 10000001, '1234567890 ' UNION ALL
    SELECT 10000002, '' UNION ALL
    SELECT 10000003, '' UNION ALL
    SELECT 10000004, '' UNION ALL
    SELECT 10000005, '12345678 ' UNION ALL
    SELECT 10000006, '' UNION ALL
    SELECT 10000007, '' UNION ALL
    SELECT 10000008, ''

    --length is not 20
    SELECT LEN(MyKey) FROM #test

    --associate key to all respective fields
    IF OBJECT_ID('tempdb..#testmore') IS NOT NULL DROP TABLE #testmore
    SELECT RecordId, MyKey,
    STUFF(MAX(CAST(RecordId AS binary(5)) + CAST(NULLIF(CAST(MyKey as varchar(199)), '') as binary(20))) OVER(ORDER BY RecordId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ), 1, 5, NULL) as MyFilledKey
    INTO #testmore FROM #test

    SELECT MyFilledKey, LEN(MyFilledKey), MyFilledKey + 'xxx' as Example FROM #testmore

    --EXAMPLE 3 (find character)

    DECLARE @tablename VARCHAR(1000) ='#test'
    DECLARE @columnname VARCHAR(100)='MyKey'
    DECLARE @counter INT = 0
    DECLARE @sql VARCHAR(MAX)

    IF OBJECT_ID('tempdb..#FindHidden') IS NOT NULL DROP TABLE #FindHidden
    CREATE TABLE #FindHidden (MyKey varchar(99), CharacterSet varchar(99), LocationChar varchar(99))

    WHILE @counter <=255
    BEGIN

    SET @sql=

    'INSERT INTO #FindHidden SELECT TOP 10 '+@columnname+','+CAST(@counter AS VARCHAR(3))+' as CharacterSet, CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') as LocationOfChar
    FROM '+@tablename+'
    WHERE CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') <> 0'

    --PRINT (@sql)
    EXEC (@sql)
    SET @counter = @counter + 1
    END

    SELECT * FROM #FindHidden ORDER BY CharacterSet

     

     

     

  • I'm not sure what you are trying to do. (There is no STUFF command. STUFF is just a string function that removes or adds characters in a string.) But you do this:

    MAX(CAST(RecordId AS binary(5)) + CAST(NULLIF(CAST(MyKey as varchar(199)), '') as binary(20)))

    That is you are constructing a binary value which has a fixed length of 25 bytes.

    Then you apply STUFF, which is a function that I don't use myself, but I believe 1, 5 means that you delete the RecordID. Anyway, you have 20 bytes left.

    Look at the output from this SELECT:

    SELECT MyFilledKey, LEN(MyFilledKey), MyFilledKey + 'xxx' as Example, convert(varbinary(30), MyFilledKey) FROM #testmore

    You can see all the trailing 0x00 bytes.

     

     

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • What you posted helped, I could not identify what was filling the field.  What I'd wanted to do was get rid trailing 0x00 bytes.  Looks like this will work.  Thanks!

     

    SELECT MyFilledKey, LEN(MyFilledKey) as MyLen, MyFilledKey + 'xxx' as Example, convert(varbinary(30), MyFilledKey) as Examp,
    replace(MyFilledKey COLLATE Latin1_General_BIN + nchar(0x00), nchar(0x00) COLLATE Latin1_General_BIN , '') as Replaced
    INTO #Z FROM #testmore

    SELECT LEN(Replaced) FROM #Z

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply