RTRIM is not working as expected

  • The weird think is that OP's untrimmable symbol has ASCII =32.

    All untrimmable spaces demonstrated so far have other ASCII codes reported by SQL.

  • The Unicode space-characters 32 and 0x2000 to 0x2003 return an ASCII value of 32.

    RTRIM only trims Unicode character 32.

    If a string ends with NCHAR(0x2000) RTIM will not remove the last character, but the ASCII value of the last character shows 32.

    You can test this by using UNICODE() instead of ASCII() to show the Unicode value of the character.

  • Sql 2008

    select s, ASCII(s),UNICODE(s), dump =cast(s as varbinary(10))

    from (values

    (NCHAR(0x0020)),(NCHAR(0x0120)),(NCHAR(0x0220)),(NCHAR(0x0320))

    ,(NCHAR(0x2000)),(NCHAR(0x2001)),(NCHAR(0x2002)),(NCHAR(0x2003))

    ) t(s)

    s(No column name)(No column name)dump

    32320x2000

    G712880x2001

    ?635440x2002

    ?638000x2003

    6381920x0020

    6381930x0120

    6381940x0220

    6381950x0320

  • Louis Hillebrand (4/22/2015)


    In Unicode there are more Space-Characters than ASCII(32)

    You can use UNICODE(Char) to get the Unicode value for the space-character.

    I wasn't aware of UNICODE as a function, but it certainly sounds like you've hit the nail on the head. Just need to await the OP now.


    --EDIT--

    Just wanted to add my own little confirmation that it's probably a UNICODE character as Louis points out: -

    SELECT a.X AS [Type],

    ASCII(a.N) AS [Ascii Value],

    UNICODE(a.N) AS [Unicode Value],

    ASCII(RTRIM(a.N)) AS [Ascii Value after RTRIM],

    UNICODE(RTRIM(a.N)) AS [Unicode Value after RTRIM]

    FROM ( VALUES ( 'Not a space', NCHAR(0x2002)), ( 'A space', CHAR(32)) ) a ( X, N );

    Type Ascii Value Unicode Value Ascii Value after RTRIM Unicode Value after RTRIM

    ----------- ----------- ------------- ----------------------- -------------------------

    Not a space 32 8194 32 8194

    A space 32 32 NULL NULL

    So the Unicode value NCHAR(0x2002) will return ASCII 32 before and after an RTRIM (I'm sure there are other values that do the same, that was just the one I pulled out of the hat).


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • That is the result of the UNICODE,

    I would like to take some time to dig deeper on this and summarize all the information.

    Thanks all of you for the support!

    Paul Hernández
  • SQL 2008R2 Collation SQL_Latin1_General_CP1_CI_AS

    Could also be a difference in regional settings..

    s(No column name)(No column name)dump

    32320x2000

    G712880x2001

    ?635440x2002

    ?638000x2003

    3281920x0020

    3281930x0120

    3281940x0220

    3281950x0320

  • Paul Hernández (4/22/2015)


    That is the result of the UNICODE,

    I would like to take some time to dig deeper on this and summarize all the information.

    Thanks all of you for the support!

    A space that RTRIM can get rid of would have a UNICODE value of 32, so that's your issue 🙂


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • 12288 is the Unicode character 'IDEOGRAPHIC SPACE' (0x3000),

    Added to my list of Unicode-spaces... which I use in my front-end to cleanup any copy-paste strings..

  • NCHAR(0x0020) which is binary dumped as 0x2000 is the only symbol with ASCII =32 and it's trimmed OK. All the rest have other ASCII.

    with tt as (

    select top(256) n=row_number() over (order by (select null))-1

    from sys.all_objects

    ), chars as (

    select bin = cast(256*tt1.n + tt2.n as varbinary(2)) -- ==NCHAR(0x<tt2.n><tt1.n>), big/little-endian stuff

    from tt tt1, tt tt2

    )

    select bin

    , char = cast(bin as nvarchar(1))

    , ascii = ASCII(cast(bin as nvarchar(1)))

    , unc = UNICODE(cast(bin as nvarchar(1)))

    from chars

    --where ASCII(cast(bin as nvarchar(1))) = 32

    order by UNICODE(cast(bin as nvarchar(1)))

  • serg-52 (4/22/2015)


    NCHAR(0x0020) which is binary dumped as 0x2000 is the only symbol with ASCII =32 and it's trimmed OK. All the rest have other ASCII.

    Must be collation based, e.g.

    SELECT CHAR(34) + a.[STRING] + CHAR(34) AS [QUOTED STRING],

    CHAR(34) + a.[TRIMMED STRING] + CHAR(34) AS [QUOTED TRIMMED STRING],

    ASCII(RIGHT(a.[STRING], 1)) AS [ASCII CODE OF LAST CHARACTER],

    ASCII(RIGHT(a.[TRIMMED STRING], 1)) AS [ASCII CODE OF LAST CHARACTER AFTER TRIM],

    UNICODE(RIGHT(a.[STRING], 1)) AS [UNICODE CODE OF LAST CHARACTER],

    UNICODE(RIGHT(a.[TRIMMED STRING], 1)) AS [UNICODE CODE OF LAST CHARACTER AFTER TRIM]

    FROM ( SELECT a.N AS [STRING],

    RTRIM(LTRIM(a.N)) AS [TRIMMED STRING]

    FROM ( VALUES

    ( CONVERT(NVARCHAR(MAX), 0x540068006900730020006900730020006100200073007400720069006E0067000220) COLLATE SQL_Latin1_General_CP1_CI_AI),

    ( CONVERT(NVARCHAR(MAX), 0x540068006900730020006900730020006100200073007400720069006E0067002000) COLLATE SQL_Latin1_General_CP1_CI_AI),

    ( CONVERT(NVARCHAR(MAX), 0x540068006900730020006900730020006100200073007400720069006E0067000030) COLLATE SQL_Latin1_General_CP1_CI_AI) ) a ( N )

    ) a;

    QUOTED STRING QUOTED TRIMMED STRING ASCII CODE OF LAST CHARACTER ASCII CODE OF LAST CHARACTER AFTER TRIM UNICODE CODE OF LAST CHARACTER UNICODE CODE OF LAST CHARACTER AFTER TRIM

    ---------------------- ---------------------- ---------------------------- --------------------------------------- ------------------------------ -----------------------------------------

    "This is a string " "This is a string " 32 32 8194 8194

    "This is a string " "This is a string" 32 103 32 103

    "This is a string " "This is a string " 32 32 12288 12288


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Yes, ASCII _is_ collation dependent.

    select ASCII(s collate Chinese_Simplified_Pinyin_100_BIN)

    ,ASCII(s collate Greek_100_BIN)

    ,ASCII(s collate Latin1_General_100_BIN)

    from (values

    (NCHAR(0x0020)),(NCHAR(0x0120)),(NCHAR(0x0220)),(NCHAR(0x0320))

    ,(NCHAR(0x2000)),(NCHAR(0x2001)),(NCHAR(0x2002)),(NCHAR(0x2003))

    ) t(s)

    I should check it first of all.

  • A Scaler function that trims leading and trailing spaces from a Unicode string.

    dbo.Numbers is a Tally-table.

    CREATE FUNCTION dbo.fn_NTRIM(@T nvarchar(4000))

    RETURNS nvarchar(4000)

    AS

    BEGIN

    DECLARE @r nvarchar(4000);

    WITH cte_Space (UC) AS (

    SELECT UC

    FROM(

    VALUES(0x20),--Space

    (0xA0),--No-Break Space

    (0x180E),--Mongolian Vowel Separator

    (0x2000),--En Quad

    (0x2001),--Em Quad

    (0x2002),--En Space

    (0x2003),--Em Space

    (0x2004),--Three-Per-Em Space

    (0x2005),--Four-Per-Em Space

    (0x2006),--Six-Per-Em Space

    (0x2007),--Figure Space

    (0x2008),--Punctuation Space

    (0x2009),--Thin Space

    (0x200A),--Hair Space

    (0x200B),--Zero Width Space

    (0x202F),--Narrow No-Break Space

    (0x205F),--Medium Mathematical Space

    (0x3000), --Ideographic Space

    (0xFEFF)--Zero Width No-Break Space

    ) AS t(UC)

    )

    SELECT @r = LTRIM(RTRIM(CAST(Data as nvarchar(4000)))) FROM

    (SELECTCASE WHEN SC.UC is null THEN SUBSTRING(@T, num, 1) ELSE NCHAR(32) END

    FROMdbo.Numbers N LEFT OUTER JOIN

    cte_Space SC ON UNICODE(SUBSTRING(@T, num, 1)) = SC.UC

    WHEREN.Num <= LEN(@T)

    FORXML PATH (''), TYPE ) AS Z(Data);

    RETURN @r

    END

    GO

    --############################################################################################################

    DECLARE @T nvarchar(100) = 'this is just a' + NCHAR(160) + 'text' + NCHAR (8195) + 'for test' + NCHAR(8195) + nchar(8196);

    SELECT '>' + RTRIM (@T) + '<', '>' + dbo.fn_NTRIM (@T) + '<';

  • BEWARE the Scalar UDF though!! It has MANY downsides, including voiding the use of parallelism, preventing accurate estimates but the optimizer (with associated bad performance and bad concurrency), often forcing row-by-agonizing-row operations under the covers, etc.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you only have alphanumeric values on your column you can use STUFF in combination with PATINDEX to get rid of the junk:

    DECLARE @teststring nvarchar(20)

    SET @teststring = 'test ' + CHAR(13)

    SELECT REVERSE(STUFF(REVERSE(@teststring), 1, PATINDEX('%[a-z0-1]%', REVERSE(@teststring)) - 1, ''))

    PATINDEX identifies the first valid character from the back using REVERSE (either letter or number) and removes the rest.

  • Paul Hernández (4/22/2015)


    Hi Tom, thanks for your answer.

    Please forget my first post, I copied and pasted the query that SSAS issues, therefore contains this weird aliasing.

    Here is a better code:

    SELECT

    DISTINCT

    [dim_vArticles].[ArticleId],

    '"'+rtrim([StyleDescription])+'"' AS StyleDescription

    , ASCII(RIGHT([StyleDescription],1)) ASCIIChar

    FROM [dim].[vArticles] AS [dim_vArticles]

    where ArticleId = '82-4056204122396-129'

    Here is the result:

    I expect the blank space at the end to be removed, but as it can be seen is still there, so the question is, why the RTRIM function does not remove the space at the end of the string?

    Thanks for your help!

    In sql server you can't use alias in the same select, so "ASCII(RIGHT([StyleDescription],1)) ASCIIChar" fetches the original value, not the trimmed value from alias.

    Try this:

    SELECT

    DISTINCT

    [dim_vArticles].[ArticleId],

    '"'+rtrim([StyleDescription])+'"' AS StyleDescription_alias

    , ASCII(RIGHT([StyleDescription_alias],1)) ASCIIChar

    FROM [dim].[vArticles] AS [dim_vArticles]

    where ArticleId = '82-4056204122396-129'

    This will return "invalid column name StyleDescription_alias".

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply