modifying a 'replace' function to use Lower and Upper

  • I have a function that I'm using that is working with one glitch. It is designed to change non-English characters to their closest equivalent. Problem is it is always returning Upper Case. Any ideas:

    FUNCTION:

    USE [NBCC_Search]
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO

    Create Function [dbo].[fn_ReplaceSpecialCharacters] (@TempString AS VARCHAR(100))
    RETURNS VARCHAR(100)
    AS
    BEGIN

    RETURN replace(replace(
    replace(replace(@TempString
    ,'Ș', 'S')
    ,'ș', 's')
    ,'Ț', 'T')
    ,'ț', 't')

    END

    GO

    I call the above within a select:

    USE [BCC_Search]
    GO

    SELECT [ContactId],[FirstName],[LastName]
         ,[dbo].[fn_ReplaceSpecialCharacters]([LastName]) AS 'NewName'
    FROM [dbo].[Professional] where country = 'Romania' order by [LastName] desc
    GO

    But the word "Buchareșt" comes back as "BuchareSt"

  • your database is case-insensitive, so ?,S,? and s  (four different versions of S)are found in the last replace.

    you want to modify the proc to use binary collation instead
    ALTER Function [dbo].[fn_ReplaceSpecialCharacters] (@TempString AS VARCHAR(100))
    RETURNS VARCHAR(100)
    AS
    BEGIN

    RETURN replace(replace(
    replace(replace(@TempString COLLATE Latin1_General_BIN
    ,'?', 'S')
    ,'?', 's')
    ,'?', 'T')
    ,'?', 't')

    END

    GO

    SELECT [dbo].[fn_ReplaceSpecialCharacters]('ÀAlbèert ËEîin?s?têeiìn ÌInstìitúutëe')

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hmm... Problem persists... even the test comes back as ÀAlbèert ËEîinSsStêeiìn ÌInstìitúutëe

  • Converting those values to varchar is likely what's getting you here. Those characters get turned into a CHAR(63) in VARCHAR.

    Change the parameter and output types to NVARCHAR, and see if that helps.

    Cheers!

  • Still same problem with nvarchar

  • try this:
    this is explicitly finding the ascii value, and replacing them; it's a modification of a strip high asciii that i had made previously.

    note there are some characters you have to decide to fiddle with yourself.,is the times indicator an x or an asterisk? is theta a zero or an "oh"? 
    use master;
    GO
    IF OBJECT_ID('[dbo].[StripReplaceASCII_ITVF]') IS NOT NULL
    DROP FUNCTION [dbo].[StripReplaceASCII_ITVF]
    GO
    --#################################################################################################
    --StripReplace_ASCII_ITVF removes specific characters from a string.
    --usage for ITVF requires cross apply or cross join
    --ie SELECT TOP 100 fn.CleanedText,MRNumber From EDLogDetail CROSS APPLY dbo.StripReplace_ASCII_ITVF(MRNumber) fn WHERE MRNumber IS NOT NULL
    --#################################################################################################
    CREATE FUNCTION dbo.StripReplaceASCII_ITVF(@OriginalText VARCHAR(8000))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN

    WITH
        E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
        Tally(N) AS
        (
            SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
        ),HighAsciiReplaced
    AS
    (
        
    select STUFF(
    (    
        SELECT
      CASE
      --WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) IN (9,10,13)   THEN SUBSTRING(@OriginalText,Tally.N,1) --tab,lf,cr
      --WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 32 AND 127 THEN SUBSTRING(@OriginalText,Tally.N,1)
      --WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) <= 32 THEN ''
      --WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) >= 128 THEN ''
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 192 AND 197 THEN 'A' -- ÀÁÂÃÄÅ -->A 192 to 197
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 198 THEN 'AE'     --Æ  -->AE 198
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 199 THEN 'C'     --Ç  -->C 199
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 200 AND 203 THEN 'E' --ÈÉÊË -->E 200 to 203
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 204 AND 207 THEN 'I'     --ÌÍÎÏ -->I 204 to 207
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 208 THEN 'D'     --Р -->D 208
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 209 THEN 'N'     --Ñ  -->N 209
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 210 AND 214 THEN 'O' --ÒÓÔÕÖ -->O 210 to 214
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 215 THEN '*'     --×  -->x or * 215 (multiplication operator)
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 216 THEN '0'     --Ø  -->0 216 zero or capital "o"?
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 217 AND 220 THEN 'U' --ÙÚÛÜ -->U 217 to 220
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 221 THEN 'Y'     --Ý  -->Y 221
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 222 THEN ' '     --Þ  --> 222
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 223 THEN ' '     --ß  --> 223
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 224 AND 229 THEN 'a' --àáâãäå -->a 224 to 229
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 230 THEN 'ae'     --æ  -->ae 230
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 231 THEN 'c'     --ç  -->c 231
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 232 AND 235 THEN 'e' --èéêë -->e 232 to 235
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 236 AND 239 THEN 'i' --ìíîï -->i 236 to 239
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 240 THEN 'o'     --ð  --> 240
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 240 THEN 'n'     --ñ  -->n 241
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 242 AND 246 THEN 'o' --òóôõö -->o 242 to 246
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 247 THEN '/'     --÷  -->/ 247 (division operator)
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 248 THEN '0'     --ø  -->0 248 zero or capital "o"?
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 249 AND 252 THEN 'u' --ùúûü -->u 249 to 252
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 253 THEN 'y'     --ý  -->y 253
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 254 THEN ' '     --þ  --> 254
      WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) = 255 THEN 'y'     --ÿ  -->y 255
      ELSE SUBSTRING(@OriginalText,Tally.N,1)
    END
        FROM Tally
    WHERE Tally.N <= len(@OriginalText) -- added by ajb
        FOR XML PATH(''), TYPE).value('.','varchar(max)'),1,1,'') as CleanedText
    )
    SELECT CleanedText FROM HighAsciiReplaced
    GO

    SELECT * FROM [dbo].[StripReplaceASCII_ITVF]('ÀAlbèert ËEîinstêeiìn ÌInstìitúutëe')

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Exactly what changes did you make? 

    It works fine for me.

    If you're using Lowell's example, be sure that the string literal is N'ÀAlbèert ËEîin?s?têeiìn ÌInstìitúutëe'.

    Otherwise that starting string is VARCHAR.

    Cheers!

  • Also note that the input parameter to the function should be declared as NVARCHAR

  • DesNorton - Thursday, February 2, 2017 12:04 PM

    Also note that the input parameter to the function should be declared as NVARCHAR

    Indeed, as I mentioned in my first post 😀

  • i think we are just dealing with high ascii here, char 126-255, so switching to nvarchar shouldn't have an impact; i think it's just collation  and case sensitivity.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell - Thursday, February 2, 2017 12:10 PM

    i think we are just dealing with high ascii here, char 126-255, so switching to nvarchar shouldn't have an impact; i think it's just collation  and case sensitivity.

    I've tested it, and switching to NVARCHAR does. 

    Maybe there's something else going on too, but making it all unicode from start to finish fixes it.

    Cheers!

  • Here's what I used:

    ALTER Function [dbo].[fn_ReplaceSpecialCharacters] (@TempString AS NVARCHAR(100))
    RETURNS NVARCHAR(100)
    AS
    BEGIN
    RETURN @TempString COLLATE Latin1_General_BIN
    END
    GO

    USE [NBCC_Search]
    GO
    SELECT [ContactId]
      ,[FirstName]
      ,[LastName]
         ,[dbo].[fn_ReplaceSpecialCharacters]([LastName]) AS 'NewName'
    FROM [dbo].[Professional] where country = 'Romania' order by [LastName] desc
    GO --Does not work

    SELECT [dbo].[fn_ReplaceSpecialCharacters](N'?') --Works
    SELECT [dbo].[fn_ReplaceSpecialCharacters](N'ÀAlbèert ËEîin?s?têeiìn ÌInstìitúutëe')  --Works

  • Lowell - Thursday, February 2, 2017 12:10 PM

    i think we are just dealing with high ascii here, char 126-255, so switching to nvarchar shouldn't have an impact; i think it's just collation  and case sensitivity.

    I'm not so sure about that.  Although, it might be the font of the forum that is messing with my code.

    This code returns values of 536, 537, 538, 539

    SELECT
      UNICODE(N'Ș')
    , UNICODE(N'È™')
    , UNICODE(N'Èš')
    , UNICODE(N'È›')

  • briancampbellmcad - Thursday, February 2, 2017 12:17 PM

    Here's what I used:

    ALTER Function [dbo].[fn_ReplaceSpecialCharacters] (@TempString AS NVARCHAR(100))
    RETURNS NVARCHAR(100)
    AS
    BEGIN
    RETURN @TempString COLLATE Latin1_General_BIN
    END
    GO

    USE [NBCC_Search]
    GO
    SELECT [ContactId]
      ,[FirstName]
      ,[LastName]
         ,[dbo].[fn_ReplaceSpecialCharacters]([LastName]) AS 'NewName'
    FROM [dbo].[Professional] where country = 'Romania' order by [LastName] desc
    GO --Does not work

    SELECT [dbo].[fn_ReplaceSpecialCharacters](N'È™') --Works
    SELECT [dbo].[fn_ReplaceSpecialCharacters](N'ÀAlbèert ËEîinȘsÈ™têeiìn ÃŒInstìitúutëe')  --Works

    Thanks for the update!

    Could you post the DDL for the Professional table?

    Cheers!

  • DesNorton - Thursday, February 2, 2017 12:18 PM

    Lowell - Thursday, February 2, 2017 12:10 PM

    i think we are just dealing with high ascii here, char 126-255, so switching to nvarchar shouldn't have an impact; i think it's just collation  and case sensitivity.

    I'm not so sure about that.  Although, it might be the font of the forum that is messing with my code.

    This code returns values of 536, 537, 538, 539

    SELECT
      UNICODE(N'Ș')
    , UNICODE(N'È™')
    , UNICODE(N'Èš')
    , UNICODE(N'È›')

    ahh i see that now that you pasted the unicode;
    yeah, with that in the mix, i agree it's gotta be nvarchar. i see that  now, thanks Jacob.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply