Please help to get, Alphabet with underscore i.e special charecter

  • EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf

    In the above string, i would like to get as below like alphabet and '_', i tried  to use the function fn_GetAlphabetsOnly but getting only alphabet.

    EMA_MRNMM_PMS_PID_OR_Driver_Licensefrom

  • We don't have your fn_GetAlphabetsOnly function -- that's a user-defined function.

     

    The following returns EMA_T__MRNMM_PMSPAT_PID_OR_Dr:

    DECLARE @string VARCHAR(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Dr'
    SELECT @string,REPLACE(@string,'%[^0-9]%',''), REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@string,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','');

    Note: Your sample results don't seem to reflect the fact that eliminating numeric digits could result in two or more consecutive underscore characters. If you're actually trying to consolidate those, that's another issue.

     

     

    See also https://www.sqlservercentral.com/scripts/patreplace8k

  • Just to be sure. You want to?

    1. Locate the first numeric character
    2. Strip everything from the first numeric character up to the next underscore
    3. Repeat from step 1 using the current underscore as the starting position
    4. Return the stripped-down string
  • SET ANSI_NULLS ON
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE FUNCTION [dbo].[fn_GetAlphabetsAndUnderscoresOnly]
    (
    @string VARCHAR(8000)
    )
    RETURNS VARCHAR(8000)
    AS
    BEGIN
    DECLARE @invalidCharLocation SMALLINT
    SET @string = LEFT(@string, CHARINDEX('.', @string + '.') - 1)
    SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
    WHILE @invalidCharLocation > 0
    BEGIN
    SET @string = STUFF(@string, @invalidCharLocation, 1, '')
    SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
    END --WHILE
    SET @string = @string
    RETURN @string
    END --FUNCTION

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Instead of a WHILE loop you could try a numbers table or tally function approach

    declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Dr';

    select string_agg(v.chr, '') within group (order by fn.n) az_only
    from dbo.fnTally(1, len(@string)) fn
    cross apply (values (substring(@string, fn.n, 1))) v(chr)
    where v.chr like '[A-Za-z_]';

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance.  There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you, So much!!

  • Thank you so much, Scott!! Really appreciate you help..

  • Jeff Moden wrote:

    With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance.  There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.

    Yes, had it produced the desired outcome. 🙂

  • I thought later of one possible correction to removing the file extension, just in case two or more periods (.) appear in the value.

    SET ANSI_NULLS ON
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER FUNCTION [dbo].[fn_GetAlphabetsAndUnderscoresOnly]
    (
    @string VARCHAR(8000)
    )
    RETURNS VARCHAR(8000)
    AS
    BEGIN
    DECLARE @invalidCharLocation SMALLINT
    /* strip file extension from name */
    SET @string = LEFT(@string, LEN(@string) - CHARINDEX('.', REVERSE('.' + @string)))
    SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
    WHILE @invalidCharLocation > 0
    BEGIN
    SET @string = STUFF(@string, @invalidCharLocation, 1, '')
    SET @invalidCharLocation = PATINDEX('%[^A-Za-z_]%', @string)
    END --WHILE
    SET @string = @string
    RETURN @string
    END --FUNCTION

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • kaj wrote:

    Jeff Moden wrote:

    With the understanding that I've not done any deep analysis on any of the code posted and going only on what I've seen in the past, the nested REPLACEs that RatBak posted should would a treat for form, function, and performance.  There IS a performance problem with REPLACE when it comes to certain collations but adding a binary collation to the code will work a real treat there.

    Yes, had it produced the desired outcome. 🙂

    Agreed.  In fact, none of the code posted above produces the actual outcome posted by the OP.  Not even the selected "Correct" answer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @dba.k ,

    Look again. None of the code posted actually produces the desired output that you posted.  "Must look eye".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Afaik this actually produces the desired output.  Should work with SQL 2014.  It uses the ordinal splitter DelimitedSplit8K_Lead.  The query: splits the input string on '_' underscore, finds the offset location of the first integer within the split string, substrings to extract the lead characters, and removes the file extension.  Then it concatenates the calculated strings 'string_wo_ext' separated by underscores using the FOR XML trick.

    declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf';

    select stuff((select '_' + calc.string_wo_ext
    from dbo.DelimitedSplit8K_Lead(@string, '_') ds
    cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
    cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
    cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
    where Item like '[A-Za-z_]%'
    order by ds.ItemNumber
    for xml path('')), 1, 1, '') string_val;
    CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
    --===== Define I/O parameters
    (@pString VARCHAR(8000), @pDelimiter CHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ), --10E+1 or 10 rows
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
    -- for both a performance gain and prevention of accidental "overruns"
    SELECT 0 UNION ALL
    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    ),
    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
    SELECT t.N+1
    FROM cteTally t
    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
    )
    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
    FROM cteStart s
    ;

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Afaik this actually produces the desired output.  Should work with SQL 2014.  It uses the ordinal splitter DelimitedSplit8K_Lead.  The query: splits the input string on '_' underscore, finds the offset location of the first integer within the split string, substrings to extract the lead characters, and removes the file extension.  Then it concatenates the calculated strings 'string_wo_ext' separated by underscores using the FOR XML trick.

    declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT000001287_PID15235307_OR_Driver_Licensefrom.pdf';

    select stuff((select '_' + calc.string_wo_ext
    from dbo.DelimitedSplit8K_Lead(@string, '_') ds
    cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
    cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
    cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
    where Item like '[A-Za-z_]%'
    order by ds.ItemNumber
    for xml path('')), 1, 1, '') string_val;
    CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
    --===== Define I/O parameters
    (@pString VARCHAR(8000), @pDelimiter CHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ), --10E+1 or 10 rows
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
    -- for both a performance gain and prevention of accidental "overruns"
    SELECT 0 UNION ALL
    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    ),
    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
    SELECT t.N+1
    FROM cteTally t
    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
    )
    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
    FROM cteStart s
    ;

    A couple of potential issues: if there's an additional period (.) in the file name and/or an underscore in the extension.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    A couple of potential issues: if there's an additional period (.) in the file name and/or an underscore in the extension.

    At least it produces the correct output when using the given data.  😉  And, it doesn't use a While loop inside of a scalar function, which will prevent parallelism.

    Also, if you RUN at the following, you'll find that an extra period in the file name doesn't seem to cause a problem but the extra underscore in the extension is a problem.  Here's the example with Steve's code.

    declare @string varchar(255) = 'EMA_20210526T211254_0000_MRNMM0000001240_PMS110460PAT0000.01287_PID15235307_OR_Driver_Licensefrom.p_df';

    select stuff((select '_' + calc.string_wo_ext
    from dbo.DelimitedSplit8K_Lead(@string, '_') ds
    cross apply (values (patindex('%[0-9]%', ds.Item))) frst(tndx)
    cross apply (values (iif(frst.tndx=0, Item, substring(ds.Item, 1, frst.tndx-1)))) string(string_to_agg)
    cross apply (values (left(string.string_to_agg, len(string.string_to_agg) - charindex('.', reverse(string.string_to_agg))))) calc(string_wo_ext)
    where Item like '[A-Za-z_]%'
    order by ds.ItemNumber
    for xml path('')), 1, 1, '') string_val;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply