get only the alphabets from string

  • i have productname column in my table and the query i need to get only the alphabets.

    column contains the special characters (!@#$%^&*():'"{}[]\|?-+=,) and numbers(0-9), space

    sample data

    Foodhold USA,mlc.

    Beverage Partners Worldwide (North canada)......

    Bread World 8

    my expected output will be

    FoodholdUSAmlc

    BeveragePartnersWorldwide(Northcanada)

    BreadWorld

    SELECT productname,

    SUBSTRING(Name,1,ISNULL(NULLIF(PATINDEX('%[^A-Za-z.''0-9]%',LTRIM(RTRIM(productname))),0)-1,LEN(productname))) AS noSpecials

    FROM Manufacturer

    but it is not working. can anyone please show me sample query..

  • You can try this:

    WITH SampleData (col) AS

    (

    SELECT 'Foodhold USA,mlc.'

    UNION ALL SELECT 'Beverage Partners Worldwide (North canada)......'

    UNION ALL SELECT 'Bread World 8'''

    ),

    Tally (n) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns

    )

    SELECT col

    ,NewCol=

    (

    SELECT SUBSTRING(col, n, 1)

    FROM SampleData b

    CROSS APPLY (SELECT n FROM Tally WHERE n BETWEEN 1 AND LEN(col)) c

    WHERE PATINDEX('%[a-zA-Z]%', SUBSTRING(col, n, 1)) <> 0 AND b.col = a.col

    ORDER BY n

    FOR XML PATH('')

    )

    FROM SampleData a

    Note that you may have issues if the same "dirty" string appears in more than one row.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thank you for the reply and will post the output soon.

  • Are the numbers and special characters present at the last of the string only?

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • I am done with writing function to remove the special characters and number from string.

    Thanks everyone who responded to my question.

  • born2achieve (9/13/2013)


    I am done with writing function to remove the special characters and number from string.

    Thanks everyone who responded to my question.

    Nice. Let's see it, please. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is the code i tried and works fine the only minute draback of using this function take few milliseconds delay than using the sql query. but writing function uses in many by calling it

    CREATE FUNCTION [dbo].[fn_GetAlphabetsOnly](@input VARCHAR(50))

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    WHILE PATINDEX('%[^a-z]%',@input) > 0

    SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input),1,'')

    RETURN @input

    END

  • born2achieve (9/13/2013)


    Here is the code i tried and works fine the only minute draback of using this function take few milliseconds delay than using the sql query. but writing function uses in many by calling it

    CREATE FUNCTION [dbo].[fn_GetAlphabetsOnly](@input VARCHAR(50))

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    WHILE PATINDEX('%[^a-z]%',@input) > 0

    SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input),1,'')

    RETURN @input

    END

    My recommendation is to remove the "limits" you have built into it to give it a broader range of utility. For example, change @Input to VARCHAR(8000) and the RETURNS to VARCHAR(8000).

    While (no pun intended) this is a good fast "memory only" method for doing such a thing, it's still a scalar function. I'll see if I can come up with an "iSF" later tonight with the understanding that it'll need to be used in a CROSS APPLY rather than in the SELECT list.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dear Jeff,

    Thanks for your time on this and it will be more helpful if you have much better choice than my approach as it is function.

  • Sorry... deleted this post. I made a testing error. I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok, here we go. First, we need some test data. I normally test functions with a million rows of data to ensure scalability so here’s a million row test table. I used the three strings you provided and duplicated them over a million rows.

    --=======================================================================================

    -- Create a Million row test table. This is not a part of the solution.

    -- We're just building a test table here.

    --=======================================================================================

    --===== Conditionally drop the test table to make reruns easier in SSMS

    IF OBJECT_ID('tempdb..#TestTable') IS NOT NULL

    DROP TABLE #TestTable

    ;

    --===== Create and populate the test table on-the-fly.

    SELECT TOP 1000000

    d.SomeString

    INTO #TestTable

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    CROSS APPLY

    (

    SELECT 'Foodhold USA,mlc.' UNION ALL

    SELECT 'Beverage Partners Worldwide (North canada)......' UNION ALL

    SELECT 'Bread World 8'

    ) d (SomeString)

    ;

    The original function couldn’t handle anything over 50 characters because of the VARCHAR(50) input. I changed that in the original function but that’s all. Here’s the new copy for the original function.

    ALTER FUNCTION dbo.fn_GetAlphabetsOnly

    (@input VARCHAR(8000))

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    WHILE PATINDEX('%[^a-z]%',@input) > 0

    SET @input = STUFF(@input,PATINDEX('%[^a-z]%',@input ),1,'');

    RETURN @input;

    END

    ;

    I also saw a major performance flaw (other than the WHILE loop ;-)) in the original function. As tight as it is, it executes PATINDEX twice for each row. I created a new function that has longer code but the execution time is cut about a third because it only executes PATINDEX once per row. Here’s that new function.

    CREATE FUNCTION dbo.fn_GetAlphabetsOnlyModified

    (@input VARCHAR(8000))

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @P INT

    SELECT @P = PATINDEX('%[^a-z]%',@input)

    WHILE @P > 0

    SELECT @input = STUFF(@input,@P,1,''),

    @P = PATINDEX('%[^a-z]%',@input)

    RETURN @input;

    END

    ;

    Of course, there are other useful changes such as adding COLLATE LATIN1_GENERAL_BIN and checking for both upper/lower case letters to overcome the slower default collations people may be using but really wanted to test the original code, so I didn’t make those changes. I also didn’t add those changes into the set-based function below because I wanted to compare apples to apples.

    CREATE FUNCTION dbo.GetAlphaOnly

    /*****************************************************************************************

    Purpose:

    Given a string of up to 8000 characters, return only the alphabetic characters.

    Usage:

    SELECT ao.AlphaOnly

    FROM dbo.SomeTable st

    CROSS APPLY GetAlphaOnly(st.SomeStringColumn) ao

    ;

    Alternate Usage:

    SELECT AlphaOnly = (SELECT AlphaOnly FROM dbo.GetAlpha_XML(st.SomeString))

    FROM dbo.SomeTable st

    ;

    ------------------------------------------------------------------------------------------

    Revision History:

    Rev 00 - 14 Sep 2013 - Jeff Moden

    - Rework an existing function for performance purposes.

    *****************************************************************************************/

    (@pString VARCHAR(8000))

    RETURNS TABLE AS

    RETURN

    WITH

    --===== Generate up to 10,000 rows ("En" indicates the power of 10 produced)

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),

    E4(N) AS (SELECT 1 FROM E1 a,E1 b,E1 c,E1 d),

    cTally AS (SELECT TOP (LEN(@pString)) N = ROW_NUMBER() OVER (ORDER BY N) FROM E4)

    SELECT AlphaOnly = CAST(

    (

    SELECT SUBSTRING(@pString,t.N,1)

    FROM cTally t

    WHERE SUBSTRING(@pString,t.N,1) LIKE '[a-z]'

    ORDER BY N

    FOR XML PATH('')

    )

    AS VARCHAR(8000))

    ;

    Of course, before we do any performance testing, we have to make sure that they all work correctly.

    --===== Test to make sure that all 3 functions work correctly

    SELECT Top 3

    SomeString

    ,dbo.fn_GetAlphabetsOnly(SomeString)

    FROM #TestTable

    ;

    SELECT Top 3

    SomeString

    ,dbo.fn_GetAlphabetsOnlyModified (SomeString)

    FROM #TestTable

    ;

    SELECT Top 3

    SomeString

    ,AlphaOnly

    FROM #TestTable

    CROSS APPLY dbo.GetAlphaOnly(SomeString)

    ;

    Results from above:

    SomeString

    ------------------------------------------------ -------------------------------------

    Foodhold USA, mlc. FoodholdUSAmlc

    Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada

    Bread World 8 BreadWorld

    (3 row(s) affected)

    SomeString

    ------------------------------------------------ -------------------------------------

    Foodhold USA, mlc. FoodholdUSAmlc

    Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada

    Bread World 8 BreadWorld

    (3 row(s) affected)

    SomeString AlphaOnly

    ------------------------------------------------ -------------------------------------

    Foodhold USA, mlc. FoodholdUSAmlc

    Beverage Partners Worldwide (North canada)...... BeveragePartnersWorldwideNorthcanada

    Bread World 8 BreadWorld

    (3 row(s) affected)

    We’re ready to rumble for performance testing. We can’t use SET STATISTICS TIME, IO ON for these tests because it makes it look like Scalar UDF’s are about 6 or 7 times slower than they really are (see this link on that subject http://www.sqlservercentral.com/articles/T-SQL/91724/ ). Instead, I used SQL Profiler, which will also show the RBAR nature of the Scalar UDFs in terms of ROWCOUNTs generated internally.

    With SQL Profiler all setup to capture the tests (I did it “Client Side” because it’s easy and I’m on a private machine), here’s the code I ran to perform the tests. Each query runs over the whole table. The @Bitbucket variable is used as a target for the results to take display times out of the picture so that we’re mostly measuring the performance of the functions rather than the display times. I also run each query 3 times just to see if there are any performance anomalies.

    --===== dbo.fn_GetAlphabetsOnly ==========================================================

    DECLARE @Bitbucket VARCHAR(8000);

    SELECT @Bitbucket = dbo.fn_GetAlphabetsOnly(SomeString)

    FROM #TestTable

    ;

    GO 3

    --===== dbo.GetAlphabetsOnlyModified ====================================================

    DECLARE @Bitbucket VARCHAR(8000);

    SELECT @Bitbucket = dbo.fn_GetAlphabetsOnlyModified (SomeString)

    FROM #TestTable

    ;

    GO 3

    --===== dbo.GetAlphaOnly =================================================================

    DECLARE @Bitbucket VARCHAR(8000);

    SELECT @Bitbucket = AlphaOnly

    FROM #TestTable

    CROSS APPLY dbo.GetAlphaOnly(SomeString)

    ;

    GO 3

    And, here’s the SQL Profiler results. As you can see, as tight as that nice little WHILE loop is, it never stood a chance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply