More than 1 alphanumeric chars in a string

  • I have an employee table

    Surname GivenName

    ABC x.yz

    A.BC X.*YZ

    A*.BC xyz

    The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters. In this case the output will be

    a.bc x.*yz

    a*.BC xyz

    I know how to find non alphanumeric chars in a string but not sure if I can how to find the strings with minimum 2 non alphanumeric chars?

  • Now look for one of the SQL Ninja's on the forum to give you something cleaner and better, but this should work for now.with Employee as

    (

    select 'ABC' as Surname, 'x.yz' as GivenName union all

    select 'A.BC' as Surname, 'X.*YZ' as GivenName union all

    select 'A*.BC' as Surname, 'xyz' as GivenName

    )

    SELECT * FROM Employee

    WHERE substring(Surname, patindex('%[^a-zA-Z0-9]%',Surname)+1,len(Surname)+1) like '%[^a-zA-Z0-9]%'

    or substring(GivenName, patindex('%[^a-zA-Z0-9]%',GivenName)+1,len(GivenName)+1) like '%[^a-zA-Z0-9]%'



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • shan-422658 (3/12/2014)


    I have an employee table

    Surname GivenName

    ABC x.yz

    A.BC X.*YZ

    A*.BC xyz

    The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters. In this case the output will be

    a.bc x.*yz

    a*.BC xyz

    I know how to find non alphanumeric chars in a string but not sure if I can how to find the strings with minimum 2 non alphanumeric chars?

    Hi and welcome to the forums. This is a rather strange requirement for an employee table. It would seem to me that you should find any rows that have characters that don't make sense for a name. The only values that make sense in a name are any alphanumeric values (that depends on if you allow accent characters), a space, hyphen and apostrophe.

    If you really need to allow characters like * in your employee name and you want to find any where there are more than 1 of those you could use a nested replace. It looks ugly but it should be really fast.

    with employee as

    (

    select 'ABC' as Surname, 'x.yz' as GivenName union all

    select 'A.BC', 'X.*YZ' union all

    select 'A*.BC', 'xyz'

    )

    , AlphaRemoved as

    (

    select *,

    len(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    Surname, 'A', ''),

    'B', ''),

    'C', ''),

    'D', ''),

    'E', ''),

    'F', ''),

    'G', ''),

    'H', ''),

    'I', ''),

    'J', ''),

    'K', ''),

    'L', ''),

    'M', ''),

    'N', ''),

    'O', ''),

    'P', ''),

    'Q', ''),

    'R', ''),

    'S', ''),

    'T', ''),

    'U', ''),

    'V', ''),

    'W', ''),

    'X', ''),

    'Y', ''),

    'Z', '')

    ) as NonAlphaCountSurname,

    len(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    GivenName, 'A', ''),

    'B', ''),

    'C', ''),

    'D', ''),

    'E', ''),

    'F', ''),

    'G', ''),

    'H', ''),

    'I', ''),

    'J', ''),

    'K', ''),

    'L', ''),

    'M', ''),

    'N', ''),

    'O', ''),

    'P', ''),

    'Q', ''),

    'R', ''),

    'S', ''),

    'T', ''),

    'U', ''),

    'V', ''),

    'W', ''),

    'X', ''),

    'Y', ''),

    'Z', '')

    ) as NonAlphaCountGivenName

    from employee

    )

    select Surname, GivenName

    from AlphaRemoved

    where NonAlphaCountSurname > 1

    or NonAlphaCountGivenName > 1

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • BTW Shan, Sean is one of the SQL Ninjas πŸ˜€



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Keith Tate (3/12/2014)


    BTW Shan, Sean is one of the SQL Ninjas πŸ˜€

    Thanks Keith. I would not consider myself a SQL Ninja by any means but thanks for the compliment. I am truly honored and humbled.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/12/2014)


    Keith Tate (3/12/2014)


    BTW Shan, Sean is one of the SQL Ninjas πŸ˜€

    Thanks Keith. I would not consider myself a SQL Ninja by any means but thanks for the compliment. I am truly honored and humbled.

    and well known for his love of a good nested replace πŸ˜›

  • In an effort to evaluate which of the two posted methods would perform better I put together a quick test with 1000 random names. It is kind of nasty to post all the details here but I will certainly put it together if anybody wants to see what I did. I have a couple of random name tables that I used for this. With 1,000 rows the two performed so close that there is no clear winner. When I used a cross join between my two tables (which means we are now hitting 2 million rows) the nested replace has a VERY slight edge on performance. They are both actually quite fast with even 2 million rows.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks for your reply.

    Just a version 2 of the above query…

    Surname GivenName

    ABC x.yz

    ABcx.y.z

    ABC X.*YZ

    A*.BC xyz

    The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters together(one after the other). In this case the output will be

    abc x.*yz

    a*.BC xyz

    How would I find just the names that have 2 non alphanumeric chars together(one after the other)

  • Dohsan (3/12/2014)


    Sean Lange (3/12/2014)


    Keith Tate (3/12/2014)


    BTW Shan, Sean is one of the SQL Ninjas πŸ˜€

    Thanks Keith. I would not consider myself a SQL Ninja by any means but thanks for the compliment. I am truly honored and humbled.

    and well known for his love of a good nested replace πŸ˜›

    πŸ˜› They are certainly ugly to look at but they are also usually super fast.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • shan-422658 (3/12/2014)


    Thanks for your reply.

    Just a version 2 of the above query…

    Surname GivenName

    ABC x.yz

    ABcx.y.z

    ABC X.*YZ

    A*.BC xyz

    The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters together(one after the other). In this case the output will be

    abc x.*yz

    a*.BC xyz

    How would I find just the names that have 2 non alphanumeric chars together(one after the other)

    What are you really after? It seems like you are doing some analysis on your data. Is this something you are going to be doing routinely? Trying to find only those values where there are at least 2 non alphanumeric characters in a row is going to suck rotten tomatoes. You are going to have to look at each and every character in every value for all rows in the entire table. This takes RBAR to a new level. It is like RBAR cross join CBAC (character by agonizing character).

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I might be trying to kill a fly with a cannon but this could be an option. It could be turned into an inLine Table-valued Function if needed to avoid writing everything if this code is needed somewhere else.

    CREATE TABLE #employee(Surname varchar(50), GivenName varchar(50)) ;

    WITH Employee as

    (

    select 'ABC' as Surname, 'x.yz' as GivenName union all

    select 'ABC', 'X.*YZ' union all

    select 'A*.BC', 'xyz' union all

    select 'ABc', 'x.y.z'

    )

    INSERT INTO #employee

    SELECT *

    FROM Employee;

    WITH E1(N) AS(

    SELECT N FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(N)

    ),

    E2(N) AS(

    SELECT a.N FROM E1 a, E1 b

    ),

    E4(N) AS(

    SELECT a.N FROM E2 a, E2 b

    )

    SELECT e.*

    FROM #employee e

    CROSS APPLY( SELECT TOP( SELECT MAX(strLen)

    FROM( VALUES(LEN( e.Surname)), (LEN( e.Surname))) x(strLen))

    ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM E4) Tally(N)

    WHERE (SUBSTRING( e.Surname, N, 1) LIKE '[^0-9a-zA-z]'

    AND SUBSTRING( e.Surname, N + 1, 1) LIKE '[^0-9a-zA-z]')

    OR (SUBSTRING( e.GivenName, N, 1) LIKE '[^0-9a-zA-z]'

    AND SUBSTRING( e.GivenName, N + 1, 1) LIKE '[^0-9a-zA-z]') ;

    GO

    DROP TABLE #employee;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (3/12/2014)


    I might be trying to kill a fly with a sledgehammer but this could be an option. It could be turned into an inLine Table-valued Function if needed to avoid writing everything if this code is needed somewhere else.

    CREATE TABLE #employee(Surname varchar(50), GivenName varchar(50)) ;

    WITH Employee as

    (

    select 'ABC' as Surname, 'x.yz' as GivenName union all

    select 'ABC', 'X.*YZ' union all

    select 'A*.BC', 'xyz' union all

    select 'ABc', 'x.y.z'

    )

    INSERT INTO #employee

    SELECT *

    FROM Employee;

    WITH E1(N) AS(

    SELECT N FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(N)

    ),

    E2(N) AS(

    SELECT a.N FROM E1 a, E1 b

    ),

    E4(N) AS(

    SELECT a.N FROM E2 a, E2 b

    )

    SELECT e.*

    FROM #employee e

    CROSS APPLY( SELECT TOP( SELECT MAX(strLen)

    FROM( VALUES(LEN( e.Surname)), (LEN( e.Surname))) x(strLen))

    ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM E4) Tally(N)

    WHERE (SUBSTRING( e.Surname, N, 1) LIKE '[^0-9a-zA-z]'

    AND SUBSTRING( e.Surname, N + 1, 1) LIKE '[^0-9a-zA-z]')

    OR (SUBSTRING( e.GivenName, N, 1) LIKE '[^0-9a-zA-z]'

    AND SUBSTRING( e.GivenName, N + 1, 1) LIKE '[^0-9a-zA-z]') ;

    GO

    DROP TABLE #employee;

    That is pretty slick Luis.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/12/2014)


    That is pretty slick Luis.

    Thanks Sean, it came after my third cup of coffee, I might need more now. πŸ˜€

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sean Lange (3/12/2014)


    In an effort to evaluate which of the two posted methods would perform better I put together a quick test with 1000 random names. It is kind of nasty to post all the details here but I will certainly put it together if anybody wants to see what I did. I have a couple of random name tables that I used for this. With 1,000 rows the two performed so close that there is no clear winner. When I used a cross join between my two tables (which means we are now hitting 2 million rows) the nested replace has a VERY slight edge on performance. They are both actually quite fast with even 2 million rows.

    Thanks for doing that Sean very interesting results.



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Sean Lange (3/12/2014)


    shan-422658 (3/12/2014)


    Thanks for your reply.

    Just a version 2 of the above query…

    Surname GivenName

    ABC x.yz

    ABcx.y.z

    ABC X.*YZ

    A*.BC xyz

    The query needs to get me the surname and givenname which have more than 1 non alphanumeric characters together(one after the other). In this case the output will be

    abc x.*yz

    a*.BC xyz

    How would I find just the names that have 2 non alphanumeric chars together(one after the other)

    What are you really after? It seems like you are doing some analysis on your data. Is this something you are going to be doing routinely? Trying to find only those values where there are at least 2 non alphanumeric characters in a row is going to suck rotten tomatoes. You are going to have to look at each and every character in every value for all rows in the entire table. This takes RBAR to a new level. It is like RBAR cross join CBAC (character by agonizing character).

    I have to agree with Sean that what you're asking for sounds rather nasty. But then again I like a good challenge:

    with Employee as

    (

    select 'ABC' as Surname, 'x.yz' as GivenName union all

    select 'A.BC' as Surname, 'X.*YZ' as GivenName union all

    select 'A.D.*C' as Surname, 'X.*YZ' as GivenName union all

    select 'A*.BC.*.%.%.%' as Surname, 'xyz' as GivenName

    ),

    Tally (n) AS

    (

    SELECT TOP (SELECT CASE WHEN MAX(LEN(Surname)) > MAX(LEN(GivenName)) THEN MAX(LEN(Surname)) ELSE MAX(LEN(GivenName)) END FROM Employee)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns

    )

    SELECT a.Surname, a.GivenName

    FROM Employee a

    WHERE EXISTS

    (

    SELECT 1

    FROM Tally

    WHERE n < LEN(Surname) - 1 AND

    PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(Surname, n, 1)) > 0 AND

    PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(Surname, n+1, 1)) > 0

    )

    OR EXISTS

    (

    SELECT 1

    FROM Tally

    WHERE n < LEN(GivenName) - 1 AND

    PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(GivenName, n, 1)) > 0 AND

    PATINDEX('%[^a-zA-Z0-9]%', SUBSTRING(GivenName, n+1, 1)) > 0

    );


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply