Find a String in a String in SQL Server

  • rick.fisher

    Grasshopper

    Points: 10

    You can also just do:

    substring(customernotes,

    charindex('PO:',customernotes + 'PO:')+3,

    charindex(' ',customernotes + ' PO: ',charindex('PO:',customernotes + ' PO:') )-charindex('PO:',customernotes)-3)

    This works where no PO is present also.

    Or to also handle the case of "test PO: 201530444. New test"

    rtrim(ltrim(

    substring(customernotes,

    charindex('PO:',customernotes + 'PO:')+3,

    PATINDEX('%[^ 0-9]%',SUBSTRING(customernotes ,charindex('PO:',customernotes + 'PO:')+4,1000) + '.' )

    )

    ))

  • Phil Factor

    SSCoach

    Points: 19913

    Great article, Steve, because it illustrates the way that one has to go about finding a solution. However, because you don't like PatIndex, it all gets a bit complicated. Why not just do this? (it checks for anything starting with PO: followed by two numbers. If it finds anything it returns all the numbers following until it finds something that isn't a number: otherwise it returns a blank string ''). OK it may not be entirely intuitive but once one understands the technique, I guarantee that it is useful where you have to pull out a single substring that varies in length.

    [font="Courier New"]

    SELECT customerid, LEFT(TopOfPO, PATINDEX('%[0-9][^0-9]%', TopOfPO))

    FROM

      (SELECT

         SUBSTRING(

          customernotes+'     ',

          PATINDEX('%PO:[0-9][0-9]%', customerNotes+'PO:99')+3,

          100) AS topOfPO,

       customerID

       FROM customers)f(TopOfPO, customerID)[/font]

    This technique is called "Top 'n Tailing". You start by nicking off the start of the string, and then finish by lopping off everything after the end of the substring. The only complication is in dealing with the problem of not finding the substring. You just ensure that it always finds something by adding a string on the end! It is easier to play about with the example than to describe!

    Best wishes,
    Phil Factor
    Simple Talk

  • jcuribe

    Grasshopper

    Points: 13

    my personal favorite is to relly on RegExp, with some CLR you can getit into sql server. What I do is add it to master database. this way its available to all databases.

    master.dbo.RegExMatch(d,'(?<=PO:)(\d{7,10})(?=\s)')

    basiclay capture 7-10 numbers that are preceded by PO: and followed by a space or end.

  • Jeff Moden

    SSC Guru

    Points: 994934

    jcuribe (3/11/2016)


    my personal favorite is to relly on RegExp, with some CLR you can getit into sql server. What I do is add it to master database. this way its available to all databases.

    master.dbo.RegExMatch(d,'(?<=PO:)(\d{7,10})(?=\s)')

    basiclay capture 7-10 numbers that are preceded by PO: and followed by a space or end.

    My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Brad Schulz

    Old Hand

    Points: 395

    Steve Jones - SSC Editor (3/10/2016)


    Thanks for the catch. I updated the math to get the PO.

    I wonder which of these works better in complex queries. It's worth some performance testing to see. I hadn't thought about cross apply here, but it's an interesting approach.

    In a case like this, where CROSS APPLY is only used to introduce new columns, there is no performance hit at all.

    Let's say you had a table of comma-delimited list of one or more integers:

    create table #t

    (

    ID int identity(1,1)

    ,ListOfNums varchar(50)

    )

    insert #t

    values ('279,37,972,15,175')

    ,('17,72')

    ,('672,52,19,23')

    ,('153,798,266,52,29')

    ,('77,349,14')

    select * from #t

    /*

    ID ListOfNums

    -- -----------------

    1 279,37,972,15,175

    2 17,72

    3 672,52,19,23

    4 153,798,266,52,29

    5 77,349,14

    */

    And you have to find all the rows where the 4th number in the list is less than 50 and sort the result based on the value of the 3rd number.

    Before CROSS APPLY, you'd have to do it like this (eye roll):

    select ID

    ,ListOfNums

    from #t

    where substring(ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',

    charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)+1,

    (charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',

    charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)+1)-

    charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',

    charindex(',',ListOfNums+',,,,')+1)+1))-1)

    < 50

    order by substring(ListOfNums+',,,,',charindex(',',ListOfNums+',,,,',

    charindex(',',ListOfNums+',,,,')+1)+1,(charindex(',',ListOfNums+',,,,',

    charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1)+1)-

    charindex(',',ListOfNums+',,,,',charindex(',',ListOfNums+',,,,')+1))-1)

    /*

    ID ListOfNums

    -- -------------

    2 17,72

    5 77,349,14

    3 672,52,19,23

    1 279,37,972,15

    */

    But with CROSS APPLY, you could do it like this... and the query plan is EXACTLY THE SAME as the one above:

    select ID

    ,ListOfNums

    from #t

    cross apply (select WorkString=ListOfNums+',,,,') F_Str

    cross apply (select p1=charindex(',',WorkString)) F_P1

    cross apply (select p2=charindex(',',WorkString,p1+1)) F_P2

    cross apply (select p3=charindex(',',WorkString,p2+1)) F_P3

    cross apply (select p4=charindex(',',WorkString,p3+1)) F_P4

    cross apply (select Num3=convert(int,substring(WorkString,p2+1,p3-p2-1))

    ,Num4=convert(int,substring(WorkString,p3+1,p4-p3-1))) F_Nums

    where Num4<50

    order by Num3

    /*

    ID ListOfNums

    -- -------------

    2 17,72

    5 77,349,14

    3 672,52,19,23

    1 279,37,972,15

    */

  • phurst

    SSC Veteran

    Points: 235

    There's a sneaky trick I like to use in this scenario that can shorten the code, and I feel make it more readable.

    'PO' = SUBSTRING(CustomerNotes, CHARINDEX('PO:', CustomerNotes)+3, COALESCE(NULLIF(CHARINDEX(' ', CustomerNotes, CHARINDEX('PO:', CustomerNotes)+3),0),LEN(customernotes)) - CHARINDEX('PO:', CustomerNotes) - 2

    The case is replaced with a nested COALESCE NULLIF combo. If the CHARINDEX returns 0 NULLIF converts it to a NULL and COALESCE returns the value from LEN.

    This doesn't seem to have any performance impact one way or the other.

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 716327

    Great other solutions.

    It would be good to showcase a comparison of different methods at scale. I'll try to do one, using Jeff's test data piece to scale up test data and see how things perform. Unless someone wants to beat me to it.

  • TommyFroy

    Newbie

    Points: 9

    You're all right... This an all-too-common issue. This piqued my curiosity though and started playing with the script so pull out the PO number from the string where the Purchase Order Number followed "PO:", "PO", "PO: ", or "PO " - since the last thing user's are is consistent.

    Here's what I came up with.

    Any recommendations on condensing this a little further?

    SELECT CustomerID

    ,REPLACE(

    REPLACE(

    REPLACE(

    SUBSTRING(CustomerNotes

    ,CHARINDEX('PO', CustomerNotes)

    ,CASE CHARINDEX(' ', CustomerNotes, CHARINDEX('PO', CustomerNotes)+5)

    WHEN 0

    THEN LEN(CustomerNotes) - CHARINDEX('PO',CustomerNotes)+1

    ELSE CHARINDEX(' ', CustomerNotes, CHARINDEX('PO', CustomerNotes)) - CHARINDEX('PO', CustomerNotes)

    END)

    ,' ','')

    ,':','')

    ,'PO','') AS PONumber

    FROM dbo.Customers

    WHERE CustomerNotes LIKE '%PO%'

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Manic Star (3/10/2016)


    This is why i loathe long free-flowing 'note' fields where people embed data they then expect reports or queries to pull from in a consistent way.

    Good example is credit card #'s in note fields. Even using regular expression processing its difficult to find all types of patterns that represent the way the user entered the CC#'s in the 'note'.

    If you need to report or find PO data, then it should go into a field in a table called something like 'PO Number', NOT in a note.

    Note fields should be for misc notes, not data fields.

    I totally agree.

  • quagmired

    Old Hand

    Points: 324

    My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.

    All the more reason to have a nimble regex library (PCRE) running natively within SQL server with a instance-wide cache of compiled patterns. The cumbersome TSQL solutions for even relatively simple patterns are a joke when a short, sweet, regex will suffice. It's 2016. Why don't we have this?

  • Jeff Moden

    SSC Guru

    Points: 994934

    quagmired (3/17/2016)


    My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.

    All the more reason to have a nimble regex library (PCRE) running natively within SQL server with a instance-wide cache of compiled patterns. The cumbersome TSQL solutions for even relatively simple patterns are a joke when a short, sweet, regex will suffice. It's 2016. Why don't we have this?

    So what happens when something doesn't fit a "compiled pattern"?

    Mid string searches in SQL suck. Regex has it's own problems. The reason either of them exist is to be able to interrogate improperly formed data. The best thing (although rarely the easiest thing) to do would be to have a "Moden Style" pork chop dinner with whomever the data provider is and get them to fix their junk. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • quagmired

    Old Hand

    Points: 324

    Jeff Moden (3/17/2016)


    quagmired (3/17/2016)


    My personal favorite is, given the same accuracy, which ever is the fastest and least resource intensive. That's usually not REGEX as called from SQL SERVER even when using a CLR. There have been a ton of threads on this very subject where it has been found that if CHARINDEX or PATINDEX can be used in a straight forward manner (and sometimes in a complicated manner), it beats and sometimes blows the doors off of Regex calls.

    All the more reason to have a nimble regex library (PCRE) running natively within SQL server with a instance-wide cache of compiled patterns. The cumbersome TSQL solutions for even relatively simple patterns are a joke when a short, sweet, regex will suffice. It's 2016. Why don't we have this?

    So what happens when something doesn't fit a "compiled pattern"?

    Mid string searches in SQL suck. Regex has it's own problems. The reason either of them exist is to be able to interrogate improperly formed data. The best thing (although rarely the easiest thing) to do would be to have a "Moden Style" pork chop dinner with whomever the data provider is and get them to fix their junk. 😉

    Not looking to turn this into a drag 'em out, but...

    Most runtimes maintain a cache of compiled patterns. If a new one comes along, compile it, and keep it around for the next time. I have no idea, but I'd be willing to bet that SQL Server does this with LIKE patterns.

    Regex or any variety of pattern matching exists for a whole lot more than mid-string searches. Think data validation. I absolutely agree that storing structured data out of place in a blob is a piss-poor practice. But pattern matching still has its place in SQL, especially in analytics, and providing a subset of full blown regex is just putting up unnecessary hoops and hurdles.

    Name a problem with regex that is not a problem with what you get in standard SQL pattern matching.

    Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?

  • Jeff Moden

    SSC Guru

    Points: 994934

    quagmired (3/17/2016)


    Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?

    Starting thought.... Have you ever actually had to do that anywhere? If so, what was it for?

    Also, there's a whole thread where people spent the time trying to justify RegEx from SQL and got their heads handed to them by coded proof. I'll see if I can find it...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • quagmired

    Old Hand

    Points: 324

    Jeff Moden (3/18/2016)


    quagmired (3/17/2016)


    Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?

    Starting thought.... Have you ever actually had to do that anywhere? If so, what was it for?

    Also, there's a whole thread where people spent the time trying to justify RegEx from SQL and got their heads handed to them by coded proof. I'll see if I can find it...

    Fine, how about a string with 'PO:' preceded by a non-alphanumeric character or nothing at all, followed by an arbitrary number of spaces followed by up to ten decimal digits, terminating, or followed by a non decimal character. When have I ever had to match 10*1? Maybe in automata class. Why does it matter? It's a common class of problems that the SQL community, apparently, does not want to own, and the waste of effort and bloated TSQL code that ensues is really very much a joke, hence this article/forum topic.

    Don't pay any attention to the fact that LIKE pattern matching already available in SQL is dinosaur regex.

    Handed heads: if that coded "proof" involves a native engine with caching and features that prevent runaway patterns, (non)?greedy, possessive, anchored, etc (again, PCRE), then perhaps that's proof, but I really doubt it. Looking forward to a link.

  • Jeff Moden

    SSC Guru

    Points: 994934

    Jeff Moden (3/18/2016)


    quagmired (3/17/2016)


    Parting thought: how would you match a 1 followed by an arbitrary number of 0s followed by a 1 with T-SQL?

    Starting thought.... Have you ever actually had to do that anywhere? If so, what was it for?

    Also, there's a whole thread where people spent the time trying to justify RegEx from SQL and got their heads handed to them by coded proof. I'll see if I can find it...

    Found the discussion... and RegEx got its ears folded back pretty badly.

    http://www.sqlservercentral.com/Forums/Topic1296195-60-1.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 47 total)

You must be logged in to reply to this topic. Login to reply