First Occurance of the string from text

  • Hi All,

    I have text column .I want to find out first occurance of string based on logic.I defiend Text with examples and also mentioned expected result.I coloured the text in word document,due to some reasons not displaying same here.Attached as image below texts to understand more clear.

    TEXT 1: 'ABNAGENDRACSURENDRADJITHENDRAXNARENDRABVEERNDARAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN'

    From the above text1, I want to get “AXNARENDR”.

    Based on logic defined below:

    First I have to search for string “A” Then next to ‘A’ it should not be “B” or “C” or “D”.It can be anything other thing these three.Combination of “A” otherthan “B” or “C” or “D”

    In the example text I defined “A”,”X” defined three times .I want to capture few characters from the first occurrence of the string

    i.e AXNARENDR (TEXT1 I defined “A” with 4th occur

    TEXT 2:

    'ABNAGENDRACSURENDRADJITHENDRABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHENAYENDGHRABVEERNDARAXDRMNDRABNAGENDRACSURENDRADJITHENAYRVINDR'

    From the above text2, I want to get “AYENDGHR”.

    TEXT 3:

    'ABNAGENDRACSURENDRADJITHENDRABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHENDRABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHABNAGENDRACSURENDRADJITHENDRABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHENABNAGENDRACSURENDRADJITHAZENIVKHRABVEERNDARAXDRMNDRAYRVINDR AZNHKLMN'

    From the above text3, I want to get “AZENIVKHR”.

    Please share the code with result as expected with best approach

  • What did you try?

    Did you try using CHARINDEX?

  • Hi and welcome to SSC. Your post doesn't make any sense at all. What is the logic here and how do you know you want to find that particular string of gibberish from a longer string of gibberish.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • You have at least one rule missing. For example, in

    Text1=

    'ABNAGENDRACSURENDRADJITHENDRAXNARENDRABVEERNDARAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN'

    The first A not followed by a B, C or D is "AGENDR". Why is that "A" excluded?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • "First A not followed by a B, C or D statement" means "A is present before B, C or D" need to ignore

    i am explaining again with simple text.This is almost same text as mentioned earlier. JUST "VEERNDAR" replaced with "VEERNDR" for getting expected result as i mentioned in the scenario (Its my mistake..i really sorry for this)

    TEXT 1: 'ABNAGENDRACSURENDRADJITHENDRAXNARENDRABVEERNDRAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN'

    I break it in into substrings for understanding.when substrings concatenated will get TEXT

    ABNAGENDR

    ACSURENDR

    ADJITHENDR

    AXNARENDR

    ABVEERNDR

    AXDRMNDR

    AXRVINDR

    ABNAGENDR

    ACSURENDR

    ADJITHEN'

    From the above text1, I want to get “AXNARENDR”.(This is first substring which is having A other than "B" or "C" or "D",i.e AX)

    from the above substrings/TEXT, My concern is "A with B" or "A with C" or "A with D" occured N times before "A with X" (Here X is dynamic,It can be any character other than "B" or "C" or "D") looking code how to find “AXNARENDR”

  • Ignore earlier thread..

    "First A not followed by a B, C or D statement" means "A is present before B, C or D" need to ignore

    i am explaining again with simple text.This is almost same text as mentioned earlier. JUST "VEERNDAR" replaced with "VEERNDR"

    "AXNARENDR" replaced with "AXNRENDR" for getting expected result as i mentioned in the scenario (Its my mistake..i really sorry for this)

    TEXT 1: 'ABNAGENDRACSURENDRADJITHENDRAXNRENDRABVEERNDRAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN'

    I break it in into substrings for understanding.when substrings concatenated will get TEXT

    ABNAGENDR

    ACSURENDR

    ADJITHENDR

    AXNRENDR

    ABVEERNDR

    AXDRMNDR

    AXRVINDR

    ABNAGENDR

    ACSURENDR

    ADJITHEN'

    From the above text1, I want to get “AXNRENDR”.(This is first substring which is having A other than "B" or "C" or "D",i.e AX)

    from the above substrings/TEXT, My concern is "A with B" or "A with C" or "A with D" occured N times before "A with X" (Here X is dynamic,It can be any character other than "B" or "C" or "D") looking code how to find “AXNARENDR”

  • Ignore earlier thread also

    "First A not followed by a B, C or D statement" means "A is present before B, C or D" need to ignore

    i am explaining again with simple text.This is almost same text as mentioned earlier. JUST "VEERNDAR" replaced with "VEERNDR"

    "AXNARENDR" replaced with "AXNRENDR" for getting expected result as i mentioned in the scenario (Its my mistake..i really sorry for this)

    TEXT 1: 'ABNAGENDRACSURENDRADJITHENDRAXNRENDRABVEERNDRAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN'

    I break it in into substrings for understanding.when substrings concatenated will get TEXT

    ABNAGENDR

    ACSURENDR

    ADJITHENDR

    AXNRENDR

    ABVEERNDR

    AXDRMNDR

    AXRVINDR

    ABNAGENDR

    ACSURENDR

    ADJITHEN'

    From the above text1, I want to get “AXNRENDR”.(This is first substring which is having A other than "B" or "C" or "D",i.e AX)

    from the above substrings/TEXT, My concern is "A with B" or "A with C" or "A with D" occured N times before "A with X" (Here X is dynamic,It can be any character other than "B" or "C" or "D") looking code how to find “AXNRENDR”

  • If I understand what you're looking for, it's actually pretty easy. Use a "not" pattern to exclude BCD.

    DECLARE @Text1 VARCHAR(8000);

    SELECT @Text1 = 'ABNAGENDRACSURENDRADJITHENDRAXNARENDRABVEERNDARAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN';

    -- 111111111122222222222333333333444444444455555555556666666666777777777788888888889

    -- 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890

    SELECT StringPart = SUBSTRING(@Text1, PATINDEX('%A[^BCD]NARENDR%',@Text1),9)

    ,StringPartLocation = PATINDEX('%A[^BCD]NARENDR%',@Text1);

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rsrvas (9/4/2015)


    Ignore earlier thread..

    "First A not followed by a B, C or D statement" means "A is present before B, C or D" need to ignore

    i am explaining again with simple text.This is almost same text as mentioned earlier. JUST "VEERNDAR" replaced with "VEERNDR"

    "AXNARENDR" replaced with "AXNRENDR" for getting expected result as i mentioned in the scenario (Its my mistake..i really sorry for this)

    TEXT 1: 'ABNAGENDRACSURENDRADJITHENDRAXNRENDRABVEERNDRAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN'

    I break it in into substrings for understanding.when substrings concatenated will get TEXT

    ABNAGENDR

    ACSURENDR

    ADJITHENDR

    AXNRENDR

    ABVEERNDR

    AXDRMNDR

    AXRVINDR

    ABNAGENDR

    ACSURENDR

    ADJITHEN'

    From the above text1, I want to get “AXNRENDR”.(This is first substring which is having A other than "B" or "C" or "D",i.e AX)

    from the above substrings/TEXT, My concern is "A with B" or "A with C" or "A with D" occured N times before "A with X" (Here X is dynamic,It can be any character other than "B" or "C" or "D") looking code how to find “AXNARENDR”

    Based on this, it looks like you have just split this string up into sections, delimited by (but including the leading) "A". Except this is false for "ABNAGENDR", since it includes an "A".

    Next, if you look at the length of the strings that you supplied, they are various lengths - some are 8, and some are 9.

    So, what are the rules for determining how much of the string that you need to return?

    As was previously mentioned, the string "ABNAGENDR" breaks your rules of "A" not followed by B,C,D... it contains AGENDR, which meets that criteria.

    Please examine the rules that you have given to us so far, and fix the discrepancies. Too much information is missing for us to be able to really help you.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • [font="Arial Black"]NOTICE: BECAUSE OF ONE OF THE EXTREMELY WIDE POSTS ABOVE, YOU MIGHT NOT BE

    ABLE TO TELL THAT THERE ARE MORE PAGES ON THIS THREAD. SCROLL ALL THE WAY DOWN OR ALL THE WAY

    UP AND THEN ALL THE WAY TO THE RIGHT TO GET TO THE "Page x of y" PAGE CLICKABLE LINKS.

    A POSSIBLE SOLUTION COMPLETE WITH TEST DATA CAN BE FOUND ON THE POST AFTER THIS ONE,

    WHICH MIGHT BE ON THE NEXT PAGE FOR YOU DEPENDING ON YOUR FORUM SETTINGS.

    [/font]

    rsrvas (9/4/2015)


    Ignore earlier thread also

    "First A not followed by a B, C or D statement" means "A is present before B, C or D" need to ignore

    i am explaining again with simple text.This is almost same text as mentioned earlier. JUST "VEERNDAR" replaced with "VEERNDR"

    "AXNARENDR" replaced with "AXNRENDR" for getting expected result as i mentioned in the scenario (Its my mistake..i really sorry for this)

    TEXT 1: 'ABNAGENDRACSURENDRADJITHENDRAXNRENDRABVEERNDRAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN'

    I break it in into substrings for understanding.when substrings concatenated will get TEXT

    ABNAGENDR

    ACSURENDR

    ADJITHENDR

    AXNRENDR

    ABVEERNDR

    AXDRMNDR

    AXRVINDR

    ABNAGENDR

    ACSURENDR

    ADJITHEN'

    From the above text1, I want to get “AXNRENDR”.(This is first substring which is having A other than "B" or "C" or "D",i.e AX)

    from the above substrings/TEXT, My concern is "A with B" or "A with C" or "A with D" occured N times before "A with X" (Here X is dynamic,It can be any character other than "B" or "C" or "D") looking code how to find “AXNRENDR”

    Ah... I think I'm finally baggin' what you're rakin'. I couldn't figure out what you wanted for the length of each string but I think I get it now. Correct me if I'm wrong, please...

    1. You simply want to use the letter "A" as an "inclusive" delimiter to split the "words" out of the string.

    2. Then, you want the FIRST word out of that set (all the "words" start with "A" by definition) that does NOT contain a "B", "C", or "D" for the second letter of the word.

    That does leave a question though. What do you want to return for the following string?

    AAEEEABBBBB

    According to the rules above, that should return only the letter "A".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • With my possibly faulty understanding of the rules (although I think I'm probably pretty close), I came up with this function. As usual, the details, usage example, and some unit test code are all embedded in the function.

    CREATE FUNCTION dbo.FirstOccuranceOfAnotBCD

    /**********************************************************************************************************************

    Purpose:

    Return the "word" found at the first occurance of where the letter "A" is found and it's followed by any letter other

    than the letters "b", "B", "c", "C", "d", or "D". Note that the code is "accent sensitive".

    In other words, this code does a delimited split of the string on the letter "A" and returns the first "word" that

    does NOT contain a "b", "B", "c", "C", "d", or "D" as the second letter.

    Returns:

    Position = Character position of the found word. (BIGINT)

    StringLength = Character Length of the found word. (BIGINT)

    FoundWord = The actual word that was found according to all the rules above. (VARCHAR(8000))

    Programmer's notes.

    1. This function is a high performance iTVF (Inline Table Valued Function) being used as an iSF (Inline Scalar

    Funtion) that returns a single value like a "normal" user defined function but without the performance problems.

    2. This function could be used in the SELECT list of a query using a correlated subquery or it can be used in the

    FROM clause of a query. See the usage examples below.

    Usage Example and Test:

    --===== If the test table already exists, drop it to make reruns in SSMS easier

    IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL

    DROP TABLE #MyHead

    ;

    --===== Create a test table with string examples and expected results

    SELECT *

    INTO #MyHead

    FROM (

    SELECT NULL ,NULL UNION ALL

    SELECT '' ,NULL UNION ALL

    SELECT ' ' ,NULL UNION ALL

    SELECT 'A' ,NULL UNION ALL

    SELECT 'B' ,NULL UNION ALL

    SELECT 'C' ,NULL UNION ALL

    SELECT 'D' ,NULL UNION ALL

    SELECT 'E' ,NULL UNION ALL

    SELECT 'AAA' ,'A' UNION ALL

    SELECT 'BBB' ,NULL UNION ALL

    SELECT 'CCC' ,NULL UNION ALL

    SELECT 'DDD' ,NULL UNION ALL

    SELECT 'EEE' ,NULL UNION ALL

    SELECT 'EEEA' ,NULL UNION ALL

    SELECT 'AAEEE' ,'A' UNION ALL

    SELECT 'ABEEE' ,NULL UNION ALL

    SELECT 'ACEEE' ,NULL UNION ALL

    SELECT 'ADEEE' ,NULL UNION ALL

    SELECT 'AEEEE' ,'AEEEE' UNION ALL

    SELECT 'AAAABBAEE' ,'A' UNION ALL

    SELECT 'ABBAEEAFF' ,'AEE' UNION ALL

    SELECT 'ABBACCAFF' ,'AFF' UNION ALL

    SELECT 'ACCADDAFF' ,'AFF' UNION ALL

    SELECT 'ADDAFFAGG' ,'AFF' UNION ALL

    SELECT 'AFFAGGAHH' ,'AFF' UNION ALL

    SELECT 'ÂFFAGGAHH' ,'AGG' UNION ALL

    SELECT 'AÂÂAGGAHH' ,'AÂÂ' UNION ALL

    SELECT 'ABNAGENDRACSURENDRADJITHENDRAXNARENDRABVEERNDARAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN','AGENDR' UNION ALL

    -- ^^^^^^

    SELECT 'ABNACENDRACSURENDRADJITHENDRABNARENDRABVEERNDARAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN','ARENDR'

    -- ^^^^^^

    )d(pString,Expected)

    ;

    --===== Run the function against all the test examples in the test table.

    -- This is also an example of how to use this "iSF" function in the FROM clause.

    SELECT tt.*

    ,fo.*

    FROM #MyHead tt

    OUTER APPLY dbo.FirstOccuranceOfAnotBCD(pString) fo

    ;

    --===== Run the function against all the test examples in the test table.

    -- This is also an example of how to use this "iSF" function in the SELECT list of a query although it only

    -- returns the found text. Additional correlated subqueries would need to be added to returnn the other

    -- columns of the function, which is why it's usually best to use it in the FROM clause as above, instead.

    SELECT tt.*

    ,FoundString = (SELECT FoundWord FROM dbo.FirstOccuranceOfAnotBCD(pString))

    FROM #MyHead tt

    ;

    Revision History:

    Rev 00 - 05 Sep 2015 - Jeff Moden

    - Initial creation and unit test.

    - Reference: http://www.sqlservercentral.com/Forums/Topic1717320-392-1.aspx

    **********************************************************************************************************************/

    --===== Define the IO of this function

    (@pString VARCHAR(8000))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN WITH

    E1(N) AS ( --=== Pseudo-Cursor creates up to 10E1 or 10 rows

    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))v(N)) --10E1 or 10 rows

    , E4(N) AS ( --=== Pseudo-Cursor creates up to 10E4 or 10,000 rows

    SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)

    , Tally(N) AS ( --=== Pseudo-Cursor returns 0 to 10,000 sequentially numbered rows (on-the-fly Tally Table)

    SELECT TOP(ISNULL(LEN(@pString),0)) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)

    , FindFirst AS ( --=== Pseudo-Cursor finds first and next position of 'A' followed anything but a "B", "C", or "D"

    SELECT TOP 1

    P = N --Position of "A?"

    ,NP = ISNULL(NULLIF(CHARINDEX('A',@pString,N+1),0),LEN(@pString)+1) --Next Position of "A?"

    FROM Tally

    WHERE SUBSTRING(@pString,N,2) LIKE '[Aa][^bBcCdD]' COLLATE Latin1_General_BIN

    )

    SELECT Position = P

    ,StringLength = NP-P

    ,FoundWord = SUBSTRING(@pString,P,NP-P)

    FROM FindFirst

    ;

    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    The original poster's rules, no matter how they were stated, across ALL of his/her posts, are broken based on the data supplied and the expected results. It appears that there is some kind of word boundary that the original poster is clearly familiar with, but none of us has yet figured out, but was pointed out by WayneS. Therefore:

    rsrvas,

    You appear to have an understanding of your data that allows you to see words that we can not programmatically reproduce. By what method were you able to construct the various words that you've repeatedly posted but not explained a derivation for, and does your actual data show up in separate records as the individual words, or does it get concatenated with no delimiter before you can get to it? Answers to these questions are essential to us being able to formulate an accurate solution, or even advise you as to how to proceed. You'll have to explain how we can know EXACTLY what constitutes a "word" within the concatenated string.

    No answer = no help.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • This seems to return the same results as Jeff's code without the need of a tally table.

    SELECT tt.*, new.String

    FROM #MyHead tt

    CROSS APPLY (SELECT SUBSTRING( pString, PATINDEX('%A[^BCD]%', pString), 8000)) Start(tString)

    OUTER APPLY (SELECT LEFT( tString, CHARINDEX( 'A', tString + 'A', 2) - 1)

    WHERE tString LIKE 'A[^BCD]%') new(String)

    WHERE Expected IS NOT NULL;

    If you want to have it as a function, here it is (add comments as needed).

    CREATE FUNCTION dbo.FirstOccuranceOfAnotBCD2

    (@pString VARCHAR(8000))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    SELECT LEFT( tString, CHARINDEX( 'A', tString + 'A', 2) - 1) String

    FROM (SELECT SUBSTRING( @pString, PATINDEX('%A[^BCD]%', @pString), 8000)) Start(tString)

    WHERE tString LIKE 'A[^BCD]%';

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • sgmunson (9/8/2015)


    Jeff,

    The original poster's rules, no matter how they were stated, across ALL of his/her posts, are broken based on the data supplied and the expected results. It appears that there is some kind of word boundary that the original poster is clearly familiar with, but none of us has yet figured out, but was pointed out by WayneS. Therefore:

    rsrvas,

    You appear to have an understanding of your data that allows you to see words that we can not programmatically reproduce. By what method were you able to construct the various words that you've repeatedly posted but not explained a derivation for, and does your actual data show up in separate records as the individual words, or does it get concatenated with no delimiter before you can get to it? Answers to these questions are essential to us being able to formulate an accurate solution, or even advise you as to how to proceed. You'll have to explain how we can know EXACTLY what constitutes a "word" within the concatenated string.

    No answer = no help.

    I believe the word boundary is the next instance of "A?" whether it meets the first word criteria or not. That seems to be fairly well implied when the OP broke out the separate words in his example even if he might have missed something.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Perhaps I'm not understanding this correctly but this seems to get the correct answer based on my interpretation of the requirement.

    DECLARE @string varchar(1000) =

    'ABNAGENDRACSURENDRADJITHENDRAXNRENDRABVEERNDRAXDRMNDRAXRVINDRABNAGENDRACSURENDRADJITHEN';

    SELECT TOP 1 'A'+Item

    FROM dbo.DelimitedSplit8K(@string,'A')

    WHERE item NOT LIKE '[AaBbCc]%' AND item <> '';

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply