Parse a string help

  • @Jeffrey Williams 3188
    Ditto what Lynn said about the requirements. For fun, I put together a couple set-based solutions that I would try. Each solution uses a function that uses ngrams8k at it's core. Note that the link has the code available for download and also describes what a word-level n-gram is. 

    The first solution uses a world-level N-gram tokenizer: WNGrams2012_8K. The second uses SubstringBetweenChar8K. Note the comments in each function for details about how they work. I don't have time to test but, of these two solutions the latter will be faster. 

    Updated Sample Data
    IF OBJECT_ID('tempdb..#testTable') IS NOT NULL DROP TABLE #testTable;
    CREATE TABLE #testTable(InputString varchar(100));
    INSERT #testTable VALUES ('HELLO5-E-100'), ('Hotel5-E-101'), ('BYE5-F-001');
    INSERT #testTable VALUES ('XXX-HELLO5-E-100'), ('X-X-Hotel5-E-101'), ('BYE5-001');

    Using WNGrams2012_8K
    SELECT
      t.inputString, -- included for testing
      Part1 = token,
      Part2 = RIGHT(s.s,CHARINDEX('-',REVERSE(s.s)) - 1)
    FROM #testTable t
    CROSS APPLY (VALUES (t.InputString)) s(s)
    CROSS APPLY dbo.WNGrams2012_8K(s.s,'-', (LEN(s.s)-LEN(REPLACE(s.s,'-',''))))
    WHERE tokenNumber = 1;

    SubstringBetweenChar8K
    SELECT
      t.inputString,
      Part1 = Token,
      Part2 = RIGHT(s.s,CHARINDEX('-',REVERSE(s.s)) - 1)
    FROM #testTable t
    CROSS APPLY (VALUES (t.InputString)) s(s)
    CROSS APPLY (VALUES (LEN(s.s)-LEN(REPLACE(s.s,'-','')))) d(c)
    CROSS APPLY dbo.SubstringBetweenChar8K(s.s,0,d.c,'-');

    Again, what Lynn posted will be the fastest and accommodate any number of "delimiters" but it's always fun to explore set-based alternatives.   

    .

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing post 16 (of 16 total)

You must be logged in to reply to this topic. Login to reply