compare words of one string that are in another string in different order

  • Hi

    I have the following table, and I want to obtain the IDs of the rows that have column1 with at least 1 word matched in column 2, so the expected result would be

    ID 1 - because there are 2 words in common (a2 a3)

    ID 3 - two words in common (c1 c2)

    ID 4 - one word in common (n2)

    ID ---- STR1 ---- STR2

    ------------------------------------------------------------

    1 ---- a1 a2 a3 ---- a3 a2

    2 ---- b3 b2 b1 ---- x4

    3 ---- c1 c2 ---- c1 c2

    4 ---- n1 n2 ---- n2 y3

    The order of the words doesnt matter as long as there is at least one word of STR1 in STR2

    Thanks in Advance

  • Heh... this is quiet simple (if you are currently dreaming of string-splitting like me). It's not simple.

    What you need is to split your data into pieces by " " and JOIN them (in the WHERE clause). Usually I would suggest to combine two traditional Tally based split functions. For detailed information about the Tally table and a set based split function search this site for Jeff Moden's article about the "Tally" (aka Numbers) table.

    Special issues of your requirement:

    * Your data are not enclosed with the delimiter. Usually set based solutions require this. So you need a UNION ALL tally extension

    * You need to split two columns instead of one. So you need a double-barreled approach

    For detailed information about any kind of split string approaches have a look to this thread:

    http://www.sqlservercentral.com/Forums/Topic695508-338-40.aspx#bm710801

    Give this a try:

    [font="Courier New"]

    DECLARE @t TABLE (Id INT, STR1 VARCHAR(100), STR2 VARCHAR(100))

    INSERT INTO @t

                 SELECT 1, 'a1 a2 a3', 'a3 a2'

       UNION ALL SELECT 2, 'b3 b2 b1', 'x4'

       UNION ALL SELECT 3, 'c1 c2', 'c1 c2'

       UNION ALL SELECT 4, 'n1 n2', 'n2 y3'

    ; WITH

    t1 AS (SELECT 1 N UNION ALL SELECT 2),             -- 2

    t2 AS (SELECT t1.N FROM t1 t1 CROSS JOIN t1 t2),   -- 4

    t3 AS (SELECT t1.N FROM t2 t1 CROSS JOIN t2 t2),   -- 16

    t4 AS (SELECT t1.N FROM t3 t1 CROSS JOIN t3 t2),   -- 256

    Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N FROM t4 t1 CROSS JOIN t4 t2) -- 65536

    SELECT DISTINCT

          s.*

       FROM @t s

          CROSS APPLY

          (

             SELECT

                   -- Get the first item until the first delimiter

                   SUBSTRING(s.STR1, 1, ISNULL(NULLIF(CHARINDEX(' ', s.STR1, 1) - 1, -1), LEN(s.STR1))) Item

             UNION ALL

             -- Traditional Tally split with extension to get last item without needed delimiter at the end

             SELECT TOP 100 PERCENT

                   SUBSTRING(s.STR1, t.N + 1, ISNULL(NULLIF(CHARINDEX(' ', s.STR1, t.N + 1) - t.N - 1, -t.N - 1), LEN(s.STR1) - t.N)) Item

                FROM Tally t

                WHERE t.N <= LEN(s.STR1)

                   AND SUBSTRING(s.STR1, t.N, 1) = ' '

          ) l1

          CROSS APPLY

          (

             SELECT

                   -- Get the first item until the first delimiter

                   SUBSTRING(s.STR2, 1, ISNULL(NULLIF(CHARINDEX(' ', s.STR2, 1) - 1, -1), LEN(s.STR2))) Item

             UNION ALL

             -- Traditional Tally split with extension to get last item without needed delimiter at the end

             SELECT TOP 100 PERCENT

                   SUBSTRING(s.STR2, t.N + 1, ISNULL(NULLIF(CHARINDEX(' ', s.STR2, t.N + 1) - t.N - 1, -t.N - 1), LEN(s.STR2) - t.N)) Item

                FROM Tally t

                WHERE t.N <= LEN(s.STR2)

                   AND SUBSTRING(s.STR2, t.N, 1) = ' '

          ) l2

       WHERE l1.Item = l2.Item

    [/font]

    Greets

    Flo

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply