search pattern in select statement

  • Hello All,

    I need you help for one search pattern that I want to use in the select statement. The problem is, I have table containing three values ID, Letter, and Sequence.

    letter will have one letter form A-z, while Sequence will have multiple letters separated by comma. what I want is that suppose is the Sequence for the row ends with the same letter for that row then it should not be selected. but in case if the same letter is in the Sequence and another letter is followed by that letter then even after ending same as the letter that row should get selected.

    I have created One sample query for this.

    Create Table #Letter

    (

    Id INT Primary key,

    Letter VARCHAR(2),

    Sequence VARCHAR(250)

    )

    INSERT INTO #Letter Values(1,'A','A+,A+,A+,A+,A+,A,B,A+,A,A,A,A,A,A,A')

    INSERT INTO #Letter Values(2,'A','A+,A+,A+,A,A,A,A,A,A,A,A,A,A,A,A,A')

    INSERT INTO #Letter Values(3,'A','A,A,A+,A,A,A,A,A,A,A,A,A,A,A,A')

    INSERT INTO #Letter Values(4,'A','A,A,A+,A,A,A')

    SELECT *

    FROM #Letter O1

    WHERE Letter = 'A' AND Sequence like '%[^,A][A+B-Z,]%'

    Order BY Id

    DROP TABLE #Letter

    The Output I want is

    1AA+,A+,A+,A+,A+,A,B,A+,A,A,A,A,A,A,A

    3AA,A,A+,A,A,A,A,A,A,A,A,A,A,A,A

    4AA,A,A+,A,A,A

    The 2nd row is not expected in the output since it has A letter at the end.

    2AA+,A+,A+,A,A,A,A,A,A,A,A,A,A,A,A,A

    Other rows are selected because they have 'A' followed by 'A+' or 'B' i.e different letter and then again 'A'.

    Can any one please help me on this query.

    Thank you

    Yatish

  • My solution makes use of a Tally table. If you're unfamiliar with those please have a look through this article: http://www.sqlservercentral.com/articles/T-SQL/62867/[/url]

    Build Tally table:

    --=============================================================================

    -- Setup

    --=============================================================================

    USE tempdb --DB that everyone has where we can cause no harm

    SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance/speed

    DECLARE @StartTime DATETIME --Timer to measure total duration

    SET @StartTime = GETDATE() --Start the timer

    --=============================================================================

    -- Create and populate a Tally table

    --=============================================================================

    --===== Conditionally drop

    IF OBJECT_ID('dbo.Tally') IS NOT NULL

    DROP TABLE dbo.Tally

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Let the public use it

    GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC

    --===== Display the total duration

    SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds duration'

    Solution query:

    CREATE TABLE #Letter

    (

    Id INT PRIMARY KEY,

    Letter VARCHAR(2),

    Sequence VARCHAR(250)

    )

    GO

    INSERT INTO #Letter

    VALUES (1, 'A', 'A+,A+,A+,A+,A+,A,B,A+,A,A,A,A,A,A,A') ;

    INSERT INTO #Letter

    VALUES (2, 'A', 'A+,A+,A+,A,A,A,A,A,A,A,A,A,A,A,A,A') ;

    INSERT INTO #Letter

    VALUES (3, 'A', 'A,A,A+,A,A,A,A,A,A,A,A,A,A,A,A') ;

    INSERT INTO #Letter

    VALUES (4, 'A', 'A,A,A+,A,A,A') ;

    GO

    DECLARE @Letter VARCHAR(2) = 'A' ;

    WITH cte(Id, sequence)

    AS (

    SELECT Id,

    ',' + sequence + ','

    FROM #Letter

    WHERE Letter = @Letter

    AND sequence LIKE '%' + @Letter

    ),

    cte2(row_num, Id, grade)

    AS (

    SELECT ROW_NUMBER() OVER (PARTITION BY l.Id ORDER BY t.N) AS row_num,

    l.Id,

    SUBSTRING(l.Sequence, N + 1, CHARINDEX(',', l.Sequence, N + 1) - N - 1)

    FROM cte l

    CROSS JOIN tempdb.dbo.Tally t

    WHERE N < LEN(l.Sequence)

    AND SUBSTRING(l.Sequence, N, 1) = ','

    )

    SELECT DISTINCT

    cx2.Id

    FROM cte2 cx2

    WHERE (

    SELECT MAX(row_num)

    FROM cte2

    WHERE Id = cx2.Id

    AND grade != @Letter

    ) > (

    SELECT MIN(row_num)

    FROM cte2

    WHERE Id = cx2.Id

    AND grade = @Letter

    ) ;

    DROP TABLE #Letter

    GO

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi opc.three,

    I am Thankful to you for your suggestion, But i would like to know if there is any other way to solve the query..i mean there is possibility that I may not use the Tally or numbers Table, so what can be other way this can be solved. Please let me know your thoughts.

    if anyone know any other way to solve this please let me know.

    Thank you

    Yatish

  • Hello yatish,

    I'm just curious because these kind of requirements looks more as a puzzle than a real life requirement. Could you explain briefly what's the need for this kind of filter?.

    Anyway I found a solution for your query. I try to replace Letter by '?' and apply a LIKE pattern, I follow these steps:

    - Double any comma in Sequence.

    - Add ',,' at begin and end of Sequence.

    - Now we can replace ',Letter,' by ',?,'.

    - Now we can apply a LIKE pattern.

    THE SQL instruction is

    WITH Letter AS (

    SELECT 1 AS Id, 'A' AS Letter, 'A+,A+,A+,A+,A+,A,B,A+,A,A,A,A,A,A,A' AS Sequence

    UNION SELECT 2, 'A', 'A+,A+,A+,A,A,A,A,A,A,A,A,A,A,A,A,A'

    UNION SELECT 3, 'A', 'A,A,A+,A,A,A,A,A,A,A,A,A,A,A,A'

    UNION SELECT 4, 'A', 'A,A,A+,A,A,A'

    )

    SELECT * FROM Letter

    WHERE RIGHT(Sequence, LEN(Letter)) <> Letter

    OR REPLACE(',,' + replace(Sequence, ',', ',,') + ',,', ',' + Letter + ',', ',?,')

    LIKE '%?,,[^?]%'

    Regards,

    Francesc

  • Hello frfernan,

    Thank you for your reply. I need this for real life example only, I have modified the original fields names and added only sample example. I am creating this Sequence from History of records in database. the requirement is that the letter should be present in the Sequence at least once and it should not be at last i.e. that letter should not be at ending letter in Sequence. also there should be at least one different letter followed by that letter in the Sequence.

    suppose in the example below if I am looking for letter 'A' then it should be as

    1AA+,A+,A+,A+,A+,A,B,A+,A,A,A,A,A,A,A

    3AA,A,A+,A,A,A,A,A,A,A,A,A,A,A,A

    4AA,A,A+,A,A,A

    since in above output in 1st row 'A' is followed by 'B' this is considered in output along with 3rd and 4th rows

    2AA+,A+,A+,A,A,A,A,A,A,A,A,A,A,A,A,A

    In 2nd row 'A' is followed by 'A' only and the Sequence is end with that so it is not considered in output.

    I will give another example

    suppose in the example below if I am looking for letter 'C' then it should be as

    1CA+,A+,A+,A+,A+,A,C,A+,A,A,A,A,A,A,A

    3CA,A,A+,C,A,A,A,A,A,A,A,A,A,A,A

    since in above output in 1st row 'C' is followed by 'A' this is considered in output along with 3rd row.

    2CA+,A+,A+,A,A,A,A,A,A,A,A,A,A,C,C,C

    4CA,A,C,C,C,C

    In 2nd and 4th row 'C' is followed by 'C' only and the Sequence is end with that so it is not considered in output.

    Please let me know If I need to explain this more.

    Thank you for your help

    Thank you

    Yatish

  • yatish.patil (6/6/2011)


    Hi opc.three,

    I am Thankful to you for your suggestion, But i would like to know if there is any other way to solve the query..i mean there is possibility that I may not use the Tally or numbers Table, so what can be other way this can be solved. Please let me know your thoughts.

    if anyone know any other way to solve this please let me know.

    Thank you

    Yatish

    You do not need to use a static Tally table...you have the option to create an inline Tally table using a CTE. It will add a few hundred milliseconds to a query, but there are times when the performance gain is well worth the cycles. Here is the article that shows how to create a Tally table inline: http://www.sqlservercentral.com/articles/tally+table/72993/[/url]

    That said, I went down the wrong rabbit hole. I was tripped up by the different Letter-lengths so went ahead in splitting the string. I think Francesc has the best presented solution. If I am understanding your last post you can simply omit the check where the Letter ends the sequence:

    WITH Letter AS (

    SELECT 1 AS Id, 'A' AS Letter, 'A+,A+,A+,A+,A+,A,B,A+,A,A,A,A,A,A,A' AS Sequence

    UNION SELECT 2, 'A', 'A+,A+,A+,A,A,A,A,A,A,A,A,A,A,A,A,A'

    UNION SELECT 3, 'A', 'A,A,A+,A,A,A,A,A,A,A,A,A,A,A,A'

    UNION SELECT 4, 'A', 'A,A,A+,A,A,A'

    )

    SELECT * FROM Letter

    WHERE REPLACE(',,' + replace(Sequence, ',', ',,') + ',,', ',' + Letter + ',', ',?,')

    LIKE '%?,,[^?]%'

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks Yatish, your explanations are clear. I have only a doubt: "A" and "A+" are different letters at all?, or there is any obscure relation between these two letters?. If not, you simply need to clarify what is the condition about last letter in Sequence to put the instruction to work.

    And a small correction, I commented "Add ',,' at begin and end of Sequence" but really there is no need for a double comma at begin at end, only one comma is needed.

    Regards,

    Francesc

  • frfernan (6/7/2011)


    And a small correction, I commented "Add ',,' at begin and end of Sequence" but really there is no need for a double comma at begin at end, only one comma is needed.

    I think you had it right the first time. I think we need it...

    WITH Letter AS (

    SELECT 1 AS Id, 'A' AS Letter, 'A+,A+,A+,A+,A+,A,B,A+,A,A,A,A,A,A,A' AS Sequence

    UNION SELECT 2, 'A', 'A+,A+,A+,A,A,A,A,A,A,A,A,A,A,A,A,A'

    UNION SELECT 3, 'A', 'A,A,A+,A,A,A,A,A,A,A,A,A,A,A,A'

    UNION SELECT 4, 'A', 'A,A,A+,A,A,A'

    UNION SELECT 5, 'A', 'A,C,A,A,A,A,A'

    )

    SELECT * FROM Letter

    WHERE REPLACE(REPLACE(Sequence, ',', ',,') + ',,', ',' + Letter + ',', ',?,')

    LIKE '%?,,[^?]%'

    ...otherwise row 5 will not be returned when I think it should.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Oh,

    I didn't explain my idea correctly, where you understood

    REPLACE(Sequence, ',', ',,') + ',,'I tried to express

    ',' + REPLACE(Sequence, ',', ',,') + ',', with this last expression your example works correctly.

    Regards,

    Francesc

  • The worded explanation makes more sense with an accompanying code example 😉

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply