Stuck with LIKE criteria

  • Hi

    I have problem that I can't get my head around. I'm sure there's a simple solution but I can't figure it out.

    I have two tables (simplified below).

    Prefix

    AAA

    AAB

    AAC

    AAAA

    Data

    AAAFFFF 20

    AAAFFFG 25

    AACGGG 25

    AAAAHHH 10

    I need to sum the data according to the prefixes in the Lookup table.

    Using something like this almost works:

    SELECT Prefix, sum(value)

    FROM Data d inner join Prefix p on p.Prefix = left(d.name, len(p.Prefix))

    As you can see, the AAA and AAAA cause duplicates.

    Is there a simple (and fast) way to only use the longest matching prefix?

  • Quick suggestion with a CTE and ROW_NUMBER ordered by the length of the prefix

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_PREFIX') IS NOT NULL DROP TABLE dbo.TBL_PREFIX;

    CREATE TABLE dbo.TBL_PREFIX

    (

    PREFIX_VALUE VARCHAR(10) NOT NULL CONSTRAINT PK_DBO_TBL_PREFIX_PREFIX_VALUE PRIMARY KEY CLUSTERED

    ,PREFIX_LEN AS (LEN(PREFIX_VALUE)) PERSISTED

    );

    INSERT INTO dbo.TBL_PREFIX (PREFIX_VALUE)

    VALUES

    ('AAA' )

    ,('AAB' )

    ,('AAC' )

    ,('AAAA');

    IF OBJECT_ID(N'dbo.TBL_DATA') IS NOT NULL DROP TABLE dbo.TBL_DATA;

    CREATE TABLE dbo.TBL_DATA

    (

    DATA_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_DATA_DATA_ID PRIMARY KEY CLUSTERED

    ,DATA_STR VARCHAR(10) NOT NULL

    ,DATA_VALUE INT NOT NULL

    );

    INSERT INTO dbo.TBL_DATA(DATA_STR,DATA_VALUE)

    VALUES

    ('AAAFFFF', 20)

    ,('AAAFFFG', 25)

    ,('AACGGG', 25)

    ,('AAAAHHH', 10);

    ;WITH MATCHED_DATA AS

    (

    SELECT

    TD.DATA_ID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY TD.DATA_ID

    ORDER BY TP.PREFIX_LEN DESC

    ) AS DATA_RID

    ,TD.DATA_STR

    ,TD.DATA_VALUE

    ,TP.PREFIX_VALUE

    ,TP.PREFIX_LEN

    FROM dbo.TBL_DATA TD

    CROSS APPLY dbo.TBL_PREFIX TP

    WHERE CHARINDEX(TP.PREFIX_VALUE,TD.DATA_STR,1) > 0

    )

    SELECT

    MD.DATA_ID

    ,MD.DATA_STR

    ,MD.DATA_VALUE

    ,MD.PREFIX_VALUE

    ,MD.PREFIX_LEN

    FROM MATCHED_DATA MD

    WHERE MD.DATA_RID = 1;

    Results

    DATA_ID DATA_STR DATA_VALUE PREFIX_VALUE PREFIX_LEN

    ----------- ---------- ----------- ------------ -----------

    1 AAAFFFF 20 AAA 3

    2 AAAFFFG 25 AAA 3

    3 AACGGG 25 AAC 3

    4 AAAAHHH 10 AAAA 4

  • Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.

    SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE

    FROM dbo.TBL_DATA d

    CROSS APPLY (

    SELECT TOP (1) p.PREFIX_VALUE

    FROM dbo.TBL_PREFIX p

    WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'

    ORDER BY p.PREFIX_LEN DESC

    ) AS p_cross_apply

    GROUP BY p_cross_apply.PREFIX_VALUE

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (4/13/2015)


    Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.

    SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE

    FROM dbo.TBL_DATA d

    CROSS APPLY (

    SELECT TOP (1) p.PREFIX_VALUE

    FROM dbo.TBL_PREFIX p

    WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'

    ORDER BY p.PREFIX_LEN DESC

    ) AS p_cross_apply

    GROUP BY p_cross_apply.PREFIX_VALUE

    Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.

    😎

  • Eirikur Eiriksson (4/13/2015)


    ScottPletcher (4/13/2015)


    Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.

    SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE

    FROM dbo.TBL_DATA d

    CROSS APPLY (

    SELECT TOP (1) p.PREFIX_VALUE

    FROM dbo.TBL_PREFIX p

    WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'

    ORDER BY p.PREFIX_LEN DESC

    ) AS p_cross_apply

    GROUP BY p_cross_apply.PREFIX_VALUE

    Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.

    😎

    But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you for the replies.

    Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.

    I implemented something like Scotts example. The sub-select worked well for me.

    Thanks for the help.

  • jgale 70688 (4/13/2015)


    Thank you for the replies.

    Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.

    I implemented something like Scotts example. The sub-select worked well for me.

    Thanks for the help.

    Glad it helped. For max benefit, make sure the prefix table is clustered on the lookup value.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • jgale 70688 (4/13/2015)


    Thank you for the replies.

    Both solutions did work. However, the data I'm working with is slightly more complex than my simple example. My main data table is a transactional table with well over 100M rows. The Partition solution was too slow in this environment.

    I implemented something like Scotts example. The sub-select worked well for me.

    Thanks for the help.

    Quick questions,

    1. What is the cardinality of the prefix table?

    2. Is this a one off query?

    3. How wide (Len) are the string values?

    😎

  • ScottPletcher (4/13/2015)


    Eirikur Eiriksson (4/13/2015)


    ScottPletcher (4/13/2015)


    Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.

    SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE

    FROM dbo.TBL_DATA d

    CROSS APPLY (

    SELECT TOP (1) p.PREFIX_VALUE

    FROM dbo.TBL_PREFIX p

    WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'

    ORDER BY p.PREFIX_LEN DESC

    ) AS p_cross_apply

    GROUP BY p_cross_apply.PREFIX_VALUE

    Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.

    😎

    But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.

    It will sort every row in the output for the group by aggregation.

    😎

    On the other hand, I must admit that my test/analysis was somewhat flawed, less than 10K rows / 100 prefixes, without the right indexing it is almost certain that the cross apply will outperform the window function method on larger sets.

  • Eirikur Eiriksson (4/13/2015)


    ScottPletcher (4/13/2015)


    Eirikur Eiriksson (4/13/2015)


    ScottPletcher (4/13/2015)


    Here's an alternative approach, also using CROSS APPLY, that can take advantage of the (presumed) clustered index on prefix.PREFIX_VALUE; no guarantee, but it might perform better.

    SELECT p_cross_apply.PREFIX_VALUE, SUM(d.DATA_VALUE) AS SUM_VALUE

    FROM dbo.TBL_DATA d

    CROSS APPLY (

    SELECT TOP (1) p.PREFIX_VALUE

    FROM dbo.TBL_PREFIX p

    WHERE d.DATA_STR LIKE p.PREFIX_VALUE + '%'

    ORDER BY p.PREFIX_LEN DESC

    ) AS p_cross_apply

    GROUP BY p_cross_apply.PREFIX_VALUE

    Nice try;-) but the extra SORT operator kind of kills the performance, roughly 3 times slower than the other.

    😎

    But it should be SORTing very, very few rows each time, only those that match that specific prefix, which it should be able to do a limited clustered index scan to get. It's disappointing that it doesn't perform better, given that.

    It will sort every row in the output for the group by aggregation.

    😎

    On the other hand, I must admit that my test/analysis was somewhat flawed, less than 10K rows / 100 prefixes, without the right indexing it is almost certain that the cross apply will outperform the window function method on larger sets.

    The final sort for the aggregation is to produce the total that the original post specified. You stopped your code before that point. Presumably to get that summary total you'd have to GROUP BY too ;-).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Removed... Faster solution has already been posed,

    Jason

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply