• Claire,

    Interesting article. I have indeed found cases where divide and conquer can be helpful to solve a really tricky query form. But I'm afraid I'm going to need to point out a few fallacies in your arguments.

    1. First off, you're comparing apples to oranges when you try to compare the timing results from creating a delimited character string of the rates to an encoded integer value. The string concatenation is always going to be slower.

    2. Your method of creating the delimited list of rate codes can be improved upon. The reason it was spending so much time in those nasty sorts was (probably) because of using DISTINCT.

    SELECT grp=COUNT(*), worker_id, combo=STUFF(

    (

    SELECT ',' + rate_code

    FROM #worker_rate b

    WHERE a.worker_id = b.worker_id

    ORDER BY rate_code

    FOR XML PATH('')

    ), 1, 1, '')

    FROM #worker_rate a

    GROUP BY worker_id

    ORDER BY 1, 2;

    3. You can literally do the same thing as your encoding approach in a single set-based query.

    WITH lu AS

    (

    SELECT rate_code

    ,combo=POWER(2, ROW_NUMBER() OVER (ORDER BY rate_code)-1)

    FROM

    (

    SELECT rate_code

    FROM #worker_rate

    GROUP BY rate_code

    ) a

    )

    SELECT grp=COUNT(*), worker_id, combo=SUM(combo)

    FROM #worker_rate a

    JOIN lu b ON a.rate_code = b.rate_code

    GROUP BY worker_id

    ORDER BY 1, 2;

    The timing results I got for this (at 10M rows) were:

    Method ElapsedMS

    Original concat of rate codes as chars 136593

    Improved concat of rate codes as chars 106220

    'Optimal' Example of Divide and Conquer 29016

    Divide and Conquer but in one step 24006

    Note that all this was pre-coffee so possibly I made an error somewhere but you can check me yourself. Here's the test harness I used.

    create table #worker_rate (

    worker_id int

    ,rate_code nvarchar(10) collate database_default

    primary key (rate_code,worker_id)

    )

    declare @rows int = 10000000 --How many rows of data to create

    --Create test data

    ;with tt(x) as (

    select row_number() over (order by (select null)) from

    (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x10 (n),

    (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x100 (n),

    (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x1000 (n),

    (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x10000 (n),

    (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x100000 (n),

    (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x1000000 (n),

    (values (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) x10000000 (n)

    )

    insert into #worker_rate (worker_id, rate_code)

    select top(@rows) tt.x, rc.code

    from tt

    inner join (values ('a'),('b'),('c')) rc(code)

    on tt.x % 4 <> 0

    or rc.code <> 'c'

    DECLARE @StartDT DATETIME = GETDATE();

    --Original example

    ;with c as (

    select distinct w.worker_id, r.combo

    from #worker_rate w

    cross apply (

    select combo = stuff((select ',' + r.rate_code

    from #worker_rate r

    where w.worker_id = r.worker_id

    order by r.rate_code

    for xml path('')),1,1,'')

    ) r

    )

    select grp = dense_rank() over (order by c.combo), c.worker_id, c.combo

    from c

    order by 1, 2;

    SELECT [Method]='Original concat of rate codes as chars'

    ,[ElapsedMS]=DATEDIFF(millisecond, @StartDT, GETDATE());

    SELECT @StartDT = GETDATE();

    SELECT grp=COUNT(*), worker_id, combo=STUFF(

    (

    SELECT ',' + rate_code

    FROM #worker_rate b

    WHERE a.worker_id = b.worker_id

    ORDER BY rate_code

    FOR XML PATH('')

    ), 1, 1, '')

    FROM #worker_rate a

    GROUP BY worker_id

    ORDER BY 1, 2;

    SELECT [Method]='Improved concat of rate codes as chars'

    ,[ElapsedMS]=DATEDIFF(millisecond, @StartDT, GETDATE());

    SELECT @StartDT = GETDATE();

    --Optimal example

    create table #lookup (

    rate_code nvarchar(10) collate database_default primary key

    ,combo smallint

    )

    insert into #lookup (rate_code,combo)

    select q.rate_code, power(2, q.combo-1)

    from (

    select distinct rate_code, combo=dense_rank() over (order by rate_code)

    from #worker_rate

    ) q

    select grp=dense_rank() over (order by combo), worker_id, combo

    from (

    select p.worker_id, combo=sum(l.combo)

    from #worker_rate p

    inner join #lookup l on l.rate_code = p.rate_code

    group by p.worker_id

    ) q

    order by 1, 2

    SELECT [Method]='''Optimal'' Example of Divide and Conquer'

    ,[ElapsedMS]=DATEDIFF(millisecond, @StartDT, GETDATE());

    SELECT @StartDT = GETDATE();

    WITH lu AS

    (

    SELECT rate_code

    ,combo=POWER(2, ROW_NUMBER() OVER (ORDER BY rate_code)-1)

    FROM

    (

    SELECT rate_code

    FROM #worker_rate

    GROUP BY rate_code

    ) a

    )

    SELECT grp=COUNT(*), worker_id, combo=SUM(combo)

    FROM #worker_rate a

    JOIN lu b ON a.rate_code = b.rate_code

    GROUP BY worker_id

    ORDER BY 1, 2;

    SELECT [Method]='Divide and Conquer but in one step'

    ,[ElapsedMS]=DATEDIFF(millisecond, @StartDT, GETDATE());

    SELECT @StartDT = GETDATE();

    GO

    drop table #lookup

    drop table #worker_rate


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St