Customer Ranking

  • I'm required to rank our customers by total purchases into 3 groups (1-high, 2-medium, 3-low value customers), using the total purchasing power of all customers by year: I divide the sum of total sales for the year into 3 equal parts and then allocate customers to those groups. Generally speaking, the low-value group has more customers than the medium, and the medium more than the high, because it takes fewer high spenders to reach 1/3 of the total sales. My current solution takes 1 minute 9 seconds to run on approximately 9,600 customers, and I'm wondering if there is a faster solution, possibly using a CTE and the built-in ranking functions. The longest step is calculating the running total, due to the cross join.

    This will allow us to see changes in customer value, e.g. a low value customer has become a high value one, or a high value has become a medium value customer.

    I've included my current solution in with the test data.

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#Sales','U') IS NOT NULL

    DROP TABLE #Sales

    IF OBJECT_ID('TempDB..#RunningTotal2007','U') IS NOT NULL

    DROP TABLE #RunningTotal2007

    IF OBJECT_ID('TempDB..#RunningTotal2008','U') IS NOT NULL

    DROP TABLE #RunningTotal2008

    --Create the temp table

    Create Table #Sales

    (

    House_ID Int Primary Key Clustered

    ,[2007] Decimal(18,2)

    ,[2008] Decimal(18,2)

    ,[2007 Rank] nvarchar(50)

    ,[2008 Rank] nvarchar(50)

    )

    --Insert our test data (Actual code runs on 15,000 rows +/-)

    Insert Into #Sales

    Select

    House_ID

    ,[2007]

    ,[2008]

    ,Cast(Null As nvarchar(50)) As [2007 Rank]

    ,Cast(Null As nvarchar(50)) As [2008 Rank]

    From

    (

    Select

    1 As House_ID

    ,125 As [2007]

    ,569 As [2008]

    Union All

    Select 2,212,924

    Union All

    Select 3,354,862

    Union All

    Select 4,824,724

    Union All

    Select 5,565,628

    Union All

    Select 6,618,1028

    Union All

    Select 7,784,488

    Union All

    Select 8,465,321

    Union All

    Select 9,994,284

    Union All

    Select 10,1016,161

    )A

    --Declare our divisor

    Declare @Dividor As Decimal(18,4)

    --get a running total starting at the smallest number and working to the largest

    Select

    a.House_ID As House_ID

    ,Sum(b.[2007]) As Total

    Into

    #RunningTotal2007

    From

    #Sales a

    Cross Join #Sales b

    Where

    IsNull(b.[2007],0) <= IsNull(a.[2007],0)

    Group By

    a.House_ID

    --We want 3 groups of customers, high, medium or low ones, this could also be an integer value

    --1,2,3 and converted on our output to high, medium low when we join up with our demographics.

    Set @Dividor = (Select Sum(IsNull([2007],0)) / 3 from #Sales)

    update #Sales

    Set [2007 Rank] = Case

    When isNull(Temp.Total,0) <= @Dividor Then 'Low'

    When (isNull(Temp.Total,0) > @Dividor) And (isNull(Temp.Total,0) <= @Dividor*2) then 'Medium'

    When (isNull(Temp.Total,0) > @Dividor * 2) Then 'High'

    End

    From

    (

    Select

    House_ID

    ,Total

    From

    #RunningTotal2007

    )Temp

    Where

    #Sales.house_ID = temp.house_ID

    --Calculate a running total for the next year

    Select

    a.House_ID As House_ID

    ,Sum(b.[2008]) As Total

    Into

    #RunningTotal2008

    From

    #Sales a

    Cross Join #Sales b

    Where

    IsNull(b.[2008],0) <= IsNull(a.[2008],0)

    Group By

    a.House_ID

    --Again apply the same divisor rules

    Set @Dividor = (Select Sum(IsNull([2008],0)) / 3 from #Sales)

    update #Sales

    Set [2008 Rank] = Case

    When isNull(Temp.Total,0) <= @Dividor Then 'Low'

    When (isNull(Temp.Total,0) > @Dividor) And (isNull(Temp.Total,0) <= @Dividor*2) then 'Medium'

    When (isNull(Temp.Total,0) > @Dividor * 2) Then 'High'

    End

    From

    (

    Select

    House_ID

    ,Total

    From

    #RunningTotal2008

    )Temp

    Where

    #Sales.house_ID = temp.house_ID

    --Lets have the results

    Select

    *

    From

    #Sales

    Order By

    [2008] Desc
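    In other words, the rule is: sort the customers by sales ascending, keep a running total, and cut at 1/3 and 2/3 of the grand total. A quick Python sketch of that rule, for illustration only (the function name and dictionary layout are mine, not part of the T-SQL):

    ```python
    # Illustrative sketch of the grouping rule: running total ascending,
    # bucketed by which third of the grand total the running total falls into.
    def rank_by_sales_share(sales):
        """sales: {house_id: yearly_sales}. Returns {house_id: 'Low'|'Medium'|'High'}."""
        total = sum(sales.values())
        third = total / 3.0
        running = 0
        ranks = {}
        for house_id, amount in sorted(sales.items(), key=lambda kv: kv[1]):
            running += amount
            if running <= third:
                ranks[house_id] = 'Low'
            elif running <= 2 * third:
                ranks[house_id] = 'Medium'
            else:
                ranks[house_id] = 'High'
        return ranks

    # The 2008 column of the test data from the INSERT above
    sales_2008 = {1: 569, 2: 924, 3: 862, 4: 724, 5: 628,
                  6: 1028, 7: 488, 8: 321, 9: 284, 10: 161}
    ranks = rank_by_sales_share(sales_2008)
    ```

    On this data the cuts land the same way the CASE expression above does: the big spenders end up High, the long tail Low.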

  • Have you tried using the NTILE function? It's really designed to do just this type of thing. Run this after inserting the data into the #Sales table.

    select House_ID, [2007], Ntile(3) over (order by [2007] desc) as '2007 Ranking', [2008], NTILE(3) over (order by [2008] desc) as '2008 Ranking'

    from #sales

    The ranking numbers here don't match yours perfectly, but they are close and probably more accurate. Also, the execution plan I saw shows that it should run faster and with a lot less coding. Let me know how it works.

    Fraggle
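    For anyone unfamiliar with NTILE, a sketch of its tiling rule (Python, purely illustrative): it splits the rows into n groups of nearly equal row counts, with the first (count % n) groups taking one extra row each. Note that it never looks at the values, only the row order and count.

    ```python
    # Illustrative sketch of NTILE(n): n groups of (nearly) equal ROW COUNTS.
    def ntile(n, ordered_rows):
        """ordered_rows: rows already sorted as OVER (ORDER BY ...) would sort them."""
        count = len(ordered_rows)
        base, extra = divmod(count, n)
        tiles = []
        for tile in range(1, n + 1):
            # the first `extra` tiles each get one extra row
            size = base + (1 if tile <= extra else 0)
            tiles.extend([tile] * size)
        return tiles

    # 10 customers, 3 tiles -> group sizes 4, 3, 3
    tiles = ntile(3, list(range(10)))
    ```

    That size-4 first group is why 4 customers get a 2008 ranking of 1 in the output below.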

  • I thought about that; the problem is that it doesn't divide Sum([2008]), for example, into 3 approximately even groups. If you look at the output from the query you provided, 59% of the sales dollars have a rank of 1, 28% have a rank of 2, and 12.8% have a rank of 3. I'm looking for a result closer to 33.3% for each group, +/- a few percent.

    upperbognor

  • Those are not the results I get. I get 40% ranked 1, 30% ranked 2, and 30% ranked 3 for the 10 rows. So this is roughly 33% each, and probably would be exactly that if there were 12 rows.

    Here is my output.

    House_ID 2007 2007 Ranking 2008 2008 Ranking

    ----------- --------------------------------------- -------------------- --------------------------------------- --------------------

    6 618.00 2 1028.00 1

    2 212.00 3 924.00 1

    3 354.00 3 862.00 1

    4 824.00 1 724.00 1

    5 565.00 2 628.00 2

    1 125.00 3 569.00 2

    7 784.00 1 488.00 2

    8 465.00 2 321.00 3

    9 994.00 1 284.00 3

    10 1016.00 1 161.00 3

    (10 row(s) affected)

    Fraggle

  • This would be my ideal ranking. Using the 2008 year as an example

    Select 6,1028,'High' union all

    Select 2,924,'High' union all

    Select 3,862,'Medium' union all

    Select 4,724,'Medium' union all

    Select 5,628,'Medium' union all

    Select 1,569,'Low' union all

    Select 7,488,'Low' union all

    Select 8,321,'Low' union all

    Select 9,284,'Low' union all

    Select 10,161,'Low'

    In this case it breaks down to 32.5% high, 36.9% medium, 30.4% low.

  • Ah, I think I see the confusion. I need the distribution by sales, not by the number of customers in each group. See my post above for an example. It's not the number of customers but the sum of sales for each group that needs to be distributed into thirds.

    Thank you very much for the assistance you've provided so far, Fraggle!

    Upperbognor

  • This would be my ideal ranking. Using the 2008 year as an example

    Select 6,1028,'High' union all

    Select 2,924,'High' union all

    Select 3,862,'Medium' union all

    Select 4,724,'Medium' union all

    Select 5,628,'Medium' union all

    Select 1,569,'Low' union all

    Select 7,488,'Low' union all

    Select 8,321,'Low' union all

    Select 9,284,'Low' union all

    Select 10,161,'Low'

    In this case it breaks down to 32.5% high, 36.9% medium, 30.4% low.

    You will have to forgive me, but if you have 2 rankings of 'High' in a set of 10, how does that equal 32.5%? My maths says that is equal to 20% (2 rows / 10 total rows).

    The NTILE example I gave you is grouping by sales.

  • Maybe this can show the percentages better.

    High 1952

    % Of Total High 32.593

    Medium 2214

    % Of Total Medium 36.9677

    Low 1823

    % Of Total Low 30.4391

    Total 5989

    Select

    Sum([High]) As High

    ,Sum(High)/ (Sum(High) + Sum(Medium) + Sum(Low)) * 100 As [% Of Total High]

    ,Sum([Medium]) As Medium

    ,Sum(Medium)/ (Sum(High) + Sum(Medium) + Sum(Low)) * 100 As [% Of Total Medium]

    ,Sum([Low]) As Low

    ,Sum(Low)/ (Sum(High) + Sum(Medium) + Sum(Low)) * 100 As [% Of Total Low]

    ,(Sum(High) + Sum(Medium) + Sum(Low)) As [Total]

    From

    (

    Select 6 As house_ID,Cast(1028 As Decimal(18,2)) As [2008],'High' As [Rank] union all

    Select 2,924,'High' union all

    Select 3,862,'Medium' union all

    Select 4,724,'Medium' union all

    Select 5,628,'Medium' union all

    Select 1,569,'Low' union all

    Select 7,488,'Low' union all

    Select 8,321,'Low' union all

    Select 9,284,'Low' union all

    Select 10,161,'Low'

    )A

    Pivot

    (Sum([2008])

    For [Rank] In ([High],[Medium],[Low]))B

  • Sounds messy. From what I'm seeing, you'd have to do a running total (running from highest to lowest), which you divide by the grand total to come up with a "running percentage", and then make some kind of business rule as to what belongs in which tier.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • upperbognor (9/12/2008)


    I'm wondering if there is a faster solution possibly....

    It's currently taking the longest calculating the running total due to the cross join.

    Yes, there is a much faster method and, yes, the cross join is the problem because it makes a "triangular" join. Please see the following URLs for more information...

    Hidden RBAR: Triangular Joins

    http://www.sqlservercentral.com/articles/T-SQL/61539/

    Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    I've added the necessary changes to your test code... look for the word "change" to see the changes I've made to your code...

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#Sales','U') IS NOT NULL

    DROP TABLE #Sales

    IF OBJECT_ID('TempDB..#RunningTotal2007','U') IS NOT NULL

    DROP TABLE #RunningTotal2007

    IF OBJECT_ID('TempDB..#RunningTotal2008','U') IS NOT NULL

    DROP TABLE #RunningTotal2008

    GO

    --Create the temp table

    Create Table #Sales

    (

    House_ID Int Primary Key Clustered

    ,[2007] Decimal(18,2)

    ,[2008] Decimal(18,2)

    ,[2007 Rank] nvarchar(50)

    ,[2008 Rank] nvarchar(50)

    )

    --Insert our test data (Actual code runs on 15,000 rows +/-)

    Insert Into #Sales

    Select

    House_ID

    ,[2007]

    ,[2008]

    ,Cast(Null As nvarchar(50)) As [2007 Rank]

    ,Cast(Null As nvarchar(50)) As [2008 Rank]

    From

    (

    Select

    1 As House_ID

    ,125 As [2007]

    ,569 As [2008]

    Union All

    Select 2,212,924

    Union All

    Select 3,354,862

    Union All

    Select 4,824,724

    Union All

    Select 5,565,628

    Union All

    Select 6,618,1028

    Union All

    Select 7,784,488

    Union All

    Select 8,465,321

    Union All

    Select 9,994,284

    Union All

    Select 10,1016,161

    )A

    --Declare our divisor

    Declare @Dividor As Decimal(18,4)

    --===== Declare a variable to help with the running total calculations (change)

    DECLARE @PrevTotal DECIMAL(9,2)

    --get a running total starting at the smallest number and working to the largest

    --===== Put the data we need into a temp table... note the additional columns

    SELECT IDENTITY(INT,1,1) AS RowNum,

    a.House_ID As House_ID

    ,CAST(SUM(b.[2007]) AS DECIMAL(9,2)) As HouseTotal

    ,CAST(0 AS DECIMAL(9,2)) AS Total

    INTO #RunningTotal2007

    FROM #Sales a

    INNER JOIN #Sales b ON A.House_ID = B.House_ID

    GROUP BY a.House_ID

    ORDER BY SUM(b.[2007]) ASC

    --===== Throw a clustered key on it to keep things in order during the UPDATE

    ALTER TABLE #RunningTotal2007 ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Reset the variable we need to do the running total

    SET @PrevTotal = 0

    --===== Do the running total at lightning speed

    UPDATE #RunningTotal2007

    SET @PrevTotal = Total = HouseTotal+@PrevTotal

    FROM #RunningTotal2007 WITH(INDEX(0)) --Forces clustered index scan

    --===== End of changes

    --We want 3 groups of customers, high, medium or low ones, this could also be an integer value

    --1,2,3 and converted on our output to high, medium low when we join up with our demographics.

    Set @Dividor = (Select Sum(IsNull([2007],0)) / 3 from #Sales)

    update #Sales

    Set [2007 Rank] = Case

    When isNull(Temp.Total,0) <= @Dividor Then 'Low'

    When (isNull(Temp.Total,0) > @Dividor) And (isNull(Temp.Total,0) <= @Dividor*2) then 'Medium'

    When (isNull(Temp.Total,0) > @Dividor * 2) Then 'High'

    End

    From

    (

    Select

    House_ID

    ,Total

    From

    #RunningTotal2007

    )Temp

    Where

    #Sales.house_ID = temp.house_ID

    --Calculate a running total for the next year (Change, whole section replaced)

    --===== Put the data we need into a temp table... note the additional columns

    SELECT IDENTITY(INT,1,1) AS RowNum,

    a.House_ID As House_ID

    ,CAST(SUM(b.[2008]) AS DECIMAL(9,2)) As HouseTotal

    ,CAST(0 AS DECIMAL(9,2)) AS Total

    INTO #RunningTotal2008

    FROM #Sales a

    INNER JOIN #Sales b ON A.House_ID = B.House_ID

    GROUP BY a.House_ID

    ORDER BY SUM(b.[2008]) ASC

    --===== Throw a clustered key on it to keep things in order during the UPDATE

    ALTER TABLE #RunningTotal2008 ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Reset the variable we need to do the running total

    SET @PrevTotal = 0

    --===== Do the running total at lightning speed

    UPDATE #RunningTotal2008

    SET @PrevTotal = Total = HouseTotal+@PrevTotal

    FROM #RunningTotal2008 WITH(INDEX(0)) --Forces clustered index scan

    --===== End of changes

    --Again apply the same divisor rules

    Set @Dividor = (Select Sum(IsNull([2008],0)) / 3 from #Sales)

    update #Sales

    Set [2008 Rank] = Case

    When isNull(Temp.Total,0) <= @Dividor Then 'Low'

    When (isNull(Temp.Total,0) > @Dividor) And (isNull(Temp.Total,0) <= @Dividor*2) then 'Medium'

    When (isNull(Temp.Total,0) > @Dividor * 2) Then 'High'

    End

    From

    (

    Select

    House_ID

    ,Total

    From

    #RunningTotal2008

    )Temp

    Where

    #Sales.house_ID = temp.house_ID

    --Lets have the results

    Select

    *

    From

    #Sales

    Order By

    [2008] Desc
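    For readers who haven't seen the running-total UPDATE pattern before: what it computes is simply an ordered cumulative sum, with @PrevTotal carrying the prior row's total forward in clustered-index (RowNum) order. A Python sketch of the same calculation (the list literal is the 2008 test data in ascending order; the rest is illustration, not the actual method):

    ```python
    from itertools import accumulate

    # Per-house 2008 totals, ascending, as the ORDER BY ... ASC produced them
    house_totals = [161, 284, 321, 488, 569, 628, 724, 862, 924, 1028]

    # Equivalent of: SET @PrevTotal = Total = HouseTotal + @PrevTotal,
    # applied row by row in RowNum order
    running = list(accumulate(house_totals))
    ```

    The point of the clustered key and the INDEX(0) hint is exactly to guarantee that row order, since the cumulative sum is order-dependent.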

    ... and thank you very much for posting data and code in a usable fashion. It really does help us help you faster. Well done! :):):)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ps... I didn't change any of the smiley faces to ")"... you'll need to do that, I just copied your code.

    Also, if you don't mind... you said it took 1 minute and 9 seconds to process 9600 rows your way. Please let me know how long it takes my way. Thanks a lot. 🙂

    --Jeff Moden



  • pps... I'm not sure that you need the likes of a running total on this... I'm still looking...

    --Jeff Moden



  • After looking at this, I think the folks who recommended NTILE are probably producing the more accurate answer. Are you absolutely positive you want to do that running-total thing? I think it's producing an incorrect answer.

    --Jeff Moden



  • Jeff Moden (9/13/2008)


    Also, if you don't mind... you said it took a 1 minute and 9 seconds to process 9600 rows your way. Please let me know how long it takes my way. Thanks a lot. 🙂

    My way: CPU 649,453, Reads 4,200,656, Writes 14, Duration 94,454

    Jeff's way: CPU 12,328, Reads 995,322, Writes 20, Duration 12,989

    Thanks a lot, Jeff, for the much faster solution.

    Jeff Moden (9/13/2008)


    After looking at this, I think the folks that recommended "NTile" are probably producing the more accurate answer. Are you absolutely positive you want to do that running total thing because I think it's producing an incorrect answer.

    --Jeff Moden

    Jeff, I do agree that the NTILE solution produces a more accurate division of the number of customers, though it does not account for the customers' spending power. If we look at a 5-customer example where customer 1 has $1, 2 has $2, 3 has $3, 4 has $4, and 5 has $5, for a total spending of $15, then 1/3 of the total available spending would be $5.

    The NTile solution would put

    customers 5 and 4 (9 dollars to spend) in the high group,

    customers 3 and 2 (5 dollars to spend) in the medium group, and

    customer 1 (1 dollar to spend) in the low group, due to the way NTILE distributes the data.

    The results would then imply that the medium-value customers have the same spending power as the 2 high-value customers, which is obviously not the case.

    Using the running total method, we would have

    customer 5 (5 dollars to spend) in the high group,

    4 in the medium group with $4 to spend (we can't put 3 in this group because that would give the medium group $7 to spend, which is more than 1/3 of the total),

    the low group would have customers 3,2,1 with $6 to spend,

    so the average spending is $5 per group, or 1/3 of the total available spending.
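    The arithmetic above, spelled out as a quick Python check (purely illustrative; the variable names are mine):

    ```python
    # The 5-customer example: total spend is $15, so each tier should
    # carry about $5 of spending power.
    spend = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
    total = sum(spend.values())              # 15 dollars in total
    target = total / 3.0                     # $5 per tier, ideally

    # NTILE(3) over 5 rows (ORDER BY spend DESC) makes groups of 2, 2, 1 rows:
    ntile_high = spend[5] + spend[4]         # $9 -- nearly double the target
    ntile_medium = spend[3] + spend[2]       # $5
    ntile_low = spend[1]                     # $1

    # The running-total method instead balances the DOLLARS per tier:
    rt_high = spend[5]                       # $5
    rt_medium = spend[4]                     # $4 (adding customer 3 would make $7)
    rt_low = spend[3] + spend[2] + spend[1]  # $6
    ```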

    Thank you again everyone for your wonderful suggestions.

    -Upperbognor

  • Thanks VERY much for the timings. The IO reduction is certainly a much appreciated bonus.

    Also, thanks for the detailed explanation of the "loading" required for your task... the kicker was your explanation of the Medium group and why it couldn't have 2 customers in it. Awesome explanation 'cuz you kept it so simple. Thanks a ton! 🙂

    --Jeff Moden


