March 24, 2008 at 2:07 pm
I have a database that has 5M rows of data in it. The table is setup as such:
Invoice Table
Date
Salesperson
State
Cost
We are trying to get a query that shows the 5 sales people with the most total sales $ PER STATE. There are multiple entries per sales person per state.
Thanks for the help.
March 24, 2008 at 2:25 pm
Could you post the table's create script, some sample data and expected output please.
See here - Forum Etiquette: How to post data/code on a forum to get the best help[/url]
Thanks
Edit: And the index definitions as well please, since it's a fairly large set of data
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 24, 2008 at 3:00 pm
--===== Create the test table with
CREATE TABLE #mytable
(
Date DATETIME,
SalesPerson char(100),
State char(2),
Cost float
)
--===== Insert the test data into the test table
INSERT INTO #mytable
(Date, SalesPerson, State, Cost)
SELECT 'Oct 17 2007 12:00AM','John','AL','235.43' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Joe','OK','432.25' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Joyce','AL','238.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','John','AL','111.12' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Joe','OK','267.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Joyce','OK','367.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','John','OK','67.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Joe','AL','267.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Fred','OK','79.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Moe','OK','7.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Joe','AL','167.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Moe','AL','37.98' UNION ALL
SELECT 'Oct 17 2007 12:00AM','Joyce','OK','987.45'
--====================== Expected Result (Top 3 instead of Top 5)
SalesPerson TotalSales State
Joe 435.96 AL
John 346.55 AL
Joyce 238.98 AL
Joyce 1355.43 OK
Joe 700.23 OK
Fred 79.78 OK
At this point, there are no indexes on the table.
March 24, 2008 at 7:20 pm
You're definitely going to need indexing. Recommend state, salesperson with an include for cost
;with firstCte as (
select salesperson,
[state],
sum(cost) totsales
from #mytable
group by salesperson,
[state] ),
SecondCTE as (
select *,
Row_number() over (partition by state order by totsales desc) RN
from firstCTE)
select *
from secondCTE
where rn<4
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 24, 2008 at 10:42 pm
Matt's code is functionally perfect and identical in performance (< 15 seconds total duration) to the following code... the only difference is the following code uses a single CTE for a slightly easier read...
;WITH cteFirst AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY State ORDER BY SUM(Cost) DESC) AS StateRank,
SalesPerson,
State,
SUM(Cost) AS TotalSales
FROM dbo.jbmTest
GROUP BY SalesPerson, State
)
SELECT StateRank,SalesPerson,STR(TotalSales,9,2) AS TotalSales,State
FROM cteFirst
WHERE StateRank <= 5
... and it's pointed at the following test table to show the kind of speed we both got out of the code using the index he recommended. The details are in the comments. Takes less than 3 minutes to build the 5 million row test table and the nonclustered index....
--drop table jbmtest
--===== Create and populate a 5,000,000 row test table.
-- Column "Date" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SalesPerson" contains 256 different sales person names
-- Column "State" has a range of "AA to YA" and "AB to YB" (simulates 50 states)
-- Column "Cost has a range of 0.0000 to 99.9999 non-unique numbers
-- Jeff Moden
SELECT TOP 5000000
Date = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SalesPerson = CAST(RIGHT(NEWID(),2)+'SalesPersonName' AS VARCHAR(100)),
State = CHAR(ABS(CHECKSUM(NEWID()))%25+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%2+65),
Cost = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS FLOAT)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
CREATE NONCLUSTERED INDEX IX_jbmTest_Composite1 ON dbo.jbmTest
(State ASC, SalesPerson ASC)
INCLUDE (Cost)
Couple of things I noticed... First, your table had CHAR(100) for the Sales Person. If you insist on using a character based name, then it should be VARCHAR(100).
Second, it would be a lot better if SalesPerson were an ID but I don't know how or where you got that data from.
Third, table has no primary key... again, I don't know if this is an interim table, just wasn't included in the example code you provided, or what.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 25, 2008 at 6:58 am
Thanks for all your help. That query works great on SQL 2005. Do you happen to know what would be the fastest way to get the same results from the same data on SQL 2000 or SQL 7?
March 25, 2008 at 8:48 am
Yep... you're gonna have to put the result of the CTE into a temp table as a separate query... include a RANK column. Then, see the part about "Grouped Running Count" in the following URL...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
... and, once you've filled in the RANK column, do a final select where RANK <=5.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply