Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Joining with large table Expand / Collapse
Author
Message
Posted Monday, April 08, 2013 8:29 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
Good morning. I have been working on a faster solution for some time, and thought I had it. I have a very large table with all people who work in the state. I need to join that table to a much smaller one of college graduates for outcomes. Since the graduates table will continue to grow, I can't remove anyone from the wage table. So wage table remains huge, and will only get bigger. Now, joining the tables is not a big deal, but for the fact that people often work in more than one job. But I created a table-valued function:

CREATE FUNCTION dbo.GetWages
(@SSN char(9))
RETURNS TABLE
AS
RETURN
SELECT year, quarter, ssn, SUM(hours) hours, SUM(wages) wages, SUM(wages)/nullif(sum(hours),'0') rate
FROM dbo.Wage_data
WHERE ssn IN (@SSN)
GROUP BY year, quarter, ssn;
GO

And it works great:

SELECT student_id, LName, rate
FROM dbo.Degrees_Data CROSS APPLY dbo.GetWages(SSN)

However, we also want to know the industry of employment. When there are two jobs, we want the industry of the main job, defined as one with most wages. To get this, I have been using a CTE in a view, which is very slow since I cannot write in index to it. So again I wrote a TVF:

CREATE FUNCTION dbo.GetWagesAndIndustry (@SSN char(9))
RETURNS TABLE
AS
RETURN

WITH cte AS
(
SELECT year, quarter, ssn, hours, industry
, ROW_NUMBER() over (partition by year, quarter, ssn ORDER BY wages ASC) rn
, SUM(wages) over (partition by year, quarter, ssn ) SumWage
, SUM(hours) over (partition by year, quarter, ssn ) SumHours
FROM dbo.Wage_data
WHERE ssn IN (@SSN)
)
SELECT year, quarter, ssn, industry, SumHours hours, SumWage wages, SumWage/NULLIF(SumHours, '0') rage
FROM cte
WHERE rn = 1

But this is also painfully slow. Eventually we want to be able to create reports on the aggregated data, but who's going to wait 5-10 minutes for a report to generate? Definitely, not my boss. There must be a creative way to group the wage data to one record per person, yet have the industry information still available. What am I not seeing? Thank you for any help.
Post #1439869
Posted Monday, April 08, 2013 9:39 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 6:48 AM
Points: 280, Visits: 1,784
Just curious, what's the drawback of using a bog standard join using tables? (posting a wild guess at such a query )

SELECT grads.student_ID, grads.Lname, WAGES.SumWage, WAGES.SumHours
FROM GRADS
JOIN
(
SELECT year, quarter, ssn, hours, industry
, ROW_NUMBER() over (partition by year, quarter, ssn ORDER BY wages ASC) rn
, SUM(wages) over (partition by year, quarter, ssn ) SumWage
, SUM(hours) over (partition by year, quarter, ssn ) SumHours
FROM dbo.Wage_data
) WAGES
ON GRADS.ssn = WAGES.ssn
where WAGES.rn = 1

Post #1439909
Posted Monday, April 08, 2013 9:46 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
I'm sorry, but I don't know what a bog is (relating to SQL, that is). I googled it, and didn't see anything obvious.
Post #1439913
Posted Monday, April 08, 2013 9:51 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 6:48 AM
Points: 280, Visits: 1,784
Amy.G (4/8/2013)
I'm sorry, but I don't know what a bog is (relating to SQL, that is). I googled it, and didn't see anything obvious.


"Bog Standard" http://en.wiktionary.org/wiki/bog_standard

(idiomatic) Especially plain, ordinary, or unremarkable; having no special, excess or unusual features; plain vanilla "She drives a bog standard economy car."

Sorry about that!
Post #1439917
Posted Monday, April 08, 2013 9:58 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
OK, thank you for the link on ancient ceramic toilets But anyway, you are asking why don't I just put it in a regular table? Correct? The answer is space. I have used temp tables, where I can put on index on the table and run queries from there, which is fine for ad hoc queries. But I'm looking for something to use in reporting services. I played around with a stored proc to generate temp tables, but I wanted to throw out the issue to the forums to see if there were any ideas people had.
Post #1439920
Posted Monday, April 08, 2013 10:07 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 6:48 AM
Points: 280, Visits: 1,784
Amy.G (4/8/2013)
OK, thank you for the link on ancient ceramic toilets But anyway, you are asking why don't I just put it in a regular table? Correct? The answer is space. I have used temp tables, where I can put on index on the table and run queries from there, which is fine for ad hoc queries. But I'm looking for something to use in reporting services. I played around with a stored proc to generate temp tables, but I wanted to throw out the issue to the forums to see if there were any ideas people had.


Ok I was assuming the data was already in two tables that could be indexed. My bad!!!

Also for what its worth, I like temp tables for reporting services, if you use a single hash (like #name), I didn't think there was a problem with it as if I'm not mistaken these are actually only visible to the connection they're created with.

Post #1439926
Posted Monday, April 08, 2013 10:17 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289

Ok I was assuming the data was already in two tables that could be indexed. My bad!!!


Oops, I think I misunderstood something. The data are in two separate tables that do have indexes on them. The wage data needs to be grouped.... hold on, it just occurred to me the data can be grouped by employee after the tables have been joined. (don't laugh, it seriously just occurred to me). I'll see where this leads.
Post #1439932
Posted Monday, April 08, 2013 10:46 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
I can't believe this. My query went from nearly 10 minutes to 2 seconds. I had this idea that the data had to be summed before joining. The dangers of tunnel vision, I guess.
Post #1439941
Posted Monday, April 08, 2013 11:13 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 6:48 AM
Points: 280, Visits: 1,784
Amy.G (4/8/2013)
I can't believe this. My query went from nearly 10 minutes to 2 seconds. I had this idea that the data had to be summed before joining. The dangers of tunnel vision, I guess.


I'm not an expert or anything, but I've seen plenty of evidence that the sql server software will rearrange query operations in a variety of ways as long as the results would be logically equivalent to what you spec'ed in your query text. Sometimes it can pay to just write your query in the most obvious way and see what the server does with it. Doesn't hurt to eyeball the plan also.

Good luck!

Post #1439951
Posted Monday, April 08, 2013 11:14 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, August 16, 2013 8:28 AM
Points: 249, Visits: 460
Amy.G (4/8/2013)

Ok I was assuming the data was already in two tables that could be indexed. My bad!!!


Oops, I think I misunderstood something. The data are in two separate tables that do have indexes on them. The wage data needs to be grouped.... hold on, it just occurred to me the data can be grouped by employee after the tables have been joined. (don't laugh, it seriously just occurred to me). I'll see where this leads.


Glad you got it figured out. Sometimes its best to go back to the logical steps SQL Server uses to work through a query. Great job!
Post #1439952
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse