SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Joining with large table


Joining with large table

Author
Message
Amy.G
Amy.G
SSC-Enthusiastic
SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)

Group: General Forum Members
Points: 185 Visits: 290
Good morning. I have been working on a faster solution for some time, and thought I had it. I have a very large table with all people who work in the state. I need to join that table to a much smaller one of college graduates for outcomes. Since the graduates table will continue to grow, I can't remove anyone from the wage table. So wage table remains huge, and will only get bigger. Now, joining the tables is not a big deal, but for the fact that people often work in more than one job. But I created a table-valued function:

CREATE FUNCTION dbo.GetWages
(@SSN char(9))
RETURNS TABLE
AS
RETURN
SELECT year, quarter, ssn, SUM(hours) hours, SUM(wages) wages, SUM(wages)/nullif(sum(hours),'0') rate
FROM dbo.Wage_data
WHERE ssn IN (@SSN)
GROUP BY year, quarter, ssn;
GO

And it works great:

SELECT student_id, LName, rate
FROM dbo.Degrees_Data CROSS APPLY dbo.GetWages(SSN)

However, we also want to know the industry of employment. When there are two jobs, we want the industry of the main job, defined as one with most wages. To get this, I have been using a CTE in a view, which is very slow since I cannot write in index to it. So again I wrote a TVF:

CREATE FUNCTION dbo.GetWagesAndIndustry (@SSN char(9))
RETURNS TABLE
AS
RETURN

WITH cte AS
(
SELECT year, quarter, ssn, hours, industry
, ROW_NUMBER() over (partition by year, quarter, ssn ORDER BY wages ASC) rn
, SUM(wages) over (partition by year, quarter, ssn ) SumWage
, SUM(hours) over (partition by year, quarter, ssn ) SumHours
FROM dbo.Wage_data
WHERE ssn IN (@SSN)
)
SELECT year, quarter, ssn, industry, SumHours hours, SumWage wages, SumWage/NULLIF(SumHours, '0') rage
FROM cte
WHERE rn = 1

But this is also painfully slow. Eventually we want to be able to create reports on the aggregated data, but who's going to wait 5-10 minutes for a report to generate? Definitely, not my boss. There must be a creative way to group the wage data to one record per person, yet have the industry information still available. What am I not seeing? Thank you for any help.
patrickmcginnis59 10839
patrickmcginnis59 10839
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1740 Visits: 5499
Just curious, what's the drawback of using a bog standard join using tables? (posting a wild guess at such a query :w00tSmile


SELECT grads.student_ID, grads.Lname, WAGES.SumWage, WAGES.SumHours
FROM GRADS
JOIN
(
SELECT year, quarter, ssn, hours, industry
, ROW_NUMBER() over (partition by year, quarter, ssn ORDER BY wages ASC) rn
, SUM(wages) over (partition by year, quarter, ssn ) SumWage
, SUM(hours) over (partition by year, quarter, ssn ) SumHours
FROM dbo.Wage_data
) WAGES
ON GRADS.ssn = WAGES.ssn
where WAGES.rn = 1



to properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
Amy.G
Amy.G
SSC-Enthusiastic
SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)

Group: General Forum Members
Points: 185 Visits: 290
I'm sorry, but I don't know what a bog is (relating to SQL, that is). I googled it, and didn't see anything obvious.
patrickmcginnis59 10839
patrickmcginnis59 10839
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1740 Visits: 5499
Amy.G (4/8/2013)
I'm sorry, but I don't know what a bog is (relating to SQL, that is). I googled it, and didn't see anything obvious.


"Bog Standard" http://en.wiktionary.org/wiki/bog_standard

(idiomatic) Especially plain, ordinary, or unremarkable; having no special, excess or unusual features; plain vanilla "She drives a bog standard economy car."

Sorry about that!

to properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
Amy.G
Amy.G
SSC-Enthusiastic
SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)

Group: General Forum Members
Points: 185 Visits: 290
OK, thank you for the link on ancient ceramic toilets :-P But anyway, you are asking why don't I just put it in a regular table? Correct? The answer is space. I have used temp tables, where I can put on index on the table and run queries from there, which is fine for ad hoc queries. But I'm looking for something to use in reporting services. I played around with a stored proc to generate temp tables, but I wanted to throw out the issue to the forums to see if there were any ideas people had.
patrickmcginnis59 10839
patrickmcginnis59 10839
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1740 Visits: 5499
Amy.G (4/8/2013)
OK, thank you for the link on ancient ceramic toilets :-P But anyway, you are asking why don't I just put it in a regular table? Correct? The answer is space. I have used temp tables, where I can put on index on the table and run queries from there, which is fine for ad hoc queries. But I'm looking for something to use in reporting services. I played around with a stored proc to generate temp tables, but I wanted to throw out the issue to the forums to see if there were any ideas people had.


Ok I was assuming the data was already in two tables that could be indexed. My bad!!!

Also for what its worth, I like temp tables for reporting services, if you use a single hash (like #name), I didn't think there was a problem with it as if I'm not mistaken these are actually only visible to the connection they're created with.

to properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
Amy.G
Amy.G
SSC-Enthusiastic
SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)

Group: General Forum Members
Points: 185 Visits: 290

Ok I was assuming the data was already in two tables that could be indexed. My bad!!!


Oops, I think I misunderstood something. The data are in two separate tables that do have indexes on them. The wage data needs to be grouped.... hold on, it just occurred to me the data can be grouped by employee after the tables have been joined. (don't laugh, it seriously just occurred to me). I'll see where this leads.
Amy.G
Amy.G
SSC-Enthusiastic
SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)

Group: General Forum Members
Points: 185 Visits: 290
I can't believe this. My query went from nearly 10 minutes to 2 seconds. I had this idea that the data had to be summed before joining. The dangers of tunnel vision, I guess.
patrickmcginnis59 10839
patrickmcginnis59 10839
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1740 Visits: 5499
Amy.G (4/8/2013)
I can't believe this. My query went from nearly 10 minutes to 2 seconds. I had this idea that the data had to be summed before joining. The dangers of tunnel vision, I guess.


I'm not an expert or anything, but I've seen plenty of evidence that the sql server software will rearrange query operations in a variety of ways as long as the results would be logically equivalent to what you spec'ed in your query text. Sometimes it can pay to just write your query in the most obvious way and see what the server does with it. Doesn't hurt to eyeball the plan also.

Good luck!

to properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
DiverKas
DiverKas
Old Hand
Old Hand (317 reputation)Old Hand (317 reputation)Old Hand (317 reputation)Old Hand (317 reputation)Old Hand (317 reputation)Old Hand (317 reputation)Old Hand (317 reputation)Old Hand (317 reputation)

Group: General Forum Members
Points: 317 Visits: 460
Amy.G (4/8/2013)

Ok I was assuming the data was already in two tables that could be indexed. My bad!!!


Oops, I think I misunderstood something. The data are in two separate tables that do have indexes on them. The wage data needs to be grouped.... hold on, it just occurred to me the data can be grouped by employee after the tables have been joined. (don't laugh, it seriously just occurred to me). I'll see where this leads.


Glad you got it figured out. Sometimes its best to go back to the logical steps SQL Server uses to work through a query. Great job!
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search