SQL Subquery Help
caseyo (Forum Newbie)
Posted Wednesday, July 14, 2010 3:48 PM
I'm new to SQL Server 2008. I got thrust into this because someone left, so I don't know a whole lot; please bear with me. I didn't build any of this; I'm just trying to figure out some workarounds for issues we are experiencing.

I have a table, tblResults, that contains several hundred thousand rows of results. It has about 40 columns, each holding a numeric value, usually 1 to 5. The columns are named after the labels our users need to see; an example is Recommend.

What I'm trying to do is precalculate some averages, probably running once per night, so the data comes back faster on demand.

My overall averages table has a system id, another id, and a label (the same as the data column name in tblResults). What I'm stuck on is selecting the column's values from tblResults so I can do some math on them to get the averages. See below.


Select OA.system_id,
       OA.id,
       OA.label,
       (select Convert(Decimal(9,4), Sum(OA.label) / COUNT(*))
        from tblResults
        where date_range between '2010-01-01' and '2011-01-01') as ytd_avg
from overall_avgs OA

I get this error: "Operand data type varchar is invalid for sum operator." I know it's trying to sum the string 'Recommend' instead of the values in that column. I'm trying to do it this way to save some time, since it has to go through this process quite a bit.
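I think this one-liner shows the problem; SUM gets the text of the label, not the column it names:

-- this reproduces the same error: SUM is applied to the varchar literal,
-- not to the tblResults column called Recommend
SELECT SUM('Recommend');
-- Msg: Operand data type varchar is invalid for sum operator.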

Any help would be appreciated. I'm not sure of the terminology enough to know exactly what this is called.

Thank You
Post #952746
LutzM (SSCertifiable)
Posted Wednesday, July 14, 2010 4:09 PM
Hi caseyo,

please provide the table definitions (CREATE TABLE ...) for the tables in question (overall_avgs and tblResults). One way to do it is by right-clicking the table in Management Studio -> Script Table as -> CREATE To -> New Query Editor Window.
It would also help if you could provide some sample data so we can see what the actual data look like. Based on that sample data we'd need to see your expected result so we can test our solutions.
Please note that I'm offering an "I got thrust in here because someone left" discount.
Usually we'd expect ready-to-use sample data, as described in the first link in my signature. But I think this situation calls for an exception....




Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #952758
caseyo (Forum Newbie)
Posted Thursday, July 15, 2010 8:20 AM
Thanks for the advice. Here are the CREATE statements. I took columns out of tblResults, but it's the same idea for 30 or more data elements.

CREATE TABLE [dbo].[overall_avgs](
[system_id] [tinyint] NULL,
[id] [tinyint] NULL,
[label] [varchar](50) NULL,
[ytd_avg] [decimal](4, 2) NULL,
[qtd_avg] [decimal](4, 2) NULL,
[mtd_avg] [decimal](4, 2) NULL,
[all_avg] [decimal](4, 2) NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[tblResults](
[id] [int] NULL,
[Visit Date] [datetime] NULL,
[Parking] [int] NULL,
[Invoices] [int] NULL,
[Recommend] [int] NULL
) ON [PRIMARY]

tblResults data would look like this. The int values are on a 1-to-5 scale and can sometimes be NULL.
10,'2010-01-15 12:30', 3,1,5
20,'2010-07-15 12:35', 2,4,4
30,'2010-06-15 12:40', 1,2,3


The desired overall_avgs data would look like this. I just made up the numbers below. We are going to be collecting more calculations than this, but this should give you the idea.
1,1,'Parking',4.52,4.64,3.50,3.78
1,2,'Invoices',4.75,3.64,2.50,4.78
1,3,'Recommend',4.52,4.64,3.50,3.78


Thanks so much for your help. If that wasn't what you needed, please let me know.

Caseyo
Post #953164
LutzM (SSCertifiable)
Posted Thursday, July 15, 2010 1:42 PM
Ok, here's what I came up with:
DECLARE @overall_avgs TABLE
(
[system_id] [TINYINT] NULL,
[id] [TINYINT] NULL,
[label] [VARCHAR](50) NULL,
[ytd_avg] [DECIMAL](4, 2) NULL,
[qtd_avg] [DECIMAL](4, 2) NULL,
[mtd_avg] [DECIMAL](4, 2) NULL,
[all_avg] [DECIMAL](4, 2) NULL
)

DECLARE @tblResults TABLE
(
[id] [INT] NULL,
[Visit Date] [DATETIME] NULL,
[Parking] [INT] NULL,
[Invoices] [INT] NULL,
[Recommend] [INT] NULL
)

INSERT INTO @tblResults
SELECT 10,'2010-01-15 12:30', 3,1,5 UNION ALL
SELECT 20,'2010-07-15 12:35', 2,4,4 UNION ALL
SELECT 30,'2010-06-15 12:40', 1,2,3


;
-- step 1: unpivot the table to get more normalized data
WITH cte AS
(
    SELECT [id], [Visit Date], Amount, Item
    FROM (SELECT * FROM @tblResults) p
    UNPIVOT (Amount FOR Item IN ([Parking], [Invoices], [Recommend])) AS unpvt
)
SELECT
1 AS system_id,
-- assign hard coded id values to each column since there is no other business logic provided so far
CASE Item
WHEN 'Parking' THEN 1
WHEN 'Invoices' THEN 2
WHEN 'Recommend' THEN 3
ELSE 4 END AS [id],
item,
/* Basic concept:
Dividend: if the row has a visit date within the period to be covered,
use the value of each column unpivoted previously (e.g. [Parking]); otherwise assign zero. Sum those values.
Divisor: the COUNT(*) is replaced by another conditional sum: if the row matches the condition, assign 1,
otherwise assign zero, and sum those values. It might happen that there are no rows to sum, which would lead to a division by zero.
To avoid that, the NULLIF function is used to get a division by NULL, which will return NULL.
To display zero instead of NULL, the ISNULL function is used.
In order to get the result of the division in decimal format, I simply added 0.00 inside the SUM() function.
This forces an implicit conversion and is a lot easier to write than another CAST() inside the SUM() function.
It's considered to be faster, too.
Finally, cast the result to DECIMAL(4,2) as required.
Apply this concept to each relevant period: ytd, qtd, mtd.
*/
CAST(
ISNULL(
    SUM(CASE WHEN [Visit Date] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0) THEN Amount + 0.00 ELSE 0.00 END)
    /
    NULLIF(SUM(CASE WHEN [Visit Date] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0) THEN 1 ELSE 0 END), 0)
    , 0) AS DECIMAL(4,2)) AS [ytd_avg],
CAST(
ISNULL(
    SUM(CASE WHEN [Visit Date] >= DATEADD(QUARTER, DATEDIFF(QUARTER, 0, GETDATE()), 0) THEN Amount + 0.00 ELSE 0.00 END)
    /
    NULLIF(SUM(CASE WHEN [Visit Date] >= DATEADD(QUARTER, DATEDIFF(QUARTER, 0, GETDATE()), 0) THEN 1 ELSE 0 END), 0)
    , 0) AS DECIMAL(4,2)) AS [qtd_avg],
CAST(
ISNULL(
    SUM(CASE WHEN [Visit Date] >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) THEN Amount + 0.00 ELSE 0.00 END)
    /
    NULLIF(SUM(CASE WHEN [Visit Date] >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) THEN 1 ELSE 0 END), 0)
    , 0) AS DECIMAL(4,2)) AS [mtd_avg],
CAST( AVG(Amount+0.00 ) AS DECIMAL(4,2)) AS [all_avg]
FROM cte
GROUP BY item
ORDER BY system_id,id
/* result set
system_id id item ytd_avg qtd_avg mtd_avg all_avg
1 1 Parking 2.00 2.00 2.00 2.00
1 2 Invoices 2.33 4.00 4.00 2.33
1 3 Recommend 4.00 4.00 4.00 4.00
*/

It would also be possible to change this statement into dynamic SQL to make it a lot shorter (especially when we talk about the 40 columns of the real tblResults table).
But I think this code is already beyond the basic SQL 101 stuff...
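Just to sketch the idea of the dynamic version (untested, and excluding only [id] and [Visit Date] from the unpivot is an assumption based on your DDL):

-- rough sketch: build the UNPIVOT column list from the table metadata
-- instead of hard-coding 40 column names
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(c.name)
    FROM sys.columns c
    WHERE c.object_id = OBJECT_ID('dbo.tblResults')
      AND c.name NOT IN ('id', 'Visit Date') -- keep only the rating columns
    ORDER BY c.column_id
    FOR XML PATH('')), 1, 1, '');

SET @sql = N'SELECT [id], [Visit Date], Amount, Item
FROM dbo.tblResults
UNPIVOT (Amount FOR Item IN (' + @cols + ')) AS unpvt;';

EXEC sp_executesql @sql; -- wrap the aggregation logic around this the same way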
There's one issue I worry about:
Due to the required result for [all_avg], this query will end up in a table scan or clustered index scan. It would be interesting to know how many rows from previous years are stored compared to the current year. It might be faster to exclude the [all_avg] calculation from this query, use a WHERE clause to limit the rows, and add the [all_avg] values using a separate query, as sketched below. But that depends on the data distribution and should be tested for performance.
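Something like this is what I have in mind (a sketch only, using just the three sample columns; it needs testing against your real data):

-- query 1: ytd only (qtd/mtd work the same way), restricted to the current
-- year so an index on [Visit Date] can limit the rows instead of scanning all
WITH cte AS
(
    SELECT [Visit Date], Amount, Item
    FROM dbo.tblResults
    UNPIVOT (Amount FOR Item IN ([Parking], [Invoices], [Recommend])) AS unpvt
)
SELECT Item, CAST(AVG(Amount + 0.00) AS DECIMAL(4,2)) AS [ytd_avg]
FROM cte
WHERE [Visit Date] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
GROUP BY Item;

-- query 2: [all_avg] over the full history, run as a separate query
WITH cte AS
(
    SELECT Amount, Item
    FROM dbo.tblResults
    UNPIVOT (Amount FOR Item IN ([Parking], [Invoices], [Recommend])) AS unpvt
)
SELECT Item, CAST(AVG(Amount + 0.00) AS DECIMAL(4,2)) AS [all_avg]
FROM cte
GROUP BY Item;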




Post #953452
Joe Celko (SSCommitted)
Posted Thursday, July 15, 2010 2:17 PM
From your narrative, this is not a big table. Here is a guess at the DDL. The design I think you have suffers from attribute splitting: if those 40 things are really attributes of one entity, then they go in the row that models that entity and do not get split all over the table with a metadata column (label) to identify them.

CREATE TABLE SomethingResults
(some_key CHAR(10) NOT NULL,
test_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (some_key, test_date),
r01 SMALLINT NOT NULL
CHECK (r01 BETWEEN 1 AND 5),
r02 SMALLINT NOT NULL
CHECK (r02 BETWEEN 1 AND 5),
..
r40 SMALLINT NOT NULL
CHECK (r40 BETWEEN 1 AND 5));

Add appropriate NULL-ability, defaults and checks to each column. Wide tables like this are common with research projects that do lots of tests. I saw one with 130+ columns for an allergy study.

The query can be straightforward. Why are you formatting the data in the back end? All those CAST() calls will prevent any index usage. Since you want year-to-date data, why not put it in a VIEW that is always current?

CREATE VIEW YTD_Averages
AS
SELECT some_key, CURRENT_TIMESTAMP AS ytd_avg_date,
AVG(r01) AS r01_avg, AVG(r02) AS r02_avg,
.., AVG(r40) AS r40_avg
FROM SomethingResults
WHERE test_date BETWEEN CAST ('2010-01-01' AS DATE)
AND CURRENT_TIMESTAMP
GROUP BY some_key;

I would also work on the data element names. There is no such thing as a magical universal "id" -- it has to identify something in particular.



Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #953477
LutzM (SSCertifiable)
Posted Thursday, July 15, 2010 2:46 PM
@ Joe Celko:
You've demonstrated how to design the view for YTD_Averages.
What is your recommendation for getting MTD, QTD and all_avg?
As far as I can see, one option would be to UNION (ALL) four separate queries using different WHERE clauses (resulting in four separate scan/seek operations), or to use the approach I posted, which uses one table scan and several aggregated CASE expressions.
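For illustration, the UNION version I'm thinking of would look roughly like this (a sketch only, using your SomethingResults table and just r01):

-- one SELECT per period, each with its own WHERE clause
SELECT 'ytd' AS period, some_key, AVG(r01) AS r01_avg
FROM SomethingResults
WHERE test_date >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
GROUP BY some_key
UNION ALL
SELECT 'qtd', some_key, AVG(r01)
FROM SomethingResults
WHERE test_date >= DATEADD(QUARTER, DATEDIFF(QUARTER, 0, GETDATE()), 0)
GROUP BY some_key
UNION ALL
SELECT 'mtd', some_key, AVG(r01)
FROM SomethingResults
WHERE test_date >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)
GROUP BY some_key
UNION ALL
SELECT 'all', some_key, AVG(r01)
FROM SomethingResults
GROUP BY some_key;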

It's nothing I would start arguing about, but something to learn from.




Post #953491
caseyo (Forum Newbie)
Posted Thursday, July 15, 2010 3:18 PM
@LutzM Thank you so much. I think this will get me where I need to go.

This was exactly what I was needing. I modified it to fit the full table and put some more validation on it (not your fault; I forgot to include it). The query ran in 7 secs on 30,000 rows worth of data. I did take out the year portion. Your advice makes sense; we weren't sold on needing that anyway, and we can always add it in later.

This is amazing. Exactly what I was trying to do but unsure how to get there. I've got a lot more to do, but I know this will help greatly. Thanks again.





Post #953507
caseyo (Forum Newbie)
Posted Thursday, July 15, 2010 3:24 PM
@ Joe Celko:

Thank you for your post as well. It's not a huge table, but it can take a while to process, especially when calls from the web are involved. I'm more of a .NET programmer and got forced into SQL duty. We are actually going to balance showing real-time data vs. showing precompiled data. I'll definitely use something very similar to what you suggest for our real-time data.

Thanks for your reply as well. This community is top notch.
Post #953510
LutzM (SSCertifiable)
Posted Thursday, July 15, 2010 3:40 PM
caseyo (7/15/2010)
@LutzM Thank you so much. I think this will get me where I need to go.

This was exactly what I was needing. I modified it to fit the full table and put some more validation on it (not your fault; I forgot to include it). The query ran in 7 secs on 30,000 rows worth of data. I did take out the year portion. Your advice makes sense; we weren't sold on needing that anyway, and we can always add it in later.

This is amazing. Exactly what I was trying to do but unsure how to get there. I've got a lot more to do, but I know this will help greatly. Thanks again.


7 secs for 30K rows doesn't sound like acceptable performance. Would you mind sharing your current query and the related execution plan (actual, not estimated)? It seems like there is some more room for improvement.

Did you consider normalizing and properly indexing your tblResults table? This would definitely help to improve performance.
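To give you an idea of what I mean by normalizing (a sketch only; the table, column and index names are just suggestions):

-- one row per (result, item) instead of 40 rating columns
CREATE TABLE dbo.tblResultsNormalized
(
    result_id  INT         NOT NULL, -- the id from tblResults
    visit_date DATETIME    NOT NULL,
    item       VARCHAR(50) NOT NULL, -- e.g. 'Parking', 'Invoices', 'Recommend'
    amount     TINYINT     NULL      -- the 1-to-5 rating, NULL if not answered
);

-- supports the date range filters used by the ytd/qtd/mtd calculations
CREATE NONCLUSTERED INDEX IX_tblResultsNormalized_visit_date
    ON dbo.tblResultsNormalized (visit_date)
    INCLUDE (item, amount);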

As a side note: you shouldn't be overly amazed that you got a query you could use almost instantly. You did the major part by providing the table structure, sample data, and expected results. (Hint: have a look at how I posted the sample data, so others might benefit from it and maybe come up with a better solution. [hides away shuddering, knowing Joe Celko might be around])
For someone not frequently using that forum: Great job!!
For someone who's kinda new to SQL Server: OUTSTANDING JOB!!!!! VERY WELL DONE!!!

If I've been able to show you that you're not alone, and you left your office with fewer headaches than expected, I'm excited! Because that's what it's all about: volunteering on a great forum like this one and helping folks like you!




Post #953517
caseyo (Forum Newbie)
Posted Thursday, July 15, 2010 4:06 PM
@LutzM

I hadn't considered normalizing and properly indexing the tblResults table. I'll have to look up how to do that tomorrow. I'll also submit the full query tomorrow.

Thanks so much for your help. Until recently, my experience with SQL Server has been writing simple insert or update procedures to work with forms from .NET, nothing of this magnitude.

I'm definitely excited. While 7 seconds sounds like a long time for this, we were looking at hours for this stuff to run before.



Post #953526