|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:32 AM
Points: 136,
Visits: 770
|
|
Hi Guys,
Hope you're all doing well!
I have a problem and will provide an abstraction (extremely simplified) for it, in the hopes someone could maybe assist. Please let me know if I could provide any more code to assist you in helping me.
Consider:
;WITH Abstraction(pYear, pMonth, pValue, Problem) AS ( SELECT 2000 ,1 ,RAND() * 1000 ,'?' UNION ALL SELECT CASE WHEN pMonth = 12 THEN pYear + 1 ELSE pYear END ,CASE WHEN pMonth = 12 THEN 1 ELSE pMonth + 1 END ,RAND(CAST(CAST(NEWID()AS VARBINARY) AS INT)) * 1000 ,'?' FROM Abstraction WHERE CASE WHEN pMonth = 12 THEN pYear + 1 ELSE pYear END <= 2005 ) SELECT * FROM Abstraction
In the column labelled "Problem", I would like to be have: If we look at the output and we are for example in January 2003, the sum of the pValue column for every previous January INCLUSIVE - (January 2000, January 2001, January, 2002, January 2003) This can be calc'd after the CTE has run, sure - but I require this figure inside the CTE as other columns will be dependent on it. The inter-dependence is such, that "chained" CTEs will also not work.
I have attempted SUM() OVER (PARTITION BY...) and have tried joining to the CTE in the recursive block - which seems to be strangely illegal...
As I said, this is a simplified abstraction of the problem, so I hope I can answer your questions within the realm of this example.
Kind thanks in advance!
PS.
Not that I am certain that the extended syntax for window functions in SQL Server 2012 would help - I need to have this solved with syntax available to SQL Server 2008
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, January 17, 2013 1:46 AM
Points: 231,
Visits: 505
|
|
Try below..Modify it as per your need.
;WITH Abstraction(pYear, pMonth, pValue, jan,feb,mar,april,may,june,july,aug,sep,oct,nov,decb) AS ( SELECT 1999 ,12 ,cast (0 as float) ,cast (0 as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) ,cast (null as float) UNION ALL SELECT CASE WHEN pMonth = 12 THEN pYear + 1 ELSE pYear END ,CASE WHEN pMonth = 12 THEN 1 ELSE pMonth + 1 END ,pv.pvalue ,case when pmonth = 12 then isnull(jan,0.0) + pv.pvalue else ISNULL(jan,0.0) end as jan --this value is for jan if the pmonth is 12 and simialrly below ,case when pmonth = 1 then isnull(feb,0.0) + pv.pvalue else isnull(feb,0.0) end as feb ,case when pmonth = 2 then isnull(mar,0.0) + pv.pvalue else isnull(mar,0.0) end as mar ,case when pmonth = 3 then isnull(april,0.0) + pv.pvalue else isnull(april,0.0) end as april ,case when pmonth = 4 then isnull(may,0.0) + pv.pvalue else isnull(may,0.0) end as may ,case when pmonth = 5 then isnull(june,0.0) + pv.pvalue else isnull(june,0.0) end as june ,case when pmonth = 6 then isnull(july,0.0) + pv.pvalue else isnull(july,0.0) end as july ,case when pmonth = 7 then isnull(aug,0.0) + pv.pvalue else isnull(aug,0.0) end as aug ,case when pmonth = 8 then isnull(sep,0.0) + pv.pvalue else isnull(sep,0.0) end as sep ,case when pmonth = 9 then isnull(oct,0.0) + pv.pvalue else isnull(oct,0.0) end as oct ,case when pmonth = 10 then isnull(nov,0.0) + pv.pvalue else isnull(nov,0.0) end as nov ,case when pmonth = 11 then isnull(decb,0.0) + pv.pvalue else isnull(decb,0.0) end as decb FROM Abstraction a cross apply(select cast (RAND(CAST(CAST(NEWID()AS VARBINARY) AS int)) * 1000 as float) pvalue) pv WHERE CASE WHEN pMonth = 12 THEN pYear + 1 ELSE pYear END <= 2005 ) SELECT pYear,pMonth,pValue, case when pmonth = 1 then jan when pmonth = 2 then feb when pmonth = 3 then mar when pmonth = 4 then april when pmonth = 5 then may when pmonth = 6 then june when pmonth = 7 then july when pmonth = 8 then aug when pmonth = 9 then sep when pmonth = 10 then oct when pmonth = 11 then nov when pmonth = 12 then decb end as problem FROM Abstraction where pYear>= 2000
GulliMeel
Finding top n Worst Performing queries Improve the performance of Merge Join(special case) How to Post Performance Problem -Gail Shaw
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:32 AM
Points: 136,
Visits: 770
|
|
Thank you :) Very cool approach!
Unfortunately, time was a worry, so I have re-written my code in to a while loop, and am inserting row-by-row. While inside the loop, I can query the table I am inserting in to (To stay with the abstraction, I can find the sum of all January values as they are readily query-able in the table I am inserting in to). Not my first choice.
Hope I can find the time to fit your approach to the solution in to my code and measure the performance.
Sadly, the actual problem is a great deal more complex than the abstracted problem.
Thanks again :)
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, January 17, 2013 1:46 AM
Points: 231,
Visits: 505
|
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
I think your real problem is a bad data model. This is COBOL written in SQL; you even put the comma in the front of the punch card so you can re-use or re-arrange the deck. After a few decades of fixing code, I look for those symptoms.
Another sign of COBOL-mindset is splitting temporal data into fields, the ways they are declared in the DATA DIVISION. In RDBMS, first normal form would put a temporal value into a scalar value in a column.
A useful idiom is a report period calendar. It gives a name to a range of dates.
CREATE TABLE Report_Periods (report_name CHAR(10) NOT NULL PRIMARY KEY CHECK(period_name LIKE '[12][0-9][0-9][0-9]-[01][0-9]-00'), report_start_date DATE NOT NULL, report_end_date DATE NOT NULL, CONSTRAINT date_ordering CHECK (report_start_date <= report_end_date), etc);
These report periods can overlap; a fiscal quarter will be contained in the range of its fiscal year. There can be gaps between them; we have to wait a year between each “Annual Going out Of Business Sale!” and there might be long stretches of time without any special sales. But we want to know their ranges so that the table is fairly constant once it is created.
I like the MySQL convention of using double zeroes for months and years, That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year. The advantage is that it will sort with the ISO-8601 data format required by Standard SQL. Let's re-do yo0ur table and assume it is loaded with the right data types, etc.
CREATE TABLE Foobars (foobar_month CHAR(10) NOT NULL PRIMARY KEY REFERENCES Report_Periods (report_name), foobar_value INTEGER NOT NULL);
>>If we look at the output and we are for example in January 2003, the sum of the foobar_random_value column for every previous January INCLUSIVE - (January 2000, January 2001, January, 2002, January 2003)<<
SELECT foobar_month, foobar_value, SUM (CASE WHEN foobar_month LIKE '____' + SUBSTRING (foobar_month, 5, 4) +'00' THEN foobar_value ELSE 0 END) OVER (ORDER BY foobar_month ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS foobar_month_cum_tot FROM Foobars WHERE foobar_month >= '2000-00-00';
See why I like that convention? This assume that you have SQL Server 2012, but we can do this with a self-join in older releases. Q: Why did you store the computation in a column? A: That is how COBOL would do it because it has to materialize all values. SQL has virtual tables and columns. Look at the nesting of non-procedural code in the SUM() as opposed to your sequential processing model of chained CTEs.
Think table look up, not computations; JOINs and not loops; nested code and not sequential processes
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 3:06 PM
Points: 32,930,
Visits: 26,816
|
|
CELKO (7/8/2012) I like the MySQL convention of using double zeroes for months and years, That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year. The advantage is that it will sort with the ISO-8601 data format required by Standard SQL. Let's re-do yo0ur table and assume it is loaded with the right data types, etc.
Since that would require storing temporal data as character based data, I have to ask why would you do that? And, no... not trying to be the smart guy here. I'd really like to know, Joe.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:32 AM
Points: 136,
Visits: 770
|
|
I may be a little presumptuous in saying that I think that the problem stems from trying to build a SQL solution to a very complex problem that is currently solved in Excel.
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
Since that would require storing temporal data as character based data, I have to ask why would you do that?
I am naming a time period because T-SQL does not have an INTERVAL data type. No temporal math is done with it. It is the key to INTERVALs in the Calendar and report periods table.
This convention happens to have some major advantages: language free fixed length & short easy regular expressions in DDL and DML sorts with ISO-8601 display format It is fast in a query.
Think about looking for all the Januaries in English versus "report_month LIKE '____-01-00' instead. Anyone got a better way to name a time period?
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 3:06 PM
Points: 32,930,
Visits: 26,816
|
|
CELKO (7/9/2012)
Since that would require storing temporal data as character based data, I have to ask why would you do that? I am naming a time period because T-SQL does not have an INTERVAL data type. No temporal math is done with it. It is the key to INTERVALs in the Calendar and report periods table. This convention happens to have some major advantages: language free fixed length & short easy regular expressions in DDL and DML sorts with ISO-8601 display format It is fast in a query. Think about looking for all the Januaries in English versus "report_month LIKE '____-01-00' instead. Anyone got a better way to name a time period?
Maybe but the real key here is that I wouldn't name a time period, Joe. Why not use the number of months since "Date zero"?
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|