|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, April 09, 2013 8:38 AM
Points: 133,
Visits: 275
|
|
Hello everyone. I am looking to produce a result set from a helpdesk application that tells me the average number of cases per day. I know I can take the total number of records and just divide that by 6 for the number of days in the work week but i'm trying to think futher ahead and be able to extract results such as avg number of cases per day and then find avg number of cases on specific days such as Monday, Tuesday, etc.... I think something such as a recursive CTE is needed but hope you can assist with a proper method.
Below is my table definition with some insert values.
USE [GKHELPDESK] GO
/****** Object: Table [dbo].[CaseData1] Script Date: 10/13/2012 10:53:50 ******/ SET ANSI_NULLS ON GO
SET QUOTED_IDENTIFIER ON GO
SET ANSI_PADDING ON GO
CREATE TABLE [dbo].[CaseData1]( [Id] [int] IDENTITY(100,1) NOT NULL, [CaseDate] [date] NOT NULL, [Customer] [varchar](50) NOT NULL, [Category] [int] NULL, [CaseDescription] [varchar](30) NOT NULL, [CaseStatus] [bit] NOT NULL, [ClosedDate] [date] NULL, [Resolution] [varchar](400) NULL, PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
GO SET ANSI_PADDING OFF GO INSERT INTO [GKHELPDESK].[dbo].[CaseData1] ([CaseDate] ,[Customer] ,[Category] ,[CaseDescription] ,[CaseStatus] ,[ClosedDate] ,[Resolution]) VALUES ('20120101', N'Joe Smith', 1, N'Outlook Error', 1, '20120101', N'Close outlook Reopen Outlook') ,('20120102', N'Jim Smith', 1, N'Windows Error', 1, '20120102', N'Restart computer') ,('20120103', N'James Smith', 1, N'ERP Error', 1, '20120103', N'Close application reopen app') ,('20120103', N'Jessie Smith', 1, N'Printer Error', 1, '20120103', N'Power cycle printer'); GO
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
You have never looked at a book on data modeling or ISO standards. So the first thing we need to do is fix the DDL you did post. IDENTITY is how assembly languages programmers who still think in terms of sequential magnetic tape files fake a record number in SQL. Or worse, the lines on a clipboard form. They also use assembly language things like BIT flags. Rows are not records.
None of the string sizes look researched. The column names are vague generic things, like “customer” or “category” that are not proper data models. Here is a re-write with meaningful names that follow ISO-11179, has constraints, and begs for a real design. In particular, do you know how to properly create a “problem_category” encoding scheme? It is usually a hierarchy (a Dewey Decimal of Disaster, as we call them)? Do you know how to properly create a “problem_status” encoding scheme? How were you going to get “resolved by ticket withdrawnal”, “resolved by replacement”, and about 50 other status codes in one BIT?
CREATE TABLE Helpdesk_Tickets (helpdesk_ticket_nbr INTEGER NOT NULL PRIMARY KEY, open_case_date DATE NOT NULL, close_case_date DATE, CHECK (open_case_date <= close_case_date), customer_name VARCHAR(35) NOT NULL, problem_category CHAR(5) NOT NULL, problem_status CHAR(5) NOT NULL, problem_description VARCHAR(30) NOT NULL, resolution_description VARCHAR(400) NOT NULL);
Another hint; in the old days we put commas in the front of the punch cards with one field name on it so we could re-arrange the program deck or save the cards for re-use. You can stop now; a line of program text should be related data elements (city, state, ZIP) that are read and understood by humans as a unit of data or logic.
>> I think something such as a recursive CTE is needed but hope you can assist with a proper method. <<
No. Your mindset is still back in magnetic tapes and procedural programming. Use a Report Period Table. Since SQL is a database language, we prefer to do look ups and not calculations. They can be optimized while temporal math messes up optimization. A useful idiom is a report period calendar that everyone uses so there is no way to get disagreements in the DML. The report period table gives a name to a range of dates that is common to the entire enterprise.
CREATE TABLE Report_Periods (report_name VARCHAR(30) NOT NULL PRIMARY KEY, report_start_date DATE NOT NULL, report_end_date DATE NOT NULL, CONSTRAINT date_ordering CHECK (report_start_date <= report_end_date), etc);
These report periods can overlap or have gaps. I like the MySQL convention of using double zeroes for months and years, That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year. The advantage is that it will sort with the ISO-8601 data format required by Standard SQL.
I would also build a calendar table with a ISO-8601 week-within-year format column in addition to the usual Common Era date. This format is 'yyyyWww-d' where yyyy is the year, W is as separator token, ww is (01-53) week number and d is (1-7) day of the week.
You input any calendar date, find the week-within-year column and return the dates that match on a LIKE predicate. WHERE sale_day LIKE '2012W26-[67]'
There are several websites with calendars you can cut & paste, but you can start your search with: http://www.calendar-365.com/week-number.html
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, April 09, 2013 8:38 AM
Points: 133,
Visits: 275
|
|
CELKO (10/13/2012) In particular, do you know how to properly create a “problem_category” encoding scheme? It is usually a hierarchy (a Dewey Decimal of Disaster, as we call them)? Do you know how to properly create a “problem_status” encoding scheme? How were you going to get “resolved by ticket withdrawnal”, “resolved by replacement”, and about 50 other status codes in one BIT?
Another hint; in the old days we put commas in the front of the punch cards with one field name on it so we could re-arrange the program deck or save the cards for re-use. You can stop now; a line of program text should be related data elements (city, state, ZIP) that are read and understood by humans as a unit of data or logic.
No. Your mindset is still back in magnetic tapes and procedural programming. Use a Report Period Table. Since SQL is a database language, we prefer to do look ups and not calculations. They can be optimized while temporal math messes up optimization. A useful idiom is a report period calendar that everyone uses so there is no way to get disagreements in the DML. The report period table gives a name to a range of dates that is common to the entire enterprise.
CREATE TABLE Report_Periods (report_name VARCHAR(30) NOT NULL PRIMARY KEY, report_start_date DATE NOT NULL, report_end_date DATE NOT NULL, CONSTRAINT date_ordering CHECK (report_start_date <= report_end_date), etc);
I would also build a calendar table with a ISO-8601 week-within-year format column in addition to the usual Common Era date. This format is 'yyyyWww-d' where yyyy is the year, W is as separator token, ww is (01-53) week number and d is (1-7) day of the week.
You input any calendar date, find the week-within-year column and return the dates that match on a LIKE predicate. WHERE sale_day LIKE '2012W26-[67]'
Joe,
In my actaul database the category column has a foreign key constraint to a table that can be updated to allow for changes over time. I've often seen some category encoding scheme's that are hardwired and result in limited flexibility over time. this category lookup table can accomodate changes over time. I did not include the FK contstraint in the DDL because I wanted to keep the example simple and focused.
The problem status encoding scheme is simple and not flexible but that is mean to be for simplicity and was an agreed upon business requirement. I do agree that there are many status codes which a bit data type will not suffice for however this is a simple break/fix type of help desk application. the case is either opened or closed. The bit data type saves space in the record size. The check constraint you recommend in the CaseData table is definately needed and I thank you for pointing that out.
I also like your idea of having a report_periods_table to keep the business on the same page as to these reporting periods. I must confess though that I am confused as to how such a table will allow the result set to determine what days of the week generate the most help desk cases, averages or other statistical information. What would you recommend to look up that type of data in a design such as this?
I know you recommended cut and paste of a calendar into the database but is there another option for taking a date and looking up the specific week day an incident occurred? If I use your idea of cutting and pasting calendars in the database I would be spending a great deal of effort into updating calendars in a database until my soul is tired of my human body and decides to move to a new plane of existence.
I also get your point about CTE's but it is a tool that is still taught and being expanded on frequently. Sad but true. If I can create a database structure that is faster via lookups instead of relying on mathematical computation I am all for it. I just mentioned CTE's because it is promoted over ANSI design.
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
[/quote] I also like your idea of having a report_periods_table to keep the business on the same page as to these reporting periods. I must confess though that I am confused as to how such a table will allow the result set to determine what days of the week generate the most help desk cases, averages or other statistical information. [/quote]
The calendar table will have a "cal_date" column in DATE data type and a ISO-8601 week-within-year format column
week_in_year_date CHAR(9) NOT NULL CHECK (week_in_year_date LIKE '[12][0-9][0-9][0-9]W[0-5][0-9]-[1-7]')
This format is 'yyyyWww-d' where yyyy is the year, W is as separator token, ww is (01-53) week number, dash and d is (1-7) day of the week. You then use LIKE predicates to get weeks, or days within weeks. For example, for all the Mondays in the year 2012:
WHERE report_date LIKE '2012W__-1'
For weekends in the first quarter of 2012, you need combination:
WHERE cal_date BETWEEN period_start_date AND period_end_date AND period_name = '2012Q1' AND report_date LIKE '2012W__-[67]'
You can do some complicated things with this.
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Yesterday @ 11:07 PM
Points: 921,
Visits: 3,745
|
|
CELKO (10/13/2012)
I also like your idea of having a report_periods_table to keep the business on the same page as to these reporting periods. I must confess though that I am confused as to how such a table will allow the result set to determine what days of the week generate the most help desk cases, averages or other statistical information.
The calendar table will have a "cal_date" column in DATE data type and a ISO-8601 week-within-year format column week_in_year_date CHAR(9) NOT NULL CHECK (week_in_year_date LIKE '[12][0-9][0-9][0-9]W[0-5][0-9]-[1-7]') This format is 'yyyyWww-d' where yyyy is the year, W is as separator token, ww is (01-53) week number, dash and d is (1-7) day of the week. You then use LIKE predicates to get weeks, or days within weeks. For example, for all the Mondays in the year 2012: WHERE report_date LIKE '2012W__-1' For weekends in the first quarter of 2012, you need combination: WHERE cal_date BETWEEN period_start_date AND period_end_date AND period_name = '2012Q1' AND report_date LIKE '2012W__-[67]' You can do some complicated things with this.
It's trivial to set up an inline table-valued function which models such a calendar table. The code is simple to modify, e.g. the startdate and enddate parameters could be used to restrict the tally-table numbers generated. On this lappy, it takes about 200ms to return all 10,000 rows.
CREATE FUNCTION [dbo].[LocalCalendar] (@pStartDate DATE, @pEndDate DATE) RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH numbers AS ( SELECT TOP(10000) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)) SELECT cal_date, week_in_year_date = [Year] + 'W' + [Week] + '-' + [weekday], period_name = [Year] + 'Q' + [Quarter] FROM ( SELECT cal_date, [Year] = CAST(YEAR(cal_date) AS CHAR(4)), [Quarter] = CAST(DATEPART(quarter,cal_date) AS CHAR(1)), [Week] = RIGHT('0'+CAST(DATEPART(WEEK,cal_date) AS VARCHAR(2)),2), [weekday] = CAST(DATEPART(weekday,cal_date) AS CHAR(1)) FROM (SELECT cal_date = CAST(DATEADD(DAY,N,'2000/01/01') AS DATE) FROM numbers) d ) f WHERE (@pStartDate IS NULL OR cal_date >= @pStartDate) AND (@pEndDate IS NULL OR cal_date <= @pEndDate) Usage:
SELECT * FROM [dbo].[LocalCalendar] (NULL,NULL)
-- returns 10,000 rows: cal_date week_in_year_date period_name 2000-01-01 2000W01-7 2000Q1 . . 2027-05-18 2027W21-3 2027Q2
SELECT * FROM [dbo].[LocalCalendar] ('20120101','20120601')
-- returns 153 rows: cal_date week_in_year_date period_name 2012-01-01 2012W01-1 2012Q1. . . 2012-06-01 2012W22-6 2012Q2
The week number and day number respect local settings, see DATEPART in BOL for further information.
Low-hanging fruit picker and defender of the moggies
For better assistance in answering your questions, please read this.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
It's trivial to set up an inline table-valued function which models such a calendar table. The code is simple to modify, e.g. the startdate and enddate parameters could be used to restrict the tally-table numbers generated. On this lappy, it takes about 200ms to return all 10,000 rows.
How many times for how many sessions per day will use the proprietary TVF? How do you guarantee that everyone is doing it the same way? This looks nice for a one-shot problem, but the common Calendar that can stay in main storage for everyone and assure that they are all on the same (calendar) page is important for both performance and data integrity.
You will never insert into a calendar table, so you fill the data pages 100%. You are usually looking at a time frame of 1 to 5 years which will fit into cache and stay there.
The data integrity advantage shows up in weird ways. I always use the calculation for Easter to show that the math is awful. But then a client showed me two individual programmers who built their own calendar tables. They got the dates of Easter from their preists. One guy was Orthodox and the other Catholic 
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Yesterday @ 11:07 PM
Points: 921,
Visits: 3,745
|
|
CELKO (10/14/2012)
It's trivial to set up an inline table-valued function which models such a calendar table. The code is simple to modify, e.g. the startdate and enddate parameters could be used to restrict the tally-table numbers generated. On this lappy, it takes about 200ms to return all 10,000 rows. How many times for how many sessions per day will use the proprietary TVF? How do you guarantee that everyone is doing it the same way? This looks nice for a one-shot problem, but the common Calendar that can stay in main storage for everyone and assure that they are all on the same (calendar) page is important for both performance and data integrity. You will never insert into a calendar table, so you fill the data pages 100%. You are usually looking at a time frame of 1 to 5 years which will fit into cache and stay there. The data integrity advantage shows up in weird ways. I always use the calculation for Easter to show that the math is awful. But then a client showed me two individual programmers who built their own calendar tables. They got the dates of Easter from their preists. One guy was Orthodox and the other Catholic 
An awful lot of testing went into Jeff Moden et al's TSQL string splitter, and one of the unexpected findings (to me, at least) was that a tally table generated on-the-fly IBG style was more performant than a permanent tally table. Let's see if the same holds here.
Make and populate a calendar table;
DROP TABLE JoesCalendar CREATE TABLE JoesCalendar ( cal_date DATE PRIMARY KEY CLUSTERED, week_in_year_date CHAR(9) NOT NULL CHECK (week_in_year_date LIKE '[12][0-9][0-9][0-9]W[0-5][0-9]-[1-7]'), period_name CHAR(6) NOT NULL)
INSERT INTO JoesCalendar (cal_date, week_in_year_date, period_name) SELECT * FROM [dbo].[LocalCalendar] ('20100101','20141231') Tweak the iTVF for raw performance;
ALTER FUNCTION [dbo].[LocalCalendar] (@pStartDate DATE, @pEndDate DATE) RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH numbers AS ( SELECT TOP(10000) n = ISNULL(ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1,0) FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)) SELECT cal_date, week_in_year_date = [Year] + 'W' + [Week] + '-' + [weekday], period_name = [Year] + 'Q' + [Quarter] FROM ( SELECT cal_date, [Year] = CAST(YEAR(cal_date) AS CHAR(4)), [Quarter] = CAST(DATEPART(quarter,cal_date) AS CHAR(1)), [Week] = RIGHT('0'+CAST(DATEPART(WEEK,cal_date) AS VARCHAR(2)),2), [weekday] = CAST(DATEPART(weekday,cal_date) AS CHAR(1)) FROM (SELECT TOP (ISNULL(1+DATEDIFF(day,@pStartDate,@pEndDate),0)) cal_date = CAST(DATEADD(DAY,n,@pStartDate) AS DATE) FROM numbers) d ) f WHERE (@pStartDate IS NULL OR cal_date >= @pStartDate) AND (@pEndDate IS NULL OR cal_date <= @pEndDate)
A simple little test, return a year's worth of data;
PRINT '' PRINT '== iTVF calendar function =========' SET STATISTICS TIME ON SELECT * FROM [dbo].[LocalCalendar] ('20110101','20111231') SET STATISTICS TIME OFF PRINT '==================================='
PRINT '' PRINT '== calendar table =================' SET STATISTICS TIME ON SELECT * FROM JoesCalendar WHERE cal_date BETWEEN CAST('20110101' AS DATE) AND CAST('20111231' AS DATE) SET STATISTICS TIME OFF PRINT '===================================' Here's a typical result;
== iTVF calendar function =========
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 3 ms. =================================== == calendar table ================= SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 109 ms. =================================== Surprised? Me too. I was only intending to remind you that this is the SQL Server 2008 Forum of SQL Server Central
Low-hanging fruit picker and defender of the moggies
For better assistance in answering your questions, please read this.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
An awful lot of testing went into Jeff Moden et al's TSQL string splitter, and one of the unexpected findings (to me, at least) was that a tally table generated on-the-fly IBG style was more performant than a permanent tally table.
For how many concurrent sessions? This is the part people forget when they benchmark and only do it for the single case. Once any constant table (Calendar, report periods, series, weird math functions, etc.) is created and packed as tight as you can onto data pages with a clustered or covering index, you are done. The overhead becomes cache load time plus the cache access time for each of the hundreds (thousands?) of joins to it.
In practice, I found that the calendar table is used so much in commercial apps it, could be hardwired in main storage, like the look-up tables for math functions in calculator chips. Or SSD for us DB guys Oh, and make it read-only so the optimizer knows it is always available without locks, etc.
But with a function that is creating and dropping a table, no matter how small, each session, you have the same overhead each session, every day, year after year. Ugh!
I am not surprised that loading the week_in_year_date is expensive; it is a lot of temporal and string function calls. Those are expensive in T-SQL.
Think in terms of sets not single cases.
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Yesterday @ 11:07 PM
Points: 921,
Visits: 3,745
|
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Tuesday, April 09, 2013 8:38 AM
Points: 133,
Visits: 275
|
|
I'm honored I was able to spur a healthy debate between such knowledgeable sql verterns. Based on everything I just read the best of both suggestions can be achieved. If I take the itvf provided by ChrisM and I should be able to update the suggested calendar table by Celko, with the results. This way I don't have to be concerned with copying and pasting calendar data into a table and I have the option of running the iTvf in my queries if I wish.
ChrisM would you mind walking me through to understand some parts of the iTvf?
WITH numbers AS ( SELECT TOP(10000) n = ISNULL(ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1,0) FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n), (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))
What is the purpose of the values being set as 0 over 10 columns? Are the d, e, f, g, alias assignments? For your initialization of n you are ordering by null in a grouping function for each row number and then replacing the null value with 0? is that correct or am I way off?
SELECT cal_date, week_in_year_date = [Year] + 'W' + [Week] + '-' + [weekday], period_name = [Year] + 'Q' + [Quarter]
This part is straight forward. It is just the construction of the final result set when the actual select statement is run.
SELECT cal_date, [Year] = CAST(YEAR(cal_date) AS CHAR(4)), [Quarter] = CAST(DATEPART(quarter,cal_date) AS CHAR(1)), [Week] = RIGHT('0'+CAST(DATEPART(WEEK,cal_date) AS VARCHAR(2)),2), [weekday] = CAST(DATEPART(weekday,cal_date) AS CHAR(1))
This part is the sub query which contains the ddl of the temp table. Easy to understand.
FROM (SELECT TOP (ISNULL(1+DATEDIFF(day,@pStartDate,@pEndDate),0)) cal_date = CAST(DATEADD(DAY,n,@pStartDate) AS DATE) FROM numbers) d ) f
Are you aliasing the number of days between startdate and endDate parameters with cal_date?
WHERE (@pStartDate IS NULL OR cal_date >= @pStartDate) AND (@pEndDate IS NULL OR cal_date <= @pEndDate)
Finally your initializing the parameters to be passed from the select statement. That is straight forward as well.
That is some really powerful code that I look forward to using. I really appreciate you and Celko taking the time to post some constructive ideas.
|
|
|
|