CTE Crazy: Sums, Counts and Pivots I'm missing something here

  • I've not had to use SQL for a while and I'm having trouble getting to grips with aggregates again.

    In the code below I think its This line that is not doing what I expect 

      SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls


    I want to get a report of counts by Day and Type - with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve. 
    Here is a Sample table

    -- DROP TABLE #TmpData;
    CREATE TABLE #TmpData (
        CallDateTime DateTime,
        Duration Time,
        CallTypeId Int )
    INSERT INTO #TmpData
    SELECT '2017-08-01 10:14:37.00', '00:00:03.450', 1 UNION ALL
    SELECT '2017-08-01 10:55:46.00', NULL, 2 UNION ALL
    SELECT '2017-08-01 11:30:50.00', NULL, 2 UNION ALL
    SELECT '2017-08-01 12:48:02.00', '00:00:02.500', 1 UNION ALL
    SELECT '2017-08-01 14:48:24.00', '00:00:02.910', 1 UNION ALL
    SELECT '2017-08-01 17:05:26.00', '00:00:04.460', 1 UNION ALL
    SELECT '2017-08-04 02:11:47.00', '00:13:54.790', 4 UNION ALL
    SELECT '2017-08-06 01:29:13.00', '00:01:18.180', 5 UNION ALL
    SELECT '2017-08-06 04:33:33.00', NULL, 3 UNION ALL
    SELECT '2017-08-06 04:33:45.00', '00:00:00.000', 5 UNION ALL
    SELECT '2017-08-06 04:33:49.00', NULL, 3 UNION ALL
    SELECT '2017-08-06 04:35:59.00', NULL, 3 UNION ALL
    SELECT '2017-08-06 05:08:22.00', '00:00:00.000', 5 UNION ALL
    SELECT '2017-08-06 20:51:28.00', '00:00:03.310', 4 UNION ALL
    SELECT '2017-08-08 15:49:27.00', '00:00:19.000', 5 UNION ALL
    SELECT '2017-08-10 15:24:19.00', '00:04:32.790', 4 UNION ALL
    SELECT '2017-08-10 18:12:30.00', NULL, 2 UNION ALL
    SELECT '2017-08-10 18:13:01.00', NULL, 3 UNION ALL
    SELECT '2017-08-10 18:40:06.00', NULL, 2 UNION ALL
    SELECT '2017-08-10 22:35:13.00', '00:01:33.710', 4 ;

    Here is my report so far : 
    Its wrong! it produces counts of 1 everywhere there should be a count, but some of those 1's should be true counts e.g. Column [1] on the first date [2017-08-01 should be 4
    And it seems far too complicated - (I've gone CTE crazy here) - in addition to fixing the counts there has got to be a simpler way 

    ;WITH SplitDates AS (
        SELECT    CAST(N.CallDateTime AS date) AS CallDate,
                DATEDIFF(MILLISECOND, '1900-01-01',ISNULL(N.Duration,'00:00:00')) AS [DurationMs],
                N.CallTypeId,
                -- Rules - some circumstances are not counted
                CASE WHEN N.CallTypeID = 4 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0
                     WHEN N.CallTypeID = 5 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0
                     ELSE 1
                END AS CallCount
        FROM #TmpData AS N
    ), DurationByDay AS (
        SELECT CallDate, SUM(DurationMs) AS [Total Duration]
        FROM SplitDates
        GROUP BY CallDate
    ), SplitDays AS (
        SELECT S.CallTypeId,
            SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls,         
           S.CallDate
        FROM SplitDates AS S
        GROUP BY S.CallDate, S.CallTypeId, S.CallCount
    ),  PivotNumbers AS (
        SELECT CallDate, [1], [2], [3], [4], [5] , [Total Duration]  -- TODO Convert back to HH:mm:ss here 
        FROM ( SELECT CallTypeId, T.CallDate, NumCalls, D.[Total Duration]
                FROM SplitDays AS T
                JOIN DurationByDay AS D ON D.CallDate = T.CallDate
                ) AS SourceTable
        PIVOT ( SUM(NumCalls)
                FOR CallTypeId IN ([1], [2], [3], [4], [5]) ) AS PivotTable
    )
    SELECT *
    FROM PivotNumbers
    ORDER BY CallDate

  • OK so preparing this data for SSC seems to have stimuated a few grey cells

    I've fixed the output it - as follows
    However - can it be further simplified?


    ;WITH SplitDates AS (
        SELECT    CAST(N.CallDateTime AS date) AS CallDate,
                DATEDIFF(MILLISECOND, '1900-01-01',ISNULL(N.Duration,'00:00:00')) AS [DurationMs],
                N.CallTypeId,
                -- Rules - some not counted
                CASE WHEN N.CallTypeID = 4 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0
                     WHEN N.CallTypeID = 5 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0
                     ELSE 1
                END AS CallCount
        FROM #TmpData AS N
    ), SumCallsPerTypePerDay AS (
        SELECT SUM(CallCount) AS NumCalls,
       CallDate,
         CallTypeId
        FROM SplitDates
        GROUP BY CallDate, CallTypeId
    ), DurationByDay AS (
        SELECT CallDate, SUM(DurationMs) AS [Total Duration]
        FROM SplitDates
        GROUP BY CallDate
    ), PivotNumbers AS (
        SELECT CallDate, [1], [2], [3], [4], [5] , [Total Duration]
        FROM ( SELECT CallTypeId, T.CallDate, NumCalls, D.[Total Duration]
                FROM SumCallsPerTypePerDay AS T
                JOIN DurationByDay AS D ON D.CallDate = T.CallDate
                ) AS SourceTable
        PIVOT ( SUM(NumCalls)
                FOR CallTypeId IN ([1], [2], [3], [4], [5]) ) AS PivotTable
    )
    SELECT *
    FROM PivotNumbers
    ORDER BY CallDate

  • Your query looks so overwhelming when it can be so simple:

    SELECT CAST( td.CallDateTime AS DATE),
       COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1],
       COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2],
       COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3],
       COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4],
       COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5],
       SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00')))
    FROM #TmpData AS td
    GROUP BY CAST( td.CallDateTime AS DATE);

    EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • @luis Cazares
    Thanks for this simplification - I have several of these reports to do, and this will help me get back into the SQL mindset.

  • Luis Cazares - Wednesday, February 28, 2018 7:41 AM

    Your query looks so overwhelming when it can be so simple:

    SELECT CAST( td.CallDateTime AS DATE),
       COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1],
       COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2],
       COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3],
       COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4],
       COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5],
       SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00')))
    FROM #TmpData AS td
    GROUP BY CAST( td.CallDateTime AS DATE);

    EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.

    You can make it even simpler.  Null values are ignored in SUM, so there is no reason to convert it to the additive identity, i.e., zero.


    SELECT CAST( td.CallDateTime AS DATE),
       COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1],
       COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2],
       COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3],
       COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4],
       COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5],
       SUM(DATEDIFF(MILLISECOND, 0,Duration))
    FROM #TmpData AS td
    GROUP BY CAST( td.CallDateTime AS DATE);

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Part of the problem you were running into was your choice of using PIVOT instead of the more flexible CROSS TAB, which Luis used. 

    • PIVOT is limited to aggregating a single attribute, but you want to aggregate TWO attributes (CallCount and Duration).
    • PIVOT can only reference one expression for the column definitions, but you needed to reference two (CallType and Duration).
    I tend not to use PIVOT at all, because of these limitations.
    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Wednesday, February 28, 2018 8:42 AM

    Luis Cazares - Wednesday, February 28, 2018 7:41 AM

    Your query looks so overwhelming when it can be so simple:

    SELECT CAST( td.CallDateTime AS DATE),
       COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1],
       COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2],
       COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3],
       COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4],
       COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5],
       SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00')))
    FROM #TmpData AS td
    GROUP BY CAST( td.CallDateTime AS DATE);

    EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.

    You can make it even simpler.  Null values are ignored in SUM, so there is no reason to convert it to the additive identity, i.e., zero.


    SELECT CAST( td.CallDateTime AS DATE),
       COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1],
       COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2],
       COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3],
       COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4],
       COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5],
       SUM(DATEDIFF(MILLISECOND, 0,Duration))
    FROM #TmpData AS td
    GROUP BY CAST( td.CallDateTime AS DATE);

    Drew

    Except that in some cases, you might get a NULL instead of a zero. It's a matter of what you expect as part of the results.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Tom257 - Wednesday, February 28, 2018 7:19 AM

    I've not had to use SQL for a while and I'm having trouble getting to grips with aggregates again.

    In the code below I think its This line that is not doing what I expect 

      SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls

    I want to get a report of counts by Day and Type - with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve. 
    Here is a Sample table

    -- DROP TABLE #TmpData;
    CREATE TABLE #TmpData (
        CallDateTime DateTime,
        Duration Time,
        CallTypeId Int )
    INSERT INTO #TmpData
    SELECT '2017-08-01 10:14:37.00', '00:00:03.450', 1 UNION ALL
    SELECT '2017-08-01 10:55:46.00', NULL, 2 UNION ALL
    SELECT '2017-08-01 11:30:50.00', NULL, 2 UNION ALL
    SELECT '2017-08-01 12:48:02.00', '00:00:02.500', 1 UNION ALL
    SELECT '2017-08-01 14:48:24.00', '00:00:02.910', 1 UNION ALL
    SELECT '2017-08-01 17:05:26.00', '00:00:04.460', 1 UNION ALL
    SELECT '2017-08-04 02:11:47.00', '00:13:54.790', 4 UNION ALL
    SELECT '2017-08-06 01:29:13.00', '00:01:18.180', 5 UNION ALL
    SELECT '2017-08-06 04:33:33.00', NULL, 3 UNION ALL
    SELECT '2017-08-06 04:33:45.00', '00:00:00.000', 5 UNION ALL
    SELECT '2017-08-06 04:33:49.00', NULL, 3 UNION ALL
    SELECT '2017-08-06 04:35:59.00', NULL, 3 UNION ALL
    SELECT '2017-08-06 05:08:22.00', '00:00:00.000', 5 UNION ALL
    SELECT '2017-08-06 20:51:28.00', '00:00:03.310', 4 UNION ALL
    SELECT '2017-08-08 15:49:27.00', '00:00:19.000', 5 UNION ALL
    SELECT '2017-08-10 15:24:19.00', '00:04:32.790', 4 UNION ALL
    SELECT '2017-08-10 18:12:30.00', NULL, 2 UNION ALL
    SELECT '2017-08-10 18:13:01.00', NULL, 3 UNION ALL
    SELECT '2017-08-10 18:40:06.00', NULL, 2 UNION ALL
    SELECT '2017-08-10 22:35:13.00', '00:01:33.710', 4 ;

    Here is my report so far : 
    Its wrong! it produces counts of 1 everywhere there should be a count, but some of those 1's should be true counts e.g. Column [1] on the first date [2017-08-01 should be 4
    And it seems far too complicated - (I've gone CTE crazy here) - in addition to fixing the counts there has got to be a simpler way 

    ;WITH SplitDates AS (
        SELECT    CAST(N.CallDateTime AS date) AS CallDate,
                DATEDIFF(MILLISECOND, '1900-01-01',ISNULL(N.Duration,'00:00:00')) AS [DurationMs],
                N.CallTypeId,
                -- Rules - some circumstances are not counted
                CASE WHEN N.CallTypeID = 4 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0
                     WHEN N.CallTypeID = 5 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0
                     ELSE 1
                END AS CallCount
        FROM #TmpData AS N
    ), DurationByDay AS (
        SELECT CallDate, SUM(DurationMs) AS [Total Duration]
        FROM SplitDates
        GROUP BY CallDate
    ), SplitDays AS (
        SELECT S.CallTypeId,
            SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls,         
           S.CallDate
        FROM SplitDates AS S
        GROUP BY S.CallDate, S.CallTypeId, S.CallCount
    ),  PivotNumbers AS (
        SELECT CallDate, [1], [2], [3], [4], [5] , [Total Duration]  -- TODO Convert back to HH:mm:ss here 
        FROM ( SELECT CallTypeId, T.CallDate, NumCalls, D.[Total Duration]
                FROM SplitDays AS T
                JOIN DurationByDay AS D ON D.CallDate = T.CallDate
                ) AS SourceTable
        PIVOT ( SUM(NumCalls)
                FOR CallTypeId IN ([1], [2], [3], [4], [5]) ) AS PivotTable
    )
    SELECT *
    FROM PivotNumbers
    ORDER BY CallDate

    >>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<

    actually,you have some more fundamental problems. What you posted is not a table! By definition, a table must have a key and you have no way to ever have a key, because everything is nullable. You then committed a design fallacy called “attribute splitting†by putting a date, a timestamp and a time in their own columns. Let’s go ahead and try and correct your design and bring you up to the current SQL standards(well, actually, what’s been in effect for over 15 years).

    Please think about how silly a “<something>_type_id†is; what is your blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (that’s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math were you going to do on it? What is the square root of your credit cardnumber?

    CREATE TABLE Foobars

    (call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY,

    call_end_timestamp DATETIME2(0),

    CHECK(call_start_timestamp <= call_end_timestamp ) ,

    call_typeCHAR(1) NOT NULL

     CHECK(call_type IN (‘01’, ‘02’. ‘03’, ’04’);

    Please use the ANSI/ISO syntax for insertion. Why still use the old Sybase conventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a huge conceptual error; TIME data type is not a duration; it’s a clocktime! What you’re doing is fundamentally wrong. This really isn’tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server you’re going to need a start and end timestamp (Microsoft has no INTERVAL data type). It would look like this:

    INSERT INTO Foobars

    VALUES

    ('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),

    ('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'),

    etc

    >I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<

    UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping.

    >>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<

    actually,you have some more fundamental problems. What you posted is not atable! By definition, a table must have a key and you have no way to ever have a key, because everything is nullable. You then committed a design fallacy called “attribute splitting†by putting a date, a timestamp and a time in their own columns. Let’s go ahead and try and correct your design and bring you up to the current SQL standards(well, actually, what’s been in effect for over 15 years).

    Pleasethink about how silly a “<something>_type_id†is; what isyour blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (that’s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?

    CREATETABLE Foobars

    (call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY,

    call_end_timestamp DATETIME2(0),

    CHECK(call_start_timestamp <= call_end_timestamp ) ,

    call_typeCHAR(1) NOT NULL

     CHECK(call_type IN (‘01’, ‘02’. ‘03’, ’04’);

    Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; it’s a clocktime! What you’re doing is fundamentally wrong. This really isn’tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona ontemporal data in SQL. In SQL server you’re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this:

    INSERTINTO Foobars

    VALUES

    ('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),

    ('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'),

    etc

    >I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<

    Use CAST (call_start_timestamp AS DATE) AS all_date for your grouping.

    Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; it’s a clocktime! What you’re doing is fundamentally wrong. This really isn’tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona ontemporal data in SQL. In SQL server you’re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this: INSERTINTO FoobarsVALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. >>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<actually,you have some more fundamental problems. What you posted is not atable! By definition, a table must have a key and you have no way toever have a key, because everything is nullable. You then committed adesign fallacy called “attribute splitting†by putting a date, atimestamp and a time in their own columns. Let’s go ahead and tryand correct your design and bring you up to the current SQLstandards(well, actually, what’s been in effect for over 15 years).Pleasethink about how silly a “<something>_type_id†is; what isyour blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (that’s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?CREATETABLE Foobars(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY, call_end_timestamp DATETIME2(0), CHECK(call_start_timestamp <= call_end_timestamp ) , call_typeCHAR(1) NOT NULL CHECK(call_type IN (‘01’, ‘02’. ‘03’, ’04’);Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; it’s a clocktime! What you’re doing is fundamentally wrong. This really isn’tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona ontemporal data in SQL. In SQL server you’re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this: INSERTINTO FoobarsVALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. [/code] etc>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. >>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<actually,you have some more fundamental problems. What you posted is not atable! By definition, a table must have a key and you have no way toever have a key, because everything is nullable. You then committed adesign fallacy called “attribute splitting†by putting a date, atimestamp and a time in their own columns. Let’s go ahead and tryand correct your design and bring you up to the current SQLstandards(well, actually, what’s been in effect for over 15 years).Pleasethink about how silly a “<something>_type_id†is; what isyour blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (that’s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?CREATETABLE Foobars(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY, call_end_timestamp DATETIME2(0), CHECK(call_start_timestamp <= call_end_timestamp ) , call_typeCHAR(1) NOT NULL CHECK(call_type IN (‘01’, ‘02’. ‘03’, ’04’);Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a huge conceptual error; TIME data type is not a duration; it’s a clocktime! What you’re doing is fundamentally wrong. This really isn’t your fault, so much as the way we measure time. Get a copy of the Rick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server you’re going to need a start and end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this: INSERT INTO Foobars VALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_date and call_type, with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve. <<
    UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. [/code]Please use the ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, the optimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; it’s a clocktime! What you’re doing is fundamentally wrong. This really isn’tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server you’re going to need a start and end timestamp (Microsoft has no INTERVAL data type). It would look like this: INSERT INTO Foobars VALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_date and call_type, with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve. <<Use Cast (call_start_timestamp AS DATE) AS all_date for your grouping.

    >>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<actually,you have some more fundamental problems. What you posted is not a
    table! By definition, a table must have a key and you have no way toever have a key, because everything is nullable. You then committed adesign fallacy called “attribute splitting†by putting a date, atimestamp and a time in their own columns. Let’s go ahead and tryand correct your design and bring you up to the current SQL standards(well, actually, what’s been in effect for over 15 years).Pleasethink about how silly a “<something>_type_id†is; what is your blood type ID? If you read a book on basic data modeling, you would learn that an attribute property (that’s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?

    CREATETABLE Foobars(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY, call_end_timestamp DATETIME2(0), CHECK(call_start_timestamp <= call_end_timestamp ) , call_typeCHAR(1) NOT NULL CHECK(call_type IN (‘01’, ‘02’. ‘03’, ’04’);Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybase conventions? When you use the original Sybase row-at-a time, the optimizer can do anything with it. Finally, you made a huge conceptual error; TIME data type is not a duration; it’s a clocktime! What you’re doing is fundamentally wrong. This really isn't your fault, so much as the way we measure time. Get a copy of the Rick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server you’re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It would look like this: INSERTINTO Foobars VALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_date and call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want to solve. <<Use CAST (call_start_timestamp AS DATE) AS all_date for your grouping.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Bravo! You really out did yourself, Joe.
    I wonder if anyone can make sense of your post.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • @joe Celko

    Wow Joe, thanks for the thorough treatise!

    In answer:  The original table does have a key, I omitted it as part of the simplification process as it plays no part in the output, as I omitted about 30 other columns from the original table.

    The data presented is intended to isolate the specific problem I was running into, which enabled Luis Cazares to immediately see where I was going wrong.  

    The Duration column - You're right, it's not good to store a timespan in a time column - as you can see by converting to milliseconds for the SUM operation - but once again, the data is already there, and it doesn't include an end time. I guess that underlying this some physical device records a number of milliseconds or ticks and some *bright* developer decided to convert that to a time.  If I ever get to that part of the system I'll see if its possible to store end time, or the timespan as a number.

    CallType is a lookup table - and CallTypeId is a foreign key created automatically by some relationship modelling software - again non under my control and irrelevant to the problem.

    And while you're marking my question like some schoolkid's homework 😀 - maybe improve the way you present your comments, I probably missed half of what you said because of repetition and poor formatting. :crazy:

  • Tom257 - Thursday, March 1, 2018 1:02 PM

    @joe Celko

    Wow Joe, thanks for the thorough treatise!

    In answer:  The original table does have a key, I omitted it as part of the simplification process as it plays no part in the output, as I omitted about 30 other columns from the original table.

    The data presented is intended to isolate the specific problem I was running into, which enabled Luis Cazares to immediately see where I was going wrong.  

    The Duration column - You're right, it's not good to store a timespan in a time column - as you can see by converting to milliseconds for the SUM operation - but once again, the data is already there, and it doesn't include an end time. I guess that underlying this some physical device records a number of milliseconds or ticks and some *bright* developer decided to convert that to a time.  If I ever get to that part of the system I'll see if its possible to store end time, or the timespan as a number.

    CallType is a lookup table - and CallTypeId is a foreign key created automatically by some relationship modelling software - again non under my control and irrelevant to the problem.

    And while you're marking my question like some schoolkid's homework 😀 - maybe improve the way you present your comments, I probably missed half of what you said because of repetition and poor formatting. :crazy:

    Sometimes Mr. Celko actually says something worthwhile, just doesn't seem to happen too often.  Also, he wears sunglasses at night.  Just ignore most of what he says, but watch for those few tidbits that actually make sense.

  • Luis Cazares - Wednesday, February 28, 2018 8:50 AM

    drew.allen - Wednesday, February 28, 2018 8:42 AM

    Luis Cazares - Wednesday, February 28, 2018 7:41 AM

    Your query looks so overwhelming when it can be so simple:

    SELECT CAST( td.CallDateTime AS DATE),
       COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1],
       COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2],
       COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3],
       COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4],
       COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5],
       SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00')))
    FROM #TmpData AS td
    GROUP BY CAST( td.CallDateTime AS DATE);

    EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.

    You can make it even simpler.  Null values are ignored in SUM, so there is no reason to convert it to the additive identity, i.e., zero.


    SELECT CAST( td.CallDateTime AS DATE),
       COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1],
       COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2],
       COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3],
       COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4],
       COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5],
       SUM(DATEDIFF(MILLISECOND, 0,Duration))
    FROM #TmpData AS td
    GROUP BY CAST( td.CallDateTime AS DATE);

    Drew

    Except that in some cases, you might get a NULL instead of a zero. It's a matter of what you expect as part of the results.

    NULLs can have some really surprising effects on applications, especially when they can a NULL aggregation warning like both of those solutions do. 

    Warning: Null value is eliminated by an aggregate or other SET operation.

    Some applications will interpret such a return as an error and others may include it as a part of the output while others may use it as the first of two result sets and ignore the second result set.  My suggestion would be to always avoid producing such an advisory message.  Since you really don't want to do this by messing with ANSI WARNINGS settings, you have to do it all in the code even if it makes it a bit longer.

    Also, since this appears to be a reporting mechanism for CDRs (Call Detail Records) from a telephone system, I'll also state that the use of DATEDIFF will return an INT datatype which, speaking from experience, can quickly and easily be outstripped when aggregating millisecond call durations even though it'll handle nearly 26 days (25D:20:31:23.647, to be exact) of duration.

    With both of those thoughts in mind, here's the code that I recommend for this problem because, even though it's a bit longer, is a lot more bullet proof when it comes to the problems I've cited.

     SELECT  CallDate           = CAST( td.CallDateTime AS DATE)
            ,[1]                = SUM(CASE WHEN td.CallTypeId = 1 THEN 1 ELSE 0 END)
            ,[2]                = SUM(CASE WHEN td.CallTypeId = 2 THEN 1 ELSE 0 END)
            ,[3]                = SUM(CASE WHEN td.CallTypeId = 3 THEN 1 ELSE 0 END)
            ,[4]                = SUM(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 ELSE 0 END)
            ,[5]                = SUM(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 ELSE 0 END)
            ,[Total Duration]   = SUM(ISNULL(DATEDIFF_BIG(ms, 0,Duration),0))
       FROM #TmpData AS td
      GROUP BY CAST( td.CallDateTime AS DATE)
    ;

    @ Tom257 ... the method that all of us have used is an ancient "Black Arts" form of T-SQL known as a CROSS TAB.  Microsoft used to include it in Books Online until they introduced the PIVOT function.  Here are a couple of articles about CROSS TABs and why (IMHO) CROSS TABs are better than PIVOT.

    http://www.sqlservercentral.com/articles/T-SQL/63681/
    http://www.sqlservercentral.com/articles/Crosstab/65048/

    @ Joe Celko ... You really need to buy a good book on T-SQL and another on the typical functionality of telephone systems.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Tom257 - Thursday, March 1, 2018 1:02 PM

    >> Wow Joe, thanks for the thorough treatise! <<

    I am always pedantic 🙂

    >> In answer: The original table does have a key, I omitted it as part of the simplification process as it plays no part in the output, as I omitted about 30 other columns from the original table. <<

    Wrong. The key is part of the very definition of a table and omitting it is an incredibly serious flaw. It says you regarding the relational database as it was a deck of punch cards with no constraints, no schema, etc. stop doing this!

    >> The data presented is intended to isolate the specific problem I was running into, which enabled Luis Cazares to immediately see where I was going wrong. <<

    Actually Luis was able to successfully guess at what you might have meant thanks to his having a lot of experience with bad postings. About 20 something years ago, I would've made the same guesses. But I got tired of doing that after a few decades. So I stop guessing and started correcting that

    >> The Duration column - You're right, it's not good to store a timespan in a time column - as you can see by converting to milliseconds for the SUM operation - but once again, the data is already there, and it doesn't include an end time. I guess that underlying this some physical device records a number of milliseconds or ticks and some *bright* developer decided to convert that to a time. If I ever get to that part of the system I'll see if its possible to store end time, or the timespan as a number. <<

    The problem with using a duration is that it's always wrong, because time is a moving thing. If you use the (start timestamp, COALSECE (end timestamp, CURRENT_TIMESTAMP)) idiom, the date range will always be correct at the time of invocation.

    >> call_types is a lookup table - and call_type_id is a foreign key created automatically by some relationship modelling software - again not under my control and irrelevant to the problem. <<

    There's a good rule of thumb that if the values in such a lookup table are relatively small (whatever that currently means) and static (whatever that means), then use a CHECK( .. IN ..) constraint. The optimizer will be able to immediately use this and not have to do a table join. I'm going to assume you have fewer than 100 call types, so this would fit into that model. However, if this lookup has lots of values, or is dynamic, then use a REFRENCES clause to establish the relationship in the schema. The worst thing you can see a somebody actually using outer joins.

    However, there is no such thing as <something>_type_id in RDBMS. The data element can be a type, such as a blood type, employee type, etc. or it can be a "something in particular" kind of identifier. . If you like reading dull, boring standards look at the made metadata standards committee rules and the ISO 11179 standards. Those postfixes are called "attribute properties", and an attribute can have one and only one of them.

    >> And while you're marking my question like some schoolkid's homework BigGrin - maybe improve the way you present your comments, I probably missed half of what you said because of repetition and poor formatting. <<

    Yeah, I know I'm having trouble with my dictation program. I had to stop typing because my hands are going. It actually takes longer to train the software that I would have thought, but I do a lot of technical stuff. Maybe I should have stuck to do in those Harlequin romances… Someday you'll also have 45 or 50 years in this game, and your hands will be an arthritic mess too 🙁

    Please post DDL and follow ANSI/ISO standards when asking for help. 

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply