Calculate average through recursive CTE

  • I am new with CTE and need help from you guys.

    I have a table FeedFileDetails which stores all the zipfile properties data (like, pakagename, universe, zipfilesize, zipfiledate..) I am using the below code to calculate the sum and average of size of similar zipfiles for past 14 days. I have hardcoded the number 14 for the calculation of average for past 14 days but now want this to be dynamically updated (just like counter).

    The problem is if sometimes the file is not generated for a particular day then the sum of the zipfiles will be for 13 days but the average will be wrong (as the denominator is hardcoded as 14 in the code).If there are only 13 files generated in the past 14 days then the denominator should be 13 not 14.

    DECLARE @current_date DATETIME

    DECLARE @previous_date DATETIME

    DECLARE @curr_fourteen DATETIME

    SELECT @current_date = MAX(filedate) FROM FeedFileDetails

    SELECT @previous_date = MAX(filedate)-1 FROM FeedFileDetails

    SELECT @curr_fourteen = @current_date-14

    ;WITH dates(DateValue)

    AS(

    SELECT @curr_fourteen AS DateValue

    UNION ALL

    SELECT DATEADD(DAY,1,DateValue)

    FROM dates

    WHERE DateValue < @previous_date

    ),

    sumofsize(pkgname,universe,sum_zipfilesize)

    AS(

    SELECT DISTINCT f.pkgname, f.universe, SUM(f.zipfilesize)

    FROM FeedFileDetails f INNER JOIN dates d

    ON d.DateValue = f.filedate

    GROUP BY f.pkgname, f.universe

    ),

    finalTable(pkgname,universe,filedate,ZipFileSize,sum_zipfilesize,avg_zipfilesize)

    AS(

    SELECT DISTINCT a.pkgname

    ,a.universe

    ,@current_date AS filedate

    ,f.zipfilesize AS ZipFileSize

    ,sum_zipfilesize

    ,sum_zipfilesize/14 as avg_zipfilesize

    FROM sumofsize a INNER JOIN FeedFileDetails f

    ON f.pkgname = a.pkgname and f.universe = a.universe

    WHERE f.filedate = @previous_date

    )

  • i think you can get count of filename like you derived SUM(f.zipfilesize) ,and use it in final CTE finalTable

  • As Megha said, grab the count of your files.

    --You only need one DECLARE statement

    -- and you can assign the values within that statement too.

    -- Saves on typing

    DECLARE @current_date DATETIME = (SELECT MAX(filedate) FROM FeedFileDetails),

    @previous_date DATETIME = (SELECT MAX(filedate)-1 FROM FeedFileDetails),

    @curr_fourteen DATETIME = (SELECT MAX(filedate)-14 FROM FeedFileDetails),

    @AvgFileCnt INT;

    SET @AvgFileCnt = (SELECT COUNT(FileDate) FROM FeedFileDetails

    WHERE CONVERT(CHAR(10),FileDate,101)

    >= CONVERT(CHAR(10),@current_date - 14,101));

    --Grabs the number of files where the file date is 14 or less days

    This code is only partially tested because I don't have a setup like you do. So make sure to test it before pushing it to any production environment.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Count of files doesn't help as there can be multiple files generated in a day.

  • Then what are you averaging by?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • averaging by the number of days on which the files were generated.

    Note : There can be one or more days when no files are generated.

    Lets take an example:

    If there are total 35 files generated in past 14 days then AVG function will take the sum of all the 35 files and will divide with 35. Instead I want it to divide it by the number of days i.e. 14.

    One case could be if there are 40 files generated in past 14 days but no files were generated on a particular day then the logic should calculate the sum of 40 files but divide it by the number of days on which the files were generated i.e. 13 instead of 14 (since files were not generated for a day ).

    Let me know if I am still unclear.

  • It would help if you would post the DDL (CREATE TABLE statement) for the table(s) involved in your query, some sample data for the table(s) (as INSERT INTO statements), expected results based on the sample data, and most importantly all the code for your query (if you look closely at your initial post you are missing the last part of your query).

    I think I understand what it is you are trying to accomplish but we need the additional info to really provide you a good, tested answer.

  • If you throw a DISTINCT into the code I provided, it solves your problem.

    But as Lynn said, we can't provide you with a definite answer without some DDL, sample data, and the rest of your query.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (4/25/2014)


    --You only need one DECLARE statement

    -- and you can assign the values within that statement too.

    [font="Arial Black"]-- Saves on typing[/font]

    "Sure you are? Dark Side powerful. Lure you it will."

    😉

    --===== Conditionally drop the Test Table to make reruns in SSMS easier

    IF OBJECT_ID('tempdb..#FeedFileDetails','U') IS NOT NULL

    DROP TABLE #FeedFileDetails

    ;

    --===== Create and populate the Test Table on-the-fly

    SELECT TOP 100000

    FileDate = DATEADD(yy,'2010'-1900,RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2010','2014'))

    INTO #FeedFileDetails

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    ;

    GO

    PRINT '--===== "Save typing" method (221 characters NOT including spaces/tabs/print/stats/go)'

    SET STATISTICS TIME,IO ON;

    DECLARE @current_date DATETIME = (SELECT MAX(filedate) FROM #FeedFileDetails),

    @previous_date DATETIME = (SELECT MAX(filedate)-1 FROM #FeedFileDetails),

    @curr_fourteen DATETIME = (SELECT MAX(filedate)-14 FROM #FeedFileDetails),

    @AvgFileCnt INT;

    SET STATISTICS TIME,IO OFF;

    GO

    PRINT '--===== Traditional method (207 characters NOT including spaces/tabs/print/stats/go)'

    --{hold-alt, click-drag, paste} does most of the rest.

    SET STATISTICS TIME,IO ON;

    DECLARE @current_date DATETIME,

    @previous_date DATETIME,

    @curr_fourteen DATETIME,

    @AvgFileCnt INT

    ;

    SELECT @current_date = MAX(FileDate),

    @previous_date = MAX(FileDate)-1,

    @curr_fourteen = MAX(FileDate)-14

    FROM #FeedFileDetails;

    SET STATISTICS TIME,IO OFF;

    GO

    --===== "Save typing" method (221 characters NOT including spaces/tabs/print/stats/go)

    Table '#FeedFileDetails____________________________________________________________________________________________________00000000094B'.

    Scan count 3, logical reads 636, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 109 ms, elapsed time = 102 ms.

    --===== Traditional method (207 characters NOT including spaces/tabs/print/stats/go)

    Table '#FeedFileDetails____________________________________________________________________________________________________00000000094B'.

    Scan count 1, logical reads 212, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 31 ms, elapsed time = 31 ms.

    "Type once. Rest Force builds, it will". 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/25/2014)


    Brandie Tarvin (4/25/2014)


    --You only need one DECLARE statement

    -- and you can assign the values within that statement too.

    [font="Arial Black"]-- Saves on typing[/font]

    "Sure you are? Dark Side powerful. Lure you it will."

    😉

    Come to the Dark Side. We have cookies!

    "Type once. Rest Force builds, it will". 🙂

    YES! I OWN YOU NOW! :w00t:

    EDIT: Of course, I replied to the geekdomnis of your reply before reading the stats. That's interesting... I probably need to do more of those checks when I'm coding. Thanks for pointing that out.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Would be nice if we could hear more from the OP. It seems like he just disappeared.

  • Brandie Tarvin (4/28/2014)


    Come to the Dark Side. We have cookies!

    Heh... Mongo like cookies! 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Guys,

    I got the solution. It was just adding a count of distinct filedate and then dividing the total filesize with that. Very bad on my part that I had to ask such a silly question.

    Btw, thanks for all your help and efforts. 🙂

    sumofsize(pkgname,universe,sum_zipfilesize, cntDays)

    AS(

    SELECT f.pkgname, f.universe, SUM(f.zipfilesize),

    COUNT(Distinct(FileDate)) as cntDays

    FROM FeedFileDetails f INNER JOIN dates d

    ON d.DateValue = f.filedate

    GROUP BY f.pkgname, f.universe

    ),

    finalTable(pkgname,universe,filedate,ZipFileSize,sum_zipfilesize,avg_zipfilesize)

    AS(

    SELECT a.pkgname

    ,a.universe

    ,@current_date AS filedate

    ,f.zipfilesize AS ZipFileSize

    ,sum_zipfilesize

    ,sum_zipfilesize/cntDays as avg_zipfilesize -- the name might have confused here; it should have been Average Total Size per day)

    FROM sumofsize a INNER JOIN FeedFileDetails f

    ON f.pkgname = a.pkgname and f.universe = a.universe

    WHERE f.filedate = @previous_date

  • not disappeared yet.. was just out of reach of internet 😛

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply