how to filter out zeros when calculating average

  • hi everyone

    I have this query that needs a bit of modification:

    SELECT T3.SYMBOL,
    T3.TRADE_DATE,
    (AB30_C + AB30_P + AB30_G + AB30_R + AB30_Y) / 5.0 AS AB30,
    (AB60_C + AB60_P + AB60_G + AB60_R + AB60_Y) / 5.0 AS AB60,
    (AB110_C + AB110_P + AB110_G + AB110_R + AB110_Y) / 5.0 AS AB110
    FROM(
    sub-query
    ) AS T3

    I am calculating averages of 5 columns.  It is possible that one field may have a 0 and if it does then I want to exclude it from the calculation.  Is there a way to filter out zeros when calculating the average in the select clause?  The only way I know how to do it is to add a where clause that filters out the zeros but if that happens then the entire row is removed for that particular trade date so that means that there is no value for that particular trade date. I don't want the entire row to be omitted because one of the fields was 0.  There are 5 fields. If one of them is 0 and the rest are non-zero then take the average of the 4 values.   If there are 2 zeros then use 3 to calculate the average etc.  Is there a way to do this without touching the sub-query?  The sub query calculates the ABXXX fields and the logic is super complicated so I don't want to touch it b/c it may break but if that is the only/easier way to address the average calculation then I am open to that too.  Maybe the subquery calculations can return NULL if the calculation produces a zero?

    Any suggestions are much appreciated.

    Thank you

     

    • This topic was modified 1 month, 3 weeks ago by  water490.
    • This topic was modified 1 month, 3 weeks ago by  water490.
  • You could use sums of case statements to get the count:

    WITH subquery as 
    (
    SELECT 
    'A' AS SYMBOL, CAST(GETDATE() AS DATE) AS TRADE_DATE,
    0 AS AB30_C, 1 AS AB30_P, 1 AS AB30_G, 1 AS AB30_R, 1 AS AB30_Y, 
    1 AS AB60_C, 1 AS AB60_P, 1 AS AB60_G, 1 AS AB60_R, 0 AS AB60_Y,
    0 AS AB110_C , 1 AS AB110_P, 1 AS AB110_G, 1 AS AB110_R, 1 AS AB110_Y
    )
    SELECT SYMBOL, TRADE_DATE,
    (AB30_C + AB30_P + AB30_G + AB30_R + AB30_Y)/(case when AB30_C <> 0 then 1 else 0 end +
          case when AB30_P <> 0 then 1 else 0 end +
          case when AB30_G <> 0 then 1 else 0 end +
          case when AB30_R <> 0 then 1 else 0 end +
          case when AB30_Y <> 0 then 1 else 0 end
      ) AS AB30,
    (AB60_C + AB60_P + AB60_G + AB60_R + AB60_Y)/(case when AB60_C <> 0 then 1 else 0 end +
          case when AB60_P <> 0 then 1 else 0 end +
          case when AB60_G <> 0 then 1 else 0 end +
          case when AB60_R <> 0 then 1 else 0 end +
          case when AB60_Y <> 0 then 1 else 0 end
      ) AS AB60,
    (AB110_C + AB110_P + AB110_G + AB110_R + AB110_Y)/(case when AB110_C <> 0 then 1 else 0 end +
          case when AB110_P <> 0 then 1 else 0 end +
          case when AB110_G <> 0 then 1 else 0 end +
          case when AB110_R <> 0 then 1 else 0 end +
          case when AB110_Y <> 0 then 1 else 0 end
      ) AS AB110
    FROM subquery;

    or cross apply summed values:

    WITH subquery as 
    (
    SELECT 
    'A' AS SYMBOL, CAST(GETDATE() AS DATE) AS TRADE_DATE,
    0 AS AB30_C, 1 AS AB30_P, 1 AS AB30_G, 1 AS AB30_R, 1 AS AB30_Y, 
    1 AS AB60_C, 1 AS AB60_P, 1 AS AB60_G, 1 AS AB60_R, 0 AS AB60_Y,
    0 AS AB110_C , 1 AS AB110_P, 1 AS AB110_G, 1 AS AB110_R, 1 AS AB110_Y
    )
    SELECT subquery.SYMBOL, subquery.TRADE_DATE,
    (AB30_C + AB30_P + AB30_G + AB30_R + AB30_Y) / c30.non_zeros AS AB30,
    (AB60_C + AB60_P + AB60_G + AB60_R + AB60_Y) / c60.non_zeros AS AB60,
    (AB110_C + AB110_P + AB110_G + AB110_R + AB110_Y) / c110.non_zeros AS AB110
    FROM subquery
    cross apply
         (select count(*) as non_zeros
          from (values (AB30_C), (AB30_P), (AB30_G), (AB30_R), (AB30_Y)) v(h)
          where h <> 0) c30
    cross apply
         (select count(*) as non_zeros
          from (values (AB60_C), (AB60_P), (AB60_G), (AB60_R), (AB60_Y)) v(h)
          where h <> 0) c60
    cross apply
         (select count(*) as non_zeros
          from (values (AB110_C), (AB110_P), (AB110_G), (AB110_R), (AB110_Y)) v(h)
          where h <> 0) c110;

     

     

     


  • SELECT T3.SYMBOL,
    T3.TRADE_DATE,
    (AB30_C + AB30_P + AB30_G + AB30_R + AB30_Y) / NULLIF(SUM(CASE WHEN AB30_C = 0 THEN 0 ELSE 1 END + CASE WHEN AB30_P = 0 THEN 0 ELSE 1 END +
    CASE WHEN AB30_G = 0 THEN 0 ELSE 1 END + CASE WHEN AB30_R = 0 THEN 0 ELSE 1 END + CASE WHEN AB30_Y = 0 THEN 0 ELSE 1 END), 0) AS AB30,
    ...similarly for the other two totals...
    FROM(
    sub-query
    ) AS T3

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • thank you both!  i know how to fix my query now because of your feedback.  thank you again for the help!!

  • For the denominator you could sum SIGN(ABS()) functions.  The SIGN function returns -1, 0, or 1 depending on if the input is negative, zero, or positive.  The ABS function returns the absolute value of the input

    with five_cte(a, b, c, d, e) as (
    select 1, 2, 3, 4, -5
    union all
    select 1, 2, 3, 4, 0
    union all
    select 0, 2, 3, 4, 0)
    select *, (a+b+c+d+e)*1.0/(sign(abs(a))+sign(abs(b))+sign(abs(c))+sign(abs(d))+sign(abs(e))) calc_sign
    from five_cte;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • water490 wrote:

    thank you both!  i know how to fix my query now because of your feedback.  thank you again for the help!!

    Cool.  Can you show us what you settled on?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Based on ratbak's second query, I think you can simplify it a little more by using a combination of AVG and NULLIF:

    WITH subquery AS 
    (
    SELECT
    'A' AS SYMBOL,
    CAST(GETDATE() AS DATE) AS TRADE_DATE,
    0 AS AB30_C,
    10 AS AB30_P,
    15 AS AB30_G,
    21 AS AB30_R,
    13 AS AB30_Y, -- = 59 / 4 = 14.75

    1 AS AB60_C,
    16 AS AB60_P,
    18 AS AB60_G,
    51 AS AB60_R,
    90 AS AB60_Y, -- = 176 / 5 = 35.2

    0 AS AB110_C,
    11 AS AB110_P,
    10 AS AB110_G,
    1 AS AB110_R,
    0 AS AB110_Y -- = 22 / 3 = 7.333333
    )


    SELECT
    sq.SYMBOL,
    sq.TRADE_DATE,
    c30.AB30_Y,
    c60.AB60_Y,
    c110.AB110_Y
    FROM
    subquery AS sq
    CROSS APPLY
    (SELECT AVG(NULLIF(CAST(h AS FLOAT),0)) AS AB30_Y
    FROM (VALUES (AB30_C), (AB30_P), (AB30_G), (AB30_R), (AB30_Y)) v(h)
    ) AS c30
    CROSS APPLY
    (SELECT AVG(NULLIF(CAST(h AS FLOAT),0)) AS AB60_Y
    FROM (VALUES (AB60_C), (AB60_P), (AB60_G), (AB60_R), (AB60_Y)) v(h)
    ) AS c60
    CROSS APPLY
    (SELECT AVG(NULLIF(CAST(h AS FLOAT),0)) AS AB110_Y
    FROM (VALUES (AB110_C), (AB110_P), (AB110_G), (AB110_R), (AB110_Y)) v(h)
    ) AS c110
    ;

    The result is what I would expect, given the data that I put in the "subquery":

    SYMBOL TRADE_DATE AB30_Y                 AB60_Y                 AB110_Y
    ------ ---------- ---------------------- ---------------------- ----------------------
    A 2024-03-07 14,75 35,2 7,33333333333333
  • Jeff Moden wrote:

    water490 wrote:

    thank you both!  i know how to fix my query now because of your feedback.  thank you again for the help!!

    Cool.  Can you show us what you settled on?

    i used ScottPletcher's approach.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply