Standard Deviation (variance) using columns in same row.

  • Is there a way to get a variance from two columns in the same row? 

    I have a row that has two numbers (current and previous) and I want to get the variance.


    DECLARE @test-2 TABLE
    (
     EmpName varchar(50),
     Deduction MONEY,
     YearMonth int
    )

    DECLARE @CurrentYearMonth INT,
      @PreviousYearMonth INT

    SELECT @CurrentYearMonth = 201805, @PreviousYearMonth = 201804

    INSERT @test-2 Values('John', 259.05, 201803)
    INSERT @test-2 Values('John', 325.50, 201804)
    INSERT @test-2 Values('John', 325.50, 201805)
    INSERT @test-2 Values('Mary', 410.10, 201804)
    INSERT @test-2 Values('Mary', 380.25, 201805)

    SELECT EmpName,
           Deduction,
           (
               SELECT Deduction
               FROM @test-2 t2
               WHERE t1.EmpName = t2.EmpName
                     AND YearMonth = @PreviousYearMonth
           ) AS PreviousDeduction
    FROM @test-2 t1
    WHERE YearMonth = @CurrentYearMonth

    I get the following result:

    

    I would like to have a column next to the PreviousDeduction that has the variance.

    Thanks,

    Tom

  • Yes, do an "UNPIVOT" first.  I'm using CROSS APPLY to do the "UNPIVOT".

    SELECT EmpName,
    Deduction,
    (
    SELECT Deduction
    FROM @test-2 t2
    WHERE t1.EmpName = t2.EmpName
    AND YearMonth = @PreviousYearMonth
    ) AS PreviousDeduction
    FROM @test-2 t1
    CROSS APPLY ( VALUES(t1.Deduction), (Previous_Deduction.Deduction)) d(Deduction)
    WHERE YearMonth = @CurrentYearMonth

    If your on a newer version of SQL Server, you can also improve this by using a windowed function.
    SELECT EmpName,
           Deduction,
        LAG(Deduction) OVER(PARTITION BY EmpName ORDER BY YearMonth) AS PreviousDeduction,
        VAR(Deduction) OVER(PARTITION BY EmpName ORDER YearMonth ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS Variance
    FROM @test-2 t1
    WHERE YearMonth = @CurrentYearMonth

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • tshad - Tuesday, July 31, 2018 2:47 PM

    Is there a way to get a variance from two columns in the same row? 

    I have a row that has two numbers (current and previous) and I want to get the variance.


    DECLARE @test-2 TABLE
    (
     EmpName varchar(50),
     Deduction MONEY,
     YearMonth int
    )

    DECLARE @CurrentYearMonth INT,
      @PreviousYearMonth INT

    SELECT @CurrentYearMonth = 201805, @PreviousYearMonth = 201804

    INSERT @test-2 Values('John', 259.05, 201803)
    INSERT @test-2 Values('John', 325.50, 201804)
    INSERT @test-2 Values('John', 325.50, 201805)
    INSERT @test-2 Values('Mary', 410.10, 201804)
    INSERT @test-2 Values('Mary', 380.25, 201805)

    SELECT EmpName,
           Deduction,
           (
               SELECT Deduction
               FROM @test-2 t2
               WHERE t1.EmpName = t2.EmpName
                     AND YearMonth = @PreviousYearMonth
           ) AS PreviousDeduction
    FROM @test-2 t1
    WHERE YearMonth = @CurrentYearMonth

    I get the following result:

    

    I would like to have a column next to the PreviousDeduction that has the variance.

    Thanks,

    Tom

    Can you kindly post the desired results?

    Saravanan

  • Something like:

    

    Thanks.

  • DECLARE @test-2 TABLE
    (
    EmpName varchar(50),
    Deduction MONEY,
    YearMonth int
    )
    DECLARE @CurrentYearMonth INT,
    @PreviousYearMonth INT

    SELECT @CurrentYearMonth = 201805, @PreviousYearMonth = 201804

    INSERT @test-2 Values('John', 259.05, 201803)
    INSERT @test-2 Values('John', 325.50, 201804)
    INSERT @test-2 Values('John', 325.50, 201805)
    INSERT @test-2 Values('Mary', 410.10, 201804)
    INSERT @test-2 Values('Mary', 380.25, 201805)

    SELECT EmpName, 
           Deduction,
           PreviousDeduction,
           POWER(Deduction-T.Average,2)+POWER(PreviousDeduction-T.Average,2) AS Variance
    FROM @test-2 t1
    CROSS APPLY(SELECT Deduction PreviousDeduction
         FROM @test-2 t2
         WHERE t1.EmpName = t2.EmpName
         AND YearMonth = @PreviousYearMonth
       ) AS PreviousDeduction
    CROSS APPLY(VALUES ((Deduction+PreviousDeduction)/2.0)) T(Average)
    WHERE YearMonth = @CurrentYearMonth


  • It looks like it works. Just not sure why.

    I understand what the cross applys are doing just not sure why a couple of things are for.

    For example, not sure what the T(Average) is in the second apply, but I see how it is used.

    The same as the alias PreviousDeduction is for in the 1st apply.

    I assume the 1st cross apply is used instead of the sub query, so it can be used later in the select statement.

    Thanks,

    Tom

  • tshad - Tuesday, July 31, 2018 8:52 PM

    It looks like it works. Just not sure why.

    I understand what the cross applys are doing just not sure why a couple of things are for.

    For example, not sure what the T(Average) is in the second apply, but I see how it is used.

    The same as the alias PreviousDeduction is for in the 1st apply.

    I assume the 1st cross apply is used instead of the sub query, so it can be used later in the select statement.

    Thanks,

    Tom

    T(Average) is getting the mean of Deduction, PreviousDeduction. This is used in the calculation of the Variance. As the average needs to be subtracted twice (once from each number) in the calculation of variance it saves space.
    Likewise with PreviousDeduction, this is used to calculate the average and used in the select.

  • I did get my query to work using this and I figured out how the calculation worked,

    I was confused on the syntax of the 2nd Cross Apply:

    CROSS APPLY(VALUES ((Deduction+PreviousDeduction)/2.0)) AS f(Average)

    Then the value is accessed by f.Average.  I had never seen it used that way.  I always use an alias (such as "a").  Is this because there is no name returned just a value?

    Also, I assume you used the Cross Apply so you would have immediate access, instead of using a derived table.

    Thanks

  • tshad - Wednesday, August 1, 2018 11:44 AM

    I did get my query to work using this and I figured out how the calculation worked,

    I was confused on the syntax of the 2nd Cross Apply:

    CROSS APPLY(VALUES ((Deduction+PreviousDeduction)/2.0)) AS f(Average)

    Then the value is accessed by f.Average.  I had never seen it used that way.  I always use an alias (such as "a").  Is this because there is no name returned just a value?

    Also, I assume you used the Cross Apply so you would have immediate access, instead of using a derived table.

    Thanks

    When you have CROSS APPLY with VALUES instead of a SELECT you need to use the T(C) method of generating a TableName.ColumnName, instead of giving the column alias next to the column name.
    Instead of:
    SELECT t1.EmpName,
           t1.Deduction,
           PreviousDeduction.PreviousDeduction,
           POWER(t1.Deduction-T.Average,2)+POWER(PreviousDeduction.PreviousDeduction-T.Average,2) AS Variance
    FROM @test-2 t1
    CROSS APPLY(SELECT t2.Deduction PreviousDeduction
                  FROM @test-2 t2
                 WHERE t2.EmpName = t1.EmpName
                   AND t2.YearMonth = @PreviousYearMonth) AS PreviousDeduction
    CROSS APPLY(VALUES ((t1.Deduction+PreviousDeduction.PreviousDeduction)/2.0)) T(Average)
    WHERE t1.YearMonth = @CurrentYearMonth

    You could write it without CROSS APPLY as:

    SELECT t1.EmpName,
           t1.Deduction,
           t2.PreviousDeduction,
           POWER(t1.Deduction-(t1.Deduction+t2.PreviousDeduction)/2.0,2)+POWER(t2.PreviousDeduction-(t1.Deduction+t2.PreviousDeduction)/2.0,2) AS Variance
      FROM @test-2 t1
    INNER JOIN (SELECT t2.Deduction PreviousDeduction,
                        t2.EmpName,
                        t2.YearMonth
                    FROM @test-2 t2) AS t2
             ON t1.EmpName = t2.EmpName
            AND t2.YearMonth = @PreviousYearMonth
    WHERE t1.YearMonth = @CurrentYearMonth

    or like this:
    SELECT t1.EmpName,
           t1.Deduction,
           t2.Deduction PreviousDeduction,
           POWER(t1.Deduction-(t1.Deduction+t2.Deduction)/2.0,2)+POWER(t2.Deduction-(t1.Deduction+t2.Deduction)/2.0,2) AS Variance
    FROM @test-2 t1
    INNER JOIN @test-2 t2
             ON t1.EmpName = t2.EmpName
            AND t2.YearMonth = @PreviousYearMonth
    WHERE t1.YearMonth = @CurrentYearMonth

  • tshad - Wednesday, August 1, 2018 11:44 AM

    Then the value is accessed by f.Average.  I had never seen it used that way.  I always use an alias (such as "a").  Is this because there is no name returned just a value?

    It helps to realise these two SELECTs are equivalent:

    Drop Table If Exists #Temp
    Create Table #Temp(value int)
    Insert Into #Temp Values (1),(2),(3)

    Select 
    T.value,
    CA.Doubled,
    CA.Trebled
    From #Temp T
    Cross Apply (Select value * 2 As Doubled, value * 3 As Trebled) As CA

    Select 
    T.value,
    CA.Doubled,
    CA.Trebled
    From #Temp T
    Cross Apply (Select value * 2, value * 3) As CA(Doubled,Trebled)


    In the latter, the columns are named as part of the alias rather than inline with the select. This is particularly useful for things like a VALUES clause where there isn't a way to name columns inline. You also see it a lot with things like XML expansion using the nodes function.
  • Okay, I'm totally confused.   Since when has the term "variance" come to mean something other than the difference for a set of exactly two values?   If we were talking about a set of 20 or 50 or 100 values, I'd start thinking standard deviation, but for exactly two values, what would a mathematical "variance" actually mean?   How could it be used, or for that matter, why?  Any reason to NOT use just the mere difference?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Monday, August 6, 2018 9:43 AM

    Okay, I'm totally confused.   Since when has the term "variance" come to mean something other than the difference for a set of exactly two values?   If we were talking about a set of 20 or 50 or 100 values, I'd start thinking standard deviation, but for exactly two values, what would a mathematical "variance" actually mean?   How could it be used, or for that matter, why?  Any reason to NOT use just the mere difference?

    If you have n items (X_1, X_2,..,X_n) the Variance is: SUM((X_i - Mean)^2)/(N-1) 
    The variance is a measure of the spread from the mean. The Standard Deviation is the square root of the variance.
    The "difference" gives a different answer to the "variance" and difference has no relation to the mean.

  • It can be done without using a self join, using just windowed functions:
    ;WITH CTE AS
    (
        SELECT t1.EmpName,
               t1.Deduction,
               t1.YearMonth,
               LAG(t1.YearMonth) OVER (PARTITION BY t1.EmpName ORDER BY t1.YearMonth) AS PreviousYearMonth,
               LAG(t1.Deduction) OVER (PARTITION BY t1.EmpName ORDER BY t1.YearMonth) AS PreviousDeduction,
               VAR(t1.Deduction) OVER(PARTITION BY t1.EmpName ORDER BY t1.YearMonth ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS Variance
          FROM @test-2 t1
    )
    SELECT CTE.EmpName,
            CTE.Deduction,
            CTE.PreviousDeduction,
            CTE.Variance
       FROM CTE
      WHERE CTE.YearMonth = @CurrentYearMonth
        AND CTE.PreviousYearMonth = @PreviousYearMonth

  • Jonathan AC Roberts - Monday, August 6, 2018 10:21 AM

    It can be done without using a self join, using just windowed functions:
    ;WITH CTE AS
    (
        SELECT t1.EmpName,
               t1.Deduction,
               t1.YearMonth,
               LAG(t1.YearMonth) OVER (PARTITION BY t1.EmpName ORDER BY t1.YearMonth) AS PreviousYearMonth,
               LAG(t1.Deduction) OVER (PARTITION BY t1.EmpName ORDER BY t1.YearMonth) AS PreviousDeduction,
               VAR(t1.Deduction) OVER(PARTITION BY t1.EmpName ORDER BY t1.YearMonth ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS Variance
          FROM @test-2 t1
    )
    SELECT CTE.EmpName,
            CTE.Deduction,
            CTE.PreviousDeduction,
            CTE.Variance
       FROM CTE
      WHERE CTE.YearMonth = @CurrentYearMonth
        AND CTE.PreviousYearMonth = @PreviousYearMonth

    You do realize that I offered a similar solution five days ago?  It does require that he be on SQL 2012, which is not likely given that he posted in a SQL 2008 forum.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Monday, August 6, 2018 1:37 PM

    Jonathan AC Roberts - Monday, August 6, 2018 10:21 AM

    It can be done without using a self join, using just windowed functions:
    ;WITH CTE AS
    (
        SELECT t1.EmpName,
               t1.Deduction,
               t1.YearMonth,
               LAG(t1.YearMonth) OVER (PARTITION BY t1.EmpName ORDER BY t1.YearMonth) AS PreviousYearMonth,
               LAG(t1.Deduction) OVER (PARTITION BY t1.EmpName ORDER BY t1.YearMonth) AS PreviousDeduction,
               VAR(t1.Deduction) OVER(PARTITION BY t1.EmpName ORDER BY t1.YearMonth ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS Variance
          FROM @test-2 t1
    )
    SELECT CTE.EmpName,
            CTE.Deduction,
            CTE.PreviousDeduction,
            CTE.Variance
       FROM CTE
      WHERE CTE.YearMonth = @CurrentYearMonth
        AND CTE.PreviousYearMonth = @PreviousYearMonth

    You do realize that I offered a similar solution five days ago?  It does require that he be on SQL 2012, which is not likely given that he posted in a SQL 2008 forum.

    Drew

    I did see that but it doesn't work as your WHERE clause restricts YearMonth to be @CurrentYearMonth. So it can't get the previous month.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply