Aggregate Function Product()

  • Comments posted to this topic are about the item Aggregate Function Product()

  • I hate to be the party pooper but change one value to a negative value and try again.

    Also see http://weblogs.sqlteam.com/peterl/archive/2008/11/19/How-to-get-the-productsum-from-a-table.aspx


    N 56°04'39.16"
    E 12°55'05.25"

  • Maybe this to account for signs . . .

    EXP(SUM(LOG(ABS([RowValue])))) * SIGN(CHECKSUM_AGG([RowValue]))

  • The method in the article is only good for positive (>0) values. Here is a formula that works for zeroes and negatives as well:

    COALESCE(EXP(SUM(LOG(ABS(NULLIF(val, 0))))),0) * (1 - 2 * (COUNT(CASE WHEN val < 0 THEN 1 END) % 2))

    The first part uses NULLIF to replace 0 with NULL, ABS to change negative values to positive ones, then pours that into the log-based product formula (except that this one uses natural logarithms instead of 10-based logarithms). Because of NULL propagation, a single 0 in the input set (which is converted to NULL by NULLIF) will cause the result to be NULL; the COALESCE function fixes this by replacing NULL with 0. The result of this first part is the product of the absolute value of all numbers.

    The second part of the formula counts the number of negative values in the input set, then calculates the remainder after division by 2 - resulting in 0 for an even number of negatives, and 1 for an odd number. This value is multiplied by 2 and subtracted from 1, resulting in 1 if there's an even number of negatives, and -1 if there's an odd number. Multiply this by the product of the absolute values to get the final answer.

    I first saw this technique in a book by Itzik Ben-Gan (who else?). I now had to google for it, and found my first hit in a comment by Rob Farley to this blog post by Michael Coles - though I did change and simplify Rob's version a bit - he had omitted the COALESCE and instead used a count of the number of zero values - a technique that I don't understand and that, frankly, probably doesn't work.

    One problem with this approach is that there is no way to distinguish a NULL that was introduced by the ISNULL (and hence was a 0 at first) from a NULL that was already NULL in the input set, so the result will be 0 if one of the input values is NULL, whereas the proper result would be either NULL (if you want to include the NULL input in the calculation), or the product of the non-NULL values (if you want the aggregate to work like all other aggregates do). I guess the best way to fix this would be to exclude NULL values in the WHERE clause, or to add some extra CASE expressions.

    I als found a more readable version that uses the same technique, but broken down into steps, in a reply to this post on stackoverflow. This reply was posted by gbn.

    SELECT

    GrpID,

    CASE

    WHEN MinVal = 0 THEN 0

    WHEN Neg % 2 = 1 THEN -1 * EXP(ABSMult)

    ELSE EXP(ABSMult)

    END

    FROM

    (

    SELECT

    GrpID,

    --log of +ve row values

    SUM(LOG(ABS(NULLIF(Value, 0)))) AS ABSMult,

    --count of -ve values. Even = +ve result.

    SUM(SIGN(CASE WHEN Value < 0 THEN 1 ELSE 0 END)) AS Neg,

    --anything * zero = zero

    MIN(ABS(Value)) AS MinVal

    FROM

    Mytable

    GROUP BY

    GrpID

    ) foo

    This version is more readable, but has the same problem with NULL versions as the compact version above.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • I just saw the answer by Mike. I like the more compact way to get the multiplication factor (CHECKSUM_AGG), but I'm also a bit concerned - I could not find any documentation (at least not in the 30 seconds I spent searching -I know, pathetic!-) that this aggregate guarantees a negative result if the number of negative input values is odd, and a positive result otherwise. Without such documentation, this method appears to be less safe.

    And a final comment regarding the original article that I forgot in my previous post:

    "Notice the decimal point after the first argument of the power function. You need it to force a precession of 18 so that 4 * 2 equal 8 instead of 7."

    The decimal point does not force a precision of 18, it forces the use of non-integer (numeric(5,2) to be precise). Without this, all numbers after the decimal place are cut off. That's why most versions of this algorithm on the internet use natural logarithm rather than 10-based logarithm - that one always uses non-integer calculation.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • We use a CLR aggregate for this.

  • I'd be interested to know what problems require the calculation of a product from each row in a query in this way.

    Presumably also, you'd need to be aware of the number of values and size of values that are expected - it wouldn't take very many numbers multiplied together for the product to get very large. (Or very small, if the numbers being multiplied are between -1 and 1 )

  • archie flockhart (2/6/2012)


    I'd be interested to know what problems require the calculation of a product from each row in a query in this way.

    Calculating correlation statistics

  • David.Poole (2/6/2012)


    archie flockhart (2/6/2012)


    I'd be interested to know what problems require the calculation of a product from each row in a query in this way.

    Calculating correlation statistics

    It's been a long time since I did any correlations but I can't remember needing to get a product of all the numbers in a dataset, and a quick google for the formulas didn't throw up anything that required Product( X1 .. Xn)

    Can you point me at more details ?

  • archie flockhart (2/6/2012)


    Presumably also, you'd need to be aware of the number of values and size of values that are expected - it wouldn't take very many numbers multiplied together for the product to get very large. (Or very small, if the numbers being multiplied are between -1 and 1 )

    If I ever need to do this for real, I'll amost certainly add a CAST to float, to make sure that results of virtually all sizes can be represented.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • And another way to deal with 0 and negatives it if you don't want to use a 'case' statement :

    MIN(ABS(SIGN(num))) * (1-(SUM(SIGN(num) * (SIGN(num) -1))%4)) * POWER(10.,SUM(LOG10(ABS(Num) + 1 - ABS(SIGN(num)))))

  • SQL Kiwi (2/6/2012)


    We use a CLR aggregate for this.

    That's definitely cleaner, and easier to understand code. But how does the performance of a CLR aggregate compare to performance of a complicated formula using native functions only? Have you ever done any perf testing?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Stephen.Rice (2/6/2012)


    And another way to deal with 0 and negatives it if you don't want to use a 'case' statement :

    MIN(ABS(SIGN(num))) * (1-(SUM(SIGN(num) * (SIGN(num) -1))%4)) * POWER(10.,SUM(LOG10(ABS(Num) + 1 - ABS(SIGN(num)))))

    It's a real challenge to decipher what it does and how it works (I would never allow this in production code without extensive comments!) - but I like how it manages to avoid all explicit CASE expressions, and all implicit CASE expressions (NULLIF, COALESCE) as well. I'm not sure if it's true, but I suspect that CASE can be a tad slower than plain arithmetical functions.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Sorry! I definitely should have added some comments.

    The first section deals with zeros. ABS(SIGN()) for minus numbers or positive numbers being 1 and zero being 0 so the MIN will be 0 if any zeros exist in the input set.

    The next section deals with negatives. SIGN(Num) * (SIGN(NUM)-1) yields 0 for positive or zero numbers and 2 for negative numbers. So we add them up and take a mod 4 to give 2 when there are an odd number of negatives or 0 where there are an even number.

    An even number in the aggregate leaves an even result so we need to leave alone. An odd number turns the product negative. So by taking 1- (the above) we get 1 for all positive numbers or even numbers of negatives or -1 for odd numbers of negatives.

    Finally we just need to prevent the LOG function from throwing an error if we pass in a zero. We can do that by adding 1 and taking off the ABS(SIGN()) of the number. This does not change the result for any non zero number but alters zeros to ones (which prevents an error being thrown).

    As any product with a zero will yield zero (and we deal with that case in the first clause) it doesn't matter that we 'muck up' with the log here.

    Stephen

  • It is useful..!!

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply