UDA and SortedDictionary class problem

  • michal.lisinski

    SSCertifiable

    Points: 6933

    I've got small problem with implementing SortedDictionary class in my aggregate.

    In the first version I took simple List<t> class and it works fine, but it consumes a lot of memory to store

    and sort all values. So I considered using a SortedDictionary instead of a List and track the number of occurrences instead of storing each element, it should significantly reduce memory consumption.

    I added condition where source value is stored as a key and number of occurrence as a value in Accumulate():

    ..

    public void Accumulate(SqlDouble value, SqlDouble percentile)

    {

    if (!value.IsNull)

    {

    if (this.SrtDict.ContainsKey(value.Value))

    this.SrtDict[value.Value] = this.SrtDict[value.Value] + 1;

    else

    this.SrtDict.Add(value.Value, 1);

    }

    ...

    Everything looks fine but it doesn't work, I check the sum of all occurrences in Terminate method and

    when I test it with five different values, it returns result:5

    SELECT [dbo].[aggTest](inp.ins,0.5) AS Result

    FROM ( SELECT 20.37 AS ins UNION ALL

    SELECT 15.45 AS ins UNION ALL

    SELECT 1007.23 AS ins UNION ALL

    SELECT 11.03 AS ins UNION ALL

    SELECT 12.25

    ) AS inp

    GO

    But when I duplicate one or more values (eg. 15.45 instead of 12.25), it returns result: 4 (or less).

    I built small console app to checked it, works perfectly, so where I screwed up something in the aggregate?

    Br,

    Mike

    P.S. Script attached.

  • Solomon Rutzky

    SSCoach

    Points: 16129

    michal.lisinski (7/18/2016)


    But when I duplicate one or more values (eg. 15.45 instead of 12.25), it returns result: 4 (or less).

    I built small console app to checked it, works perfectly, so where I screwed up something in the aggregate?

    There are two potential issues with the code that I can see:

    1) In the SqlUserDefinedAggregate attribute you have IsInvariantToDuplicates set to true when I believe it should be false. Try changing this one thing first to see if it helps.

    2) In your Write and Read methods you have discarded the Value part of the collection. So when you have duplicate values and the Accumulate method increments Value, that gets lost and reset to 1. Hence doing a .Sum() would add up four elements of 1 instead of three elements of 1 plus one element of 2.

    Hope that helps, Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • michal.lisinski

    SSCertifiable

    Points: 6933

    Hi Solomon,

    thanks for your help, I checked IsInvariantToDuplicates first and it didn't help,

    but I expected that, because it worked well in previous version where I used .NET List<T>.

    So it confirmed my assumptions that the problem exist in serialization.

    I added the Value part to write out each value in the SortedDictionary, but

    now it returns very strange result, sum of all values doesn't give integer.

    I've found some articles with warnings that the interface IDictionary is not serializable for

    some unknown reason so all the classes that implement the IDictionary interface (SortedDictionary,

    Dictionary ) can not be serialized in simple way, so I wonder if the Read() and Write() methods of the IBinarySerialize interface are used to deserialize and serialize work with SortedDictionary?

    Br,

    Mike

  • Solomon Rutzky

    SSCoach

    Points: 16129

    michal.lisinski (7/19/2016)


    So it confirmed my assumptions that the problem exist in serialization.

    This is what I was saying in #2. You aren't fully serializing the collection, just the keys. The values are all hard-coded to be 1.

    I added the Value part to write out each value in the SortedDictionary, but now it returns very strange result, sum of all values doesn't give integer.

    You need to attach your updated code. Also, are you testing with the exact same values as before? There is a code path in the Terminate method that will pick the Min or Max value instead of doing a Sum. Please also post your test values if they are different from before.

    I've found some articles with warnings that the interface IDictionary is not serializable for some unknown reason so all the classes that implement the IDictionary interface (SortedDictionary, Dictionary ) can not be serialized in simple way, so I wonder if the Read() and Write() methods of the IBinarySerialize interface are used to deserialize and serialize work with SortedDictionary?

    I don't see any issue with the SortedDictionary since you aren't serializing it. Nor is it being auto serialized. You wrote the serialization code in the Write method. If there is no issue with the Terminate method or the test data, then perhaps look at how you are serializing. I am not sure if relying upon the stream length is an issue. It might not be, but I have never tried that. I have always written the number of elements in the collection as one of the first items in the Write method, and then I read it first in the Read method and do a for loop to read that number of elements from the stream.

    Still, I am not sure why you are using a SortedDictionary. Is that required for doing Min and Max? If not, then perhaps try a regular Dictionary?

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply