Estimate a count

  • Comments posted to this topic are about the item Estimate a count

    Gregory A. Larsen, MVP

  • It's amazing to me that people will tolerate a built-in 2% error for just about anything.  It's just not that difficult to maintain an active total for "big data".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good question to showcase a new feature!

    From a BI/DS perspective, this is potentially very useful for data profiling.


    Just because you're right doesn't mean everybody else is wrong.

  • Rune Bivrin wrote:

    Good question to showcase a new feature!

    From a BI/DS perspective, this is potentially very useful for data profiling.

    Do you have a short example or suggestion of when data profiling can withstand a per element tolerance of +/- 2 percent other than polls?

     

    --Jeff Moden



  • Jeff Moden wrote:

    Do you have a short example or suggestion of when data profiling can withstand a per element tolerance of +/- 2 percent other than polls?

    Quite honestly, I can't be arsed to produce a specific example. However, it's pretty much standard procedure when developing new integrations (or at least it should be) to do rough profiling in order to choose relevant design patterns. Does this file type usually contain 100, 10,000 or 1,000,000 different customer IDs? That will potentially affect your approach, but +/- 10 percent isn't that important.

    I do see it more as a development tool than an analysis tool for the data scientist, but if the performance gains are substantial enough, it could very well be a means to pinpoint areas that merit more detailed analysis.

    Much like a woodworker can use both the 16-tooth circular saw and the Japanese pull saw, every tool has its place.


    Just because you're right doesn't mean everybody else is wrong.

  • Ok... now that I agree with.  Thanks for your time, Rune.

    --Jeff Moden



  • Pinal Dave found that APPROX_COUNT_DISTINCT is not always efficient. As always, it depends: test in your environment to be sure it works the way you want.

    https://blog.sqlauthority.com/2019/12/11/sql-server-approx_count_distinct-not-always-efficient/

     

  • Agree with Rune. Nice new tool for initial profiling, but not much use to Quants and the like.

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”
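
The rough profiling Rune describes (is it about 100, 10,000 or 1,000,000 distinct customer IDs?) can be illustrated with a small approximate distinct counter. The sketch below is a simplified HyperLogLog-style estimator in Python. It is an illustration only, not SQL Server's implementation of APPROX_COUNT_DISTINCT; the function name, the `p` parameter, and the made-up customer IDs are all assumptions introduced for this example.

```python
import hashlib
import math


def approx_count_distinct(values, p=14):
    """Estimate the number of distinct values (simplified HyperLogLog sketch).

    p is a hypothetical tuning knob: the sketch keeps 2**p small registers,
    and the typical relative error shrinks roughly as 1.04 / sqrt(2**p).
    """
    m = 1 << p                       # number of registers
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")     # use 64 bits of the hash
        idx = h >> (64 - p)                       # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        if rank > registers[idx]:
            registers[idx] = rank

    alpha = 0.7213 / (1 + 1.079 / m)              # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # For small counts, fall back to linear counting on the empty registers.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return round(m * math.log(m / zeros))
    return round(raw)


if __name__ == "__main__":
    # Made-up data: 60,000 rows that reuse 30,000 customer IDs.
    rows = [f"CUST-{i % 30000}" for i in range(60000)]
    exact = len(set(rows))                 # exact answer: holds every ID in memory
    estimate = approx_count_distinct(rows)  # estimate: fixed-size sketch
    print(f"exact={exact} estimate={estimate}")
```

The exact count has to remember every distinct ID, while the sketch uses a fixed number of registers regardless of input size. That trade-off is what the thread is debating: an estimate like this is enough to tell 10,000 from 1,000,000, but not enough when the exact figure matters.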
