Estimate a count

Greg Larsen

SSC-Insane

Points: 20966
June 1, 2020 at 12:00 am

#3749028

Comments posted to this topic are about the item Estimate a count
Gregory A. Larsen, MVP

Jeff Moden SSC Guru Points: 1004149 · Answer 1

It's amazing to me that people will tolerate a built-in 2% error for just about anything. It's just not that difficult to maintain an active total for "big data".
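As a rough illustration of what keeping an exact distinct count current could look like, here is a minimal Python sketch. It is one reading of "active total", not something spelled out in this thread: the class `RunningDistinctCount` and its methods are hypothetical names, and in SQL Server the same idea would be built with database features rather than application code.

```python
class RunningDistinctCount:
    """Hypothetical helper: an exact distinct count, updated on every change."""

    def __init__(self):
        # Number of live copies of each value, so deletes are handled correctly.
        self._refs = {}

    def insert(self, value):
        self._refs[value] = self._refs.get(value, 0) + 1

    def delete(self, value):
        remaining = self._refs.get(value, 0) - 1
        if remaining < 0:
            raise KeyError(f"{value!r} was never inserted")
        if remaining == 0:
            del self._refs[value]  # last copy gone, value no longer counts
        else:
            self._refs[value] = remaining

    @property
    def distinct(self):
        return len(self._refs)


counter = RunningDistinctCount()
for customer_id in ["a", "b", "a", "c"]:
    counter.insert(customer_id)
counter.delete("a")      # one copy of "a" is still present
print(counter.distinct)  # prints 3
```

The count is always exact and reading it is cheap, but the structure needs memory proportional to the number of distinct values, which is the cost an approximate count avoids.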

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Rune Bivrin SSCrazy Eights Points: 8483 · Answer 2

Good question to showcase a new feature!

From a BI/DS perspective, this is potentially very useful for data profiling.

Just because you're right doesn't mean everybody else is wrong.

Jeff Moden SSC Guru Points: 1004149 · Answer 3

Rune Bivrin wrote:

Good question to showcase a new feature!
From a BI/DS perspective, this is potentially very useful for data profiling.

Do you have a short example or suggestion of when data profiling can withstand a per element tolerance of +/- 2 percent other than polls?

--Jeff Moden

Rune Bivrin SSCrazy Eights Points: 8483 · Answer 4

Jeff Moden wrote:

Do you have a short example or suggestion of when data profiling can withstand a per element tolerance of +/- 2 percent other than polls?

Quite honestly, I can't be arsed to produce a specific example. However, it's pretty much standard procedure when developing new integrations (or at least it should be) to do rough profiling in order to use relevant design patterns. Does this file type usually contain 100, 10,000 or 1,000,000 different customer IDs? That will potentially affect your approach, but +/- 10 percent isn't that important.

I see it more as a development tool than as an analysis tool for the data scientist, but if the performance gains are substantial enough, it could very well be a means to pinpoint areas that merit more detailed analysis.

Much like a woodworker can use both the 16-tooth circular saw and the Japanese pull saw, every tool has its place.
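To make the order-of-magnitude point concrete, here is a minimal Python sketch of rough profiling with an approximate distinct count. It uses a K-Minimum-Values estimator purely as a stand-in; it is not the algorithm SQL Server uses, and the names `approx_distinct`, `magnitude` and the parameter `k` are assumptions made for illustration.

```python
import hashlib
import heapq


def approx_distinct(values, k=1024):
    """Estimate the distinct count from the k smallest 64-bit hashes seen."""
    heap = []        # negated hashes, so heap[0] is the largest hash we keep
    members = set()  # the same hashes, used to skip duplicates
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: count is exact
    kth_smallest = -heap[0]
    return (k - 1) * 2**64 // kth_smallest


def magnitude(n):
    """Round a count down to its power of ten, e.g. 8_700 -> 1_000."""
    return 10 ** (len(str(max(int(n), 1))) - 1)


# 90,000 rows that contain 30,000 different customer IDs.
customer_ids = [i % 30_000 for i in range(90_000)]
print(magnitude(approx_distinct(customer_ids)))  # prints 10000
```

The estimate is off by a few percent, yet the design question (hundreds, tens of thousands, or millions of IDs) gets the same answer as an exact count would give. Memory stays bounded by `k` regardless of input size.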

Jeff Moden SSC Guru Points: 1004149 · Answer 5

Ok... now that I agree with. Thanks for your time, Rune.

--Jeff Moden

Ricky Lively Right there with Babe Points: 730 · Answer 6

Pinal Dave found that it was not always efficient. As always, it depends: test in your environment to be sure it works the way you want.

https://blog.sqlauthority.com/2019/12/11/sql-server-approx_count_distinct-not-always-efficient/

Stewart "Arturius" Campbell SSC Guru Points: 72593 · Answer 7

Agree with Rune. Nice new tool for initial profiling, but not much use to quants and the like.

____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”