Statistics?

  • This is a long shot, but I'm asking here because someone will know the answer!

    I'm working on a quality control project for my company.  I expect the error rate to be very low, about 1% of the total, and I'm trying to find a statistical method that tells me how many samples I need and how much uncertainty there will be in the resulting estimate of the error rate.
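One common textbook way to frame the "how many samples" part is to pick a target margin of error and solve for the sample size. The sketch below assumes a normal approximation with a finite population correction and a 95% confidence level (z = 1.96). The function name and the target margins are made up for illustration, and the approximation is rough when the expected rate is as low as 1%.

```python
import math

def sample_size_for_margin(population, expected_rate, margin, z=1.96):
    """Approximate sample size to estimate a proportion within +/- margin.

    Assumes a normal approximation and sampling without replacement.
    z=1.96 corresponds to roughly 95% confidence (an assumption here).
    """
    # Sample size that would be needed for an effectively infinite population.
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / margin ** 2
    # Finite population correction: a small population needs fewer samples.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

if __name__ == "__main__":
    # Illustrative targets only: 5,000 records, 1% expected error rate.
    for margin in (0.01, 0.005, 0.0025):
        n = sample_size_for_margin(5000, 0.01, margin)
        print(f"margin +/-{margin:.2%}: sample about {n} of 5000 records")
```

Under these assumptions, pinning a 1% rate down to within half a percentage point takes well over a thousand of the 5,000 records, which suggests that a sample of 50 is too small for this purpose.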

    When I look at standard methods, the margin of error for a reasonable sample size (50 out of 5000) is almost 5%.  This completely overwhelms my expected 1% error rate. 
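The margin a standard formula reports depends heavily on the error rate it assumes. As a rough illustration, the sketch below computes the normal-approximation margin of error with a finite population correction for a sample of 50 out of 5,000, once with the worst-case rate of 50% and once with the expected 1%. The function name and the 95% level (z = 1.96) are assumptions, not something taken from the thread.

```python
import math

def margin_of_error(sample_size, population, rate, z=1.96):
    """Normal-approximation margin of error for a sampled proportion.

    Includes a finite population correction because records are drawn
    without replacement. z=1.96 (about 95% confidence) is an assumption.
    """
    # Standard error of the sample proportion.
    standard_error = math.sqrt(rate * (1 - rate) / sample_size)
    # Correction factor for sampling a noticeable share of the population.
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc

if __name__ == "__main__":
    for assumed_rate in (0.5, 0.01):
        moe = margin_of_error(50, 5000, assumed_rate)
        print(f"assumed rate {assumed_rate:.0%}: margin +/-{moe:.1%}")
```

Either way the margin is larger than the 1% rate being measured. With only 50 records and a 1% rate, the sample is expected to contain about half an error on average, so the normal approximation itself is unreliable here and an exact binomial or hypergeometric calculation is likely a better fit.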

    Is there a better calculation method?

    So long, and thanks for all the fish,

    Russell Shilling, MCDBA, MCSA 2K3, MCSE 2K3

  • Hi Russell - so what, exactly, are you looking for?

    1) Are you looking for a way of randomly selecting a sample % of records from an overall population?

    2) Or are you wondering how big your sample size needs to be to ensure that you catch x% (statistically) of errors? If so, what is 'x'?

    3) Something else?

    Regards

    Phil

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I've got a SQL stored procedure that randomly picks a given percentage of records.  I'm looking for both of the items in #2.  The population size is about 5,000, and the expected error rate is about 1%. 
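For the "catch the errors" side of the question, one way to reason about it is the hypergeometric model: if the population of 5,000 really contains about 50 bad records (1%), what is the chance that a random sample drawn without replacement contains at least one of them? The sketch below is a minimal illustration under that assumption. The function names and the 95% target are hypothetical choices.

```python
def prob_no_errors(population, errors, sample_size):
    """Chance that a sample drawn without replacement contains no bad records."""
    p_none = 1.0
    for drawn in range(sample_size):
        clean_left = population - errors - drawn
        if clean_left <= 0:
            return 0.0  # more draws than clean records: an error is certain
        # Probability the next record is clean, given all earlier ones were.
        p_none *= clean_left / (population - drawn)
    return p_none

def prob_at_least_one_error(population, errors, sample_size):
    """Chance that the sample contains one or more bad records."""
    return 1.0 - prob_no_errors(population, errors, sample_size)

def min_sample_to_catch_one(population, errors, confidence=0.95):
    """Smallest sample size that finds at least one error with the given confidence.

    Returns the full population size if the target cannot be reached
    (for example, when there are no errors at all).
    """
    for n in range(1, population + 1):
        if prob_at_least_one_error(population, errors, n) >= confidence:
            return n
    return population

if __name__ == "__main__":
    # Assumed figures from the thread: 5,000 records, about 1% (50) in error.
    print(f"sample of 50: {prob_at_least_one_error(5000, 50, 50):.0%} chance of seeing an error")
    print(f"needed for 95%: {min_sample_to_catch_one(5000, 50)} records")
```

Under these assumptions a sample of 50 has only about a 40% chance of containing any error, and the sample has to grow to somewhere near 290 records before that chance reaches 95%. This answers "will I see at least one error", not "how precisely can I estimate the rate", which needs a larger sample still.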

    So long, and thanks for all the fish,

    Russell Shilling, MCDBA, MCSA 2K3, MCSE 2K3

  • The only reliable way to retrieve a random sample of records with SQL Server is to use the NEWID() function.  Search this forum's archives for 'random' and you should find some good advice on the actual SQL to use to get a random sample.

    If your organisation has a statistician on staff, you should ask them about the relevant algorithms to use to get the type of sample you need to satisfy the business need.  It can be very easy for a developer or DBA to put some SQL together that gives a statistically meaningless result.  If the business relies on that result for decision making, it could end up making the wrong decision...

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
