RE: Which Algorithm? – SQLServerCentral

December 16, 2015 at 10:30 am

mattmparker (12/16/2015)
So, for example, a region and a salesperson are likely to be related, not a unique relationship, in each sale.
I think you're on the right track and are just getting stuck on the terminology. If any of the following is the question you're asking, a chi-squared test of independence is appropriate:
- Is there a relationship between region and salesperson?
- Given the name of the salesperson, could I make a better-than-chance guess at the region in which a sale happened? (and vice versa)
- Does the salesperson-region relationship in the data reflect the database's previous assumption (uniform distribution) or its new assumption (some kind of non-uniformity)?
I'm way out of my depth on all of this, so if any of this doesn't make sense, don't hesitate to let me know.

Yeah, I'm looking to see whether there is a relationship between region and salesperson, so it sounds like the chi-squared test of independence is the way to go. I'm building a new data set for my tests, and I'll post the results and code back here once I have them.
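
For anyone following along, here is a rough sketch of what a chi-squared test of independence computes for region versus salesperson. It is only an illustration in plain Python: the function name chi_squared_independence and the sample rows are made up for this example and are not the data set or code mentioned above.

```python
from collections import Counter

def chi_squared_independence(pairs):
    """Return (statistic, degrees_of_freedom) for a list of (region, salesperson) pairs."""
    observed = Counter(pairs)                   # count of each (region, salesperson) cell
    row_totals = Counter(r for r, _ in pairs)   # sales per region
    col_totals = Counter(s for _, s in pairs)   # sales per salesperson
    n = len(pairs)

    statistic = 0.0
    for region, row_total in row_totals.items():
        for person, col_total in col_totals.items():
            # Count expected in this cell if region and salesperson were independent
            expected = row_total * col_total / n
            statistic += (observed[(region, person)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical sample: each tuple is one sale (region, salesperson)
sales = ([("East", "Alice")] * 30 + [("East", "Bob")] * 10
         + [("West", "Alice")] * 10 + [("West", "Bob")] * 30)

stat, dof = chi_squared_independence(sales)
print(f"chi-squared = {stat:.2f}, degrees of freedom = {dof}")
```

With this made-up sample the statistic is 20.00 on 1 degree of freedom, well above the 3.84 critical value at the 5% level, so independence would be rejected. If SciPy is available, scipy.stats.chi2_contingency performs the same calculation on a contingency table and also returns a p-value.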
