• mattmparker (12/16/2015)


    After reading up on cardinality, another couple of questions occurred to me:

    - Does the density/distribution of salesperson-region relationships have high cardinality (salespeople strongly associated with one or two regions), or low (salespeople working across many regions)?

    - Does the density of those relationships better match the old optimizer assumption, or the new one (exponential)?

    In that case, an alternative might be to plot three densities: your empirical density, the previously-assumed density (uniform?), and the new density (exponential?). There are a few examples here, but tons more on the web: https://heuristically.wordpress.com/2012/06/13/comparing-continuous-distributions-with-r/
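    As a rough illustration of that three-way comparison, here is a minimal Python sketch using only the standard library (the same idea carries over to R). The `sample` data is a hypothetical stand-in generated at random, not real salesperson-region data, and the helper names are made up for this example. It bins the data into an empirical density and evaluates a uniform and an exponential density at each bin midpoint, so the three columns can be compared side by side or fed to a plotting tool.

```python
import math
import random


def histogram_density(values, lo, hi, bins):
    """Empirical density per bin: count / (n * bin width)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


def uniform_pdf(x, lo, hi):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


# Hypothetical stand-in data (assumption): replace with your own measurements.
random.seed(42)
sample = [random.expovariate(0.5) for _ in range(2000)]

lo, hi, bins = 0.0, max(sample), 10
width = (hi - lo) / bins
empirical = histogram_density(sample, lo, hi, bins)

# Maximum-likelihood rate for an exponential is 1 / sample mean.
rate = len(sample) / sum(sample)

rows = []
for i in range(bins):
    mid = lo + (i + 0.5) * width
    rows.append((mid, empirical[i], uniform_pdf(mid, lo, hi), exponential_pdf(mid, rate)))

print(f"{'midpoint':>10} {'empirical':>10} {'uniform':>10} {'exponential':>12}")
for mid, emp, uni, expo in rows:
    print(f"{mid:10.3f} {emp:10.4f} {uni:10.4f} {expo:12.4f}")
```

    Whichever theoretical column tracks the empirical column more closely is the better visual match; a proper density plot would show the same thing with smoother curves.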

    After that, there are a whole slew of goodness-of-fit tests for checking whether your data fit a given distribution or not; off the top of my head, you could probably start with fitdistr() from the MASS package, which fits a candidate distribution to your data by maximum likelihood.
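    To make the goodness-of-fit idea a bit more concrete, here is a small Python sketch (standard library only) that computes a Kolmogorov-Smirnov distance between the data and each candidate distribution. This is one choice among many tests, not necessarily the one you would end up using, and `sample` is again hypothetical random data. Because the parameters are estimated from the same sample, the statistic is used here only to rank the candidates, not to produce a p-value.

```python
import math
import random


def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(rate):
    return lambda x: 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def uniform_cdf(lo, hi):
    return lambda x: min(max((x - lo) / (hi - lo), 0.0), 1.0)


# Hypothetical stand-in data (assumption): replace with your own measurements.
random.seed(7)
sample = [random.expovariate(0.5) for _ in range(2000)]

# Fit each candidate from the data: MLE rate for the exponential,
# observed min/max for the uniform.
rate_hat = len(sample) / sum(sample)
d_exp = ks_statistic(sample, exponential_cdf(rate_hat))
d_unif = ks_statistic(sample, uniform_cdf(min(sample), max(sample)))

print(f"KS distance vs exponential: {d_exp:.4f}")
print(f"KS distance vs uniform:     {d_unif:.4f}")
```

    The smaller distance indicates the closer candidate; with data like this stand-in sample, the exponential should win by a wide margin.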

    No clue how you'd go about generating all of those densities, though!

    Excellent. Thanks. I'll explore this as well. Sorry the question mixed technologies so much. I'll see what I can do with that and report back.
