Normalizing Denormalized Tables

  • here is a script i wrote that helps normalise a single table into a number of tables. useful for normalising a database during the development phase.

    http://www.sqlservercentral.com/scripts/Normalisation/66370/

  • Well done,

    I had a similar challenge during data migration; my solution was to split character-delimited values with XQuery and apply the transformation across the whole table.
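
    Not the poster's actual code, but a minimal sketch of that kind of XML-based split in T-SQL; the variable name and delimiter are just illustrative:

    -- Minimal sketch (not the poster's code): split a comma-delimited string via XML.
    -- Assumes the values contain no XML special characters (&, <, >), which would need escaping.
    DECLARE @csv varchar(200) = 'Dev,Test,Prod';

    SELECT x.i.value('.', 'varchar(50)') AS item
    FROM (SELECT CAST('<i>' + REPLACE(@csv, ',', '</i><i>') + '</i>' AS xml) AS doc) AS d
    CROSS APPLY d.doc.nodes('/i') AS x(i);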

  • Ed Klein-385786 (11/12/2008)


    I'm just saying it would be nice. Do you have to go to that much effort to sum a numeric field? Why shouldn't accumulating text be as easy?

    It's far from impossible, but the main issue is that you can't predict the results in the way you're used to from working with numbers.

    Assume you're aggregating 3 records. The SUM would do 1 + 2 + 3 and the result would be 6. It doesn't really matter whether internally it does 1 + 2 + 3, 1 + 3 + 2, 3 + 2 + 1, etc.

    However, if you're accumulating text, then the order in which the records get processed does matter: A + B + C = ABC is a different result from A + C + B = ACB, or C + B + A = CBA, etc.

    Given the way SQLCLR aggregations are coded, it quickly gets difficult to implement this so that you get a consistent result. For things like COUNT(), SUM(), AVG(), MIN() and MAX(), order has no influence and you can simply 'stream' through the input, updating some working variables that get returned at the end (after some minor processing). The amount of resources needed for this is rather low.

    But if you want consistent results from ACCTEXT() by e.g. sorting the values first, you have to bring them ALL into memory first, sort them along the way, copy everything into one giant string and then return that. And as if that isn't bad enough already, if the Query Optimizer decides to split the work over multiple threads, the final step also has to merge the (presorted) partial results, sort them again and only then do the conversion to one big string.

    It's not undoable; in fact, I bet that if you google for it you're bound to find implementations that are more fully worked out than what I just typed off the top of my head above. But from a "what's going on behind the scenes" point of view, things could get ugly really fast, I think.

  • Plus, now you've messed with the vendor's code. Say they push an update expecting to modify the table, but now it is a view... and the update fails. You've just fallen to the bottom of their support queue.

  • Am I the only one bothered by the lack of a unique constraint/index on the natural keys in the new tables?

    The new design allows the insertion of duplicate entries for GroupName and ResourceName.
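
    For illustration, constraints along these lines would close that gap (the table and column names here are assumed from the discussion, not taken from the article's script):

    -- Sketch only; dbo.Groups and dbo.Resources are assumed names.
    ALTER TABLE dbo.Groups
        ADD CONSTRAINT UQ_Groups_GroupName UNIQUE (GroupName);

    ALTER TABLE dbo.Resources
        ADD CONSTRAINT UQ_Resources_ResourceName UNIQUE (ResourceName);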

  • You can get rid of that ugly cursor-based function; you can implement the same thing in the view itself. Even though you are trying to normalize, you are slowing performance down even more.

  • I was coming here to post the same thing as the comment above: there is no need to use cursors for this; the performance hit from using them is too great.

    A STUFF() function, taking as its parameter a SELECT statement that returns XML, would be much faster.
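
    Something along these lines (table and column names are assumed, not the article's actual schema); note that the ORDER BY in the subquery also makes the concatenation order deterministic:

    -- Sketch only: set-based string concatenation with STUFF() and FOR XML PATH
    -- instead of a cursor-based function. dbo.Groups/dbo.Resources are assumed names.
    SELECT g.GroupName,
           STUFF((SELECT ',' + r.ResourceName
                  FROM dbo.Resources AS r
                  WHERE r.GroupID = g.GroupID
                  ORDER BY r.ResourceName   -- explicit order => deterministic result
                  FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),
                 1, 1, '') AS ResourceList
    FROM dbo.Groups AS g;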

  • arnipetursson (11/1/2013)


    Am I the only one bothered by the lack of a unique constraint/index on the natural keys in the new tables?

    The new design allows the insertion of duplicate entries for GroupName and ResourceName.

    No, you're not the only one (see here)

  • Normalization doesn't always help performance.

  • cs_troyk (11/1/2013)


    arnipetursson (11/1/2013)


    Am I the only one bothered by the lack of a unique constraint/index on the natural keys in the new tables?

    The new design allows the insertion of duplicate entries for GroupName and ResourceName.

    No, you're not the only one (see here)

    Missed that, thanks.

    I see this mistake (i.e. a missing uniqueness constraint on the natural key) very often.

  • deroby (11/1/2013)


    Ed Klein-385786 (11/12/2008)


    I'm just saying it would be nice. Do you have to go to that much effort to sum a numeric field? Why shouldn't accumulating text be as easy?

    It's far from impossible, but the main issue is that you can't predict the results in the way you're used to from working with numbers.

    Assume you're aggregating 3 records. The SUM would do 1 + 2 + 3 and the result would be 6. It doesn't really matter whether internally it does 1 + 2 + 3, 1 + 3 + 2, 3 + 2 + 1, etc.

    However, if you're accumulating text, then the order in which the records get processed does matter: A + B + C = ABC is a different result from A + C + B = ACB, or C + B + A = CBA, etc.

    Given the way SQLCLR aggregations are coded, it quickly gets difficult to implement this so that you get a consistent result. For things like COUNT(), SUM(), AVG(), MIN() and MAX(), order has no influence and you can simply 'stream' through the input, updating some working variables that get returned at the end (after some minor processing).

    You are making a big mistake if you think that the existing numeric aggregates deliver results which are independent of evaluation order, because they don't.

    Pretending that these aggregates are order-independent ensures that SUM, AVG and STDEV are indeterminate for all numerics except tinyint, both the exact numerics (smallint, int, bigint, numeric, decimal, money) and the approximate numerics (real, float). With all of these, whether overflow occurs can depend on the order of evaluation. With all the non-integer types, the accuracy of the result also depends on the order of evaluation (both exact and approximate numerics tend to have very poor accuracy if the aggregation is done in the wrong order). For the approximate numerics, whether loss of significance (underflow) happens also depends on the order of evaluation.

    That's why it's rather common when working with very large quantities of numeric data to avoid use of SQL aggregates other than COUNT, and instead pull the data out of the database and do the aggregation in the app. This may seem inefficient when hundreds of millions of rows are involved, since using the aggregates delivers a result much faster; but that result may well be so inaccurate as to be completely useless.
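
    A trivial illustration of the float case (contrived values, and assuming the expressions are evaluated left to right):

    -- Contrived sketch: with float, adding small values before or after a large one
    -- changes the result, because the small values get absorbed by the large one.
    SELECT (1e16 + 1e0 + 1e0) - 1e16 AS big_value_first,    -- 0: the ones are lost
           (1e0 + 1e0 + 1e16) - 1e16 AS small_values_first; -- 2: the ones survive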

    We need proper reduction operators in SQL instead of these somewhat useless aggregates. Perhaps something like

    REDUCE <operation> ON <rowset expression> OVER (<order>) BASE <base-value>;

    That would work fine for the text case, as well as making the numeric aggregates usable (if one makes sure the <order> clause delivers the right evaluation order). It would of course be nicer not to be limited to scalar aggregates; it would be really nice to have the full generality of the REDUCE operator in functional languages, but that's probably a step too far for SQL.

    Tom

  • L' Eomot Inversé (11/3/2013)


    Pretending that these aggregates are order-independent ensures that SUM, AVG and STDEV are indeterminate for all numerics except tinyint, both the exact numerics (smallint, int, bigint, numeric, decimal, money) and the approximate numerics (real, float). With all of these, whether overflow occurs can depend on the order of evaluation. With all the non-integer types, the accuracy of the result also depends on the order of evaluation (both exact and approximate numerics tend to have very poor accuracy if the aggregation is done in the wrong order). For the approximate numerics, whether loss of significance (underflow) happens also depends on the order of evaluation.

    Intriguing! I can understand how overflow could easily occur, but do you have examples of how the precision of exact numerics (particularly decimal) can be affected by the order of aggregation?

  • L' Eomot Inversé (11/3/2013)


    You are making a big mistake if you think that the existing numeric aggregates deliver results which are independent of evaluation order, because they don't.

    Hmm.. Although I agree with what you're saying, the effect will be MUCH more pronounced when accumulating text.

    I'm not saying that the current aggregations are perfect but for a lot of applications they'll do just fine.

    From experience, I'd say we've run into a lot more issues with loss of detail when multiplying/dividing numbers =(
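
    For what it's worth, a contrived T-SQL sketch of that multiply/divide detail loss; the exact digits depend on the precision and scale SQL Server infers for the intermediate results:

    -- The intermediate result of 1.0 / 3 is rounded to a limited scale,
    -- so the order of the operations changes the answer.
    SELECT 1.0 / 3 * 3 AS divide_first,    -- 0.999999999999
           1.0 * 3 / 3 AS multiply_first;  -- 1.000000000000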
