L' Eomot Inversé (10/31/2012)
This is probably a pointless argument, because it isn't about anything technical, just about historical and terminological accuracy, and it does appear that quite a few people have adopted historically inaccurate terminology (which is a pretty frequent phenomenon - for example, we call Heath's normal form, published in 1971, "Boyce-Codd Normal Form" after a paper published several years later, although the earlier paper was certainly known to the senior of the authors of the later paper). But I'm going to make the argument for historical accuracy anyway, especially since several of the people involved are still alive (if I were in touch with any of them I would ask them for their views, but unfortunately I'm not).
Well, perhaps there's one thing we can agree on.
Without getting word "from the horse's mouth", it's impossible to know for certain what the IBM team considers to be in the domain of the Halloween problem.
All your transcript link provides is a recollection of the circumstances in which it was discovered.
There are a number of unanswered questions:
- When the problem was discovered, did the team consider INSERT / DELETE scenarios at all?
- Even if they hadn't originally considered the possibility of INSERT / DELETE exhibiting similar unexpected behaviour for the same reason, would they consider those to be a different problem?
- Did the team consider the Halloween problem as exclusively applicable to the UPDATE statement?
- Or perhaps even more strictly, did they consider it to be the specific problem of updating Salaries over the Salaries index?
- When they use the word "updates", do they mean changing existing records (the UPDATE statement), or any change to the state of the database?
I will, however, give you something to consider.
Languages, words and names are in continuous evolution to aid communication. Believe me, you're not doing yourself any favours if you reject a logical definition on an historical 'point of order' - regardless of the accuracy of your claim.
Nevertheless, I'm willing to agree to disagree.
L' Eomot Inversé (10/31/2012)
craig 81366 (10/31/2012)
Then why, when I run the following by itself, does the plan not spool?
SELECT AVG(COL) FROM TABLE_NAME
Because a displayed MS SQL query plan only shows spool operations when they are not blatantly obvious (which I think is eminently sensible); no other reason.
Whereas I can agree to disagree on the previous point, here I must step in and call a spade a spade.
I can only assume that your insistence on pursuing this comes from not fully understanding what a spool operation does. It effectively reads all relevant data into temporary storage before proceeding to the next step. This isolates the data from changes that could be effected in that step. It can also be a relatively expensive operation, and so is not nearly as trivial as you seem to think.
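To make that concrete, here is a minimal sketch in plain Python (not SQL Server internals, and purely illustrative) of what a spool buys you: the qualifying rows are copied into temporary storage first, so the step that follows cannot feed back into what was read.

table = [{"id": 1, "salary": 900}, {"id": 2, "salary": 950}, {"id": 3, "salary": 2000}]

# "Spool": materialise the ids of the qualifying rows into a temporary copy.
spool = [row["id"] for row in table if row["salary"] < 1000]

# The update step reads the spooled ids, not the live predicate, so a raised
# salary cannot re-qualify a row mid-statement (the Halloween hazard).
for row in table:
    if row["id"] in spool:
        row["salary"] = round(row["salary"] * 1.1)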
Not pulling punches: I consider your claim to be utterly unsubstantiated, unfounded rubbish!
If "blatantly obvious" operations can simply be ignored, where do you draw the line? Are there some hash joins, or nested lookups that are "blatantly obvious"?
The fact is you do not need to spool (first read all data into temporary storage) to calculate an average.
I apologise if you consider the elaborate explanation (below) of how to calculate an average to be beneath you, but please observe that you can perform the calculation as you read the data; a short code sketch follows the steps below. You do not need to spool it first.
- Read first record (Val=10)
- Increment record counter (1), add Val to cumulative total (0+10=10)
- Read next record (Val=20)
- Record Counter=2, Cumulative Total=30
- Read next record (Val=30)
- Record Counter=3, Cumulative Total=60
- Whenever you're done reading records, the average is simply Total / Counter
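In code, that single streaming pass looks something like the following (a Python stand-in rather than T-SQL, just to illustrate the point; the names are mine, not anything from the engine):

def streaming_avg(rows):
    count, total = 0, 0
    for val in rows:        # read each record exactly once
        count += 1          # increment the record counter
        total += val        # add Val to the cumulative total
    return total / count    # when you're done reading, average = Total / Counter

print(streaming_avg([10, 20, 30]))   # -> 20.0, with nothing spooled along the way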
If you think you can deliver a result for a query for AVG without first reading all the records, then think again, because you are horribly wrong.
Your sarcasm is not amusing. Perhaps I should point out that in the above example the AVG after reading the first record is 10, after the second it is 15, and after reading all the data it is 20.
Just to be clear, there is a big difference between "calculating the AVG after reading all the data", and "calculating the average as you read the data".
- Do you need to read all the data before you can get the average of all the data? Of course!
- Do you need to read all the data before you start doing any calculations? Not at all!
You can't deliver the first value of AVG until you have done that. Then why read the records again?
I will concede that the most practical implementation of the query to DELETE all records less than the average is to precalculate the average, and treat it as a constant input (i.e. spool a single value).
Which is exactly why I elected to provide a second DELETE example, in which the decision to DELETE each row is calculated independently of the same calculation for the other rows.
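As a rough illustration of that first, "constant input" approach (again a Python stand-in, not the actual query plan): the average is computed once up front and then applied as a fixed threshold, which is the single-value spool I'm conceding.

values = [10, 20, 30, 40]
avg = sum(values) / len(values)            # one pass: the precalculated "constant"
values = [v for v in values if v >= avg]   # delete everything below that constant
print(values)                              # -> [30, 40]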
Sure, you could track changes in AVG as the table changed, if you wanted to and had an awful lot of compute power to spare,
Not at all: maintaining the average as you delete rows is a very cheap operation when done correctly.
If you have a Cumulative Total and a Record Counter (as in the earlier example), then each time you delete a row you simply decrease the Counter and subtract the Value from the Cumulative Total. The average at any point in time can then probably be delivered more cheaply than the IO operations needed to actually perform the delete.
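A sketch of that bookkeeping (Python again, purely illustrative): two arithmetic operations per delete keep the running average exact.

values = [10, 20, 30, 40]
count, total = len(values), sum(values)   # maintained alongside the data

v = 10                  # the row being deleted
values.remove(v)        # the actual delete (where the real IO cost lives)
count -= 1              # decrease the Counter
total -= v              # subtract the Value from the Cumulative Total

print(total / count)    # current average after the delete -> 30.0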
I've found your contributions to this topic to be informative and interesting.
However, I find your flippancy and stubbornness quite annoying. If you're not going to be open-minded enough to actually put thought into what I have to say, I can't be bothered to waste any more time explaining it to you.
I do enjoy a good debate and appreciate logical argument, but presenting unfounded rubbish as fact gets up my nose as much as historical inaccuracies seem to get up yours.