September 25, 2010 at 6:07 am
Paul White NZ (9/25/2010)
Paul White NZ (9/7/2010)
There is an optimizer quirk that means that some plans using INDEX(0) may include an unnecessary sort (which does not apply to INDEX(1)). I will be blogging the details for next week's T-SQL Tuesday (the topic is indexes).Yeah well I missed T-SQL Tuesday by some margin, but it's up now:
http://sqlblog.com/blogs/paul_white/archive/2010/09/23/a-tale-of-two-index-hints.aspx
Nice. I learned new things from this. Write more like this one, please.
But I noticed that query 1 and query 2 swapped names at the beginning of the index hints section. Was that deliberate to keep the readers alert?
Tom
September 25, 2010 at 6:58 am
Tom.Thomson (9/25/2010)
Nice. I learned new things from this. Write more like this one, please.
Thank you, I appreciate it. More in the same style or area of the product?
But I noticed that query 1 and query 2 swapped names at the beginning of the index hints section. Was that deliberate to keep the readers alert?
I make that error so frequently I don't even bother to check for it now :rolleyes:
You're the first one of 1,745 reads to spot it though - so well done! I've fixed it now, by the way.
October 9, 2010 at 6:38 pm
mjswart (9/22/2010)
Hi Jeff,Records aren't always stored physically ordered on a data page. Most of the time they are physically ordered correctly, but the actual order comes from the slot array. I was curious to see whether a different physical order breaks the quirky update.
Turns out it does not break the quirky update:
-- create the table
USE tempdb
GO
CREATE TABLE test
(
id int not null,
value uniqueidentifier,
deltaFromPrevious int,
constraint pk_test primary key clustered (id)
)
INSERT test (id,value)
VALUES
(2,NEWID()),
(4,NEWID()),
(6,NEWID()),
(8,NEWID()),
(10,NEWID())
GO
-- run quirky update
DECLARE @dfp int, @prev int;
UPDATE test
SET @dfp = deltaFromPrevious = id-@prev,
@prev = id;
-- everything looks fine here.
select * from test
-- mess up the physical order:
UPDATE test SET id = 5 WHERE id = 10;
GO
-- dbcc page now tells us that physically the
-- rows are out of order.
-- run the quirky update again:
DECLARE @dfp int, @prev int;
UPDATE test
SET @dfp = deltaFromPrevious = id-@prev,
@prev = id;
-- everything still looks fine here:
select * from test
DBCC may indeed show the data out of order but not in your code. You forgot to include the DBCC command examples in your code. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2010 at 12:57 pm
Paul, Jeff, et all...
I want to ensure that I understand the safety check as posted by Paul at http://www.sqlservercentral.com/Forums/FindPost980118.aspx, and modified by Tom at http://www.sqlservercentral.com/Forums/FindPost981258.aspx.
The resultant (combined) code would be:WITH SafeTable
AS (
SELECT Sequence = ROW_NUMBER() OVER (ORDER BY TD.AccountID ASC, TD.Date ASC, TD.TransactionDetailID ASC),
TD.AccountID,
TD.AccountRunningCount,
TD.AccountRunningTotal,
TD.Amount
-- removed 3 columns from CTE not used in update statement
FROM dbo.TransactionDetail TD
)
UPDATE SafeTable
SET @AccountRunningTotal = AccountRunningTotal =
CASE
WHEN Sequence = @Sequence+1
THEN CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningTotal+Amount
ELSE Amount
END
ELSE 1/0
END,
@AccountRunningCount = AccountRunningCount =
CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningCount + 1
ELSE 1
END,
@Sequence =
CASE
WHEN Sequence = @Sequence + 1 THEN Sequence -- updating in sequence
ELSE 1/0 -- quirky update is broken!
END,
@PrevAccountID = AccountID;
And, even though the ROW_NUMBER() function does include a sort, thus presenting the data to the update statement in the proper order, we STILL need the clustered index, on the same columns and in the same order as utilized in the ROW_NUMBER() function.
Has any testing been done to show that the OPTION (MAXDOP 1), and the WITH (TABLOCKX) hints are now not necessary? Obviously, the safety check will catch that if anything is off, but are they still needed to ensure that the update succeeds?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 10, 2010 at 5:11 pm
I believe that the MAXDOP 1 option and TABLOCKX hint are still required.
Unless parallel processing jumped in at just the right spot, the serial processing of a single CPU is still necessary to do the calculation correctly.
For the TABLOCKX hint, why wouldn't you use it? The UPDATE is going to eventually lock the whole table... why not just take a single lock to begin with and enjoy the guarantee that no one is going to add data in the middle of your calculation?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2010 at 8:51 pm
WayneS (10/10/2010)
And, even though the ROW_NUMBER() function does include a sort, thus presenting the data to the update statement in the proper order...
It's the order of the rows arriving at the Compute Scalar that is important, not at the Update iterator itself.
WayneS (10/10/2010)
...we STILL need the clustered index, on the same columns and in the same order as utilized in the ROW_NUMBER() function.
Almost always yes. On anything more than a very few rows, the optimizer will choose to introduce a sort on the clustered index keys. This sort is almost guaranteed to occur before the critical Compute Scalar, thus buggering up the essential ordering at that point.
WayneS (10/10/2010)
Has any testing been done to show that the OPTION (MAXDOP 1), and the WITH (TABLOCKX) hints are now not necessary? Obviously, the safety check will catch that if anything is off, but are they still needed to ensure that the update succeeds?
It's all but impossible (right now) to generate a parallel plan for a quirky update, but it seems logical that the work performed by the Compute Scalar has to occur on a single thread for correct results. So, I vote for MAXDOP 1 being essential.
TABLOCKX isn't essential in quite the same way, but it seems like a sensible precaution to me, for the reasons Jeff gave.
October 10, 2010 at 9:15 pm
Thanks for your responses guys. It's right in line with what I was thinking, but it's always good to get the double-check.
Now, I just gotta get used to using the QU with the PW/TT safety check modification.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 11, 2010 at 5:16 am
Paul White NZ (10/10/2010)
TABLOCKX isn't essential in quite the same way, but it seems like a sensible precaution to me, for the reasons Jeff gave.
I think there's another reason as well: if this query is the only one in a transaction, using TABLOCKX will reduce the risk of this query deadlocking with something else that's going on.
Tom
November 17, 2010 at 6:14 pm
Hello All,
I have been playing with the QU code in Jeff's article and discovered a couple of things....
It seems you CAN update in any order you like regardless of the clustered index ! :pinch:
You may not therefore NEED a clustered index.. My testing so far shows that I can perform a correct update regardless of having a CI or not...but it is very limited testing so far....
The key to this is the use of ordering in a CTE - which I know Jeff covered but he missed the clincher I believe. The TOP clause has the answer.
It makes sense, does it not, that if I select 10 rows in my CTE ordered by anything I like, and then update that CTE I will update those 10 rows? Yes?
Yes! So extending that to the whole table is as simple as counting the rows in the table first, then specifying that count in the TOP clause..
So, you will be dying to try it... Here is the code - taken from Jeff's article and amended:
EXEC dbo.ResetTestTable
DECLARE @PrevAccountID INT
DECLARE @AccountRunningTotal MONEY
DECLARE @AccountRunningCount INT
--===== Update the running total and running count for this row using the "Quirky
-- Update" and a "Pseudo-cursor". The order of the UPDATE is controlled by the
-- order by clause in the CTE!
--===== Get the rowcount - this is vital - it must not exceed the actual rowcount.
DECLARE @count bigint = (select COUNT(*) from TransactionDetail)
--===== Update the running total using multipart updates
-- applied to an "ordered" CTE.
;WITH cteOrdered AS
(
SELECT TOP (@count) --===== This forces the new ordering.
AccountID ,
NCID,
Amount,
AccountRunningTotal,
AccountRunningCount
FROM dbo.TransactionDetail
ORDER BY NCID DESC --Don't forget... reverse order here
)
UPDATE cteOrdered
SET @AccountRunningTotal = AccountRunningTotal = CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningTotal + Amount
ELSE Amount
END,
@AccountRunningCount = AccountRunningCount = CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningCount + 1
ELSE 1
END,
@PrevAccountID = AccountID
FROM cteOrdered WITH (TABLOCKX)
GO
You will want this new Verify SP as well - VerifyNCID - to check the running totals. Again this is taken from Jeff's code and amended to check the results ordered by NCID.
CREATE PROCEDURE dbo.VerifyNCID AS
/*************************************************************************************
Code to verify that the account running total calculation worked correctly.
Please read the comments to see how it works.
*************************************************************************************/
--===== Conditionally drop the verification table to make
-- it easy to rerun the verification code
IF OBJECT_ID('TempDB..#Verification') IS NOT NULL
DROP TABLE dbo.#Verification
--===== Define a variable to remember the number of rows
-- copied to the verification table
DECLARE @MyCount INT
--===== Copy the data from the test table into the
-- verification table in the correct order.
-- Remember the correct order with a ROW_NUMBER - otherwise we don't use the correct ordering
SELECT ROW_NUMBER() OVER(ORDER BY NCID DESC) AS RowNum,
AccountID,
Amount,
AccountRunningTotal
INTO #Verification
FROM dbo.TransactionDetail
ORDER BY NCID DESC
--===== Remember the number of rows we just copied
SELECT @MyCount = @@ROWCOUNT
--===== Check the running total calculations
SELECT CASE
WHEN COUNT(hi.RowNum) + 1 = @MyCount
THEN 'Account Running Total Calculations are correct'
ELSE 'There are some errors in the Account Running Totals'
END
FROM #Verification lo
INNER JOIN
#Verification hi
ON lo.RowNum + 1 = hi.RowNum
WHERE (-- Compare lines with the same AccountID
hi.AccountID = lo.AccountID
AND hi.AccountRunningTotal = lo.AccountRunningTotal + hi.Amount)
OR
(-- First line of account has running total same as amount
hi.AccountID <> lo.AccountID
AND hi.AccountRunningTotal = hi.Amount)
GO
Now, the question is - does this make anyone happier about using it as you can define the sort order specifically?
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
November 17, 2010 at 7:21 pm
MM -
Looks interesting. It will take more investigation.
I have some initial questions:
--===== Get the rowcount - this is vital - it must not exceed the actual rowcount.
1. So, what happens if a record (or more) gets deleted between the start of this statement and the next where the variable is used?
2. Wouldn't it be faster to get the record count from sys.dm_db_partition_stats?
3. Is there any special reason that you didn't include the safety check?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 17, 2010 at 7:39 pm
mister.magoo (11/17/2010)
Now, the question is - does this make anyone happier about using it as you can define the sort order specifically?
No! Let me explain why:
The TOP is the same idea, in principle, as that used by the safety check. I mentioned in previous posts that the safety check allows a quirky update to be performed without a clustered index - specifically on a heap table. If there is a clustered index, in a different order from that specified either by your TOP clause, or by my ROW_NUMBER safety check, the quirky update will usually fail.
The reason is that the optimizer will introduce a sort to optimize the clustered index update for sequential IO, and that sort appears before the vital Compute Scalar, messing up the quirky update.
I prefer the ordering provided by the safety check over your TOP method because:
(a) The safety check ensures the QU completes successfully or throws an error;
(b) No extra COUNT_BIG(*) step is required
Both the safety check ROW_NUMBER() clause and the TOP...ORDER BY method introduce a sorting requirement. The important thing (for the quirky update to work) is that this order is preserved later in the plan when the Compute Scalar processes the running total.
In general, there is no guarantee at all that this sort order will be preserved for later iterators. That's why we need the safety check. To be clear, the engine does not guarantee that sort order will be preserved any longer than is required for correctness. For example, a sort performed to provide order to a TOP iterator, is guaranteed to be retained as far as the TOP, but no further. The next iterator after the TOP might have rows presented to it in any order, in principle.
This may seem counter-intuitive, since you can look at a plan and 'see' that rows are sorted at one point, and never re-sorted. You might even reason that there aren't any non-order-preserving iterators (like a hash) and so order must be preserved. This is the faulty assumption I want people to stop making.
The advantage of the safety check is that it will throw an error if the assumption about preserved sort order does not hold.
Paul
November 17, 2010 at 7:49 pm
WayneS (11/17/2010)
I have some initial questions:--===== Get the rowcount - this is vital - it must not exceed the actual rowcount.
This seems wrong to me - surely the row count can safely exceed the actual, but must not be less. I usually use TOP (9223372036854775807) for this sort of 'intermediate materialization' query. In TOP (expression), expression is BIGINT (when PERCENT is not specified), and 9223372036854775807 is the maximum value of a BIGINT
So, what happens if a record (or more) gets deleted between the start of this statement and the next where the variable is used?
It would be problematic if new records were added. Another reason to use TOP (9223372036854775807).
Wouldn't it be faster to get the record count from sys.dm_db_partition_stats?
If the exact number were required for the TOP (which it isn't) you couldn't rely on that view to provide it - it's not guaranteed to be transactionally correct, or even close.
Is there any special reason that you didn't include the safety check?
The safety check replaces the TOP idea completely.
November 18, 2010 at 1:06 am
All interesting points, which I also will have to investigate.
The only reason for not including Paul's safety check was that it was not in the Article which is where I took the code from.
As far as guaranteeing sort order is concerned, obviously the safety check is an excellent idea.
I have to admit I had not thought about it as it was not in the article - the fact it allows updates by any order is excellent and I bow to this as a much better solution.
However, I still am interested in the fact (so far as I can see) that making sure the TOP number does not exceed the rowcount means that the update does not use the clustered index for ordering and does not even seem to require one on the table....
Thanks for the feedback.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
November 18, 2010 at 1:38 am
Hi Mister Magoo,
In addition to the points raised by Paul, here is one more.
Older versions of SQL Server (prior to SQL Server 2005) allowed you to create "ordered views". This is in fact nonsense, as views and tables are sets which are unsorted by definition - this is why the standard SQL syntax does not even allow an ORDER BY in a view definition. And neither does SQL Server - unless the view definition contains a TOP clause, in which case an accompanying ORDER BY clause is allowed; this ORDER BY clause is then used to define which rows qualify the TOP clause.
People quickly observed that a SELECT from that VIEW always returned rows in the specified order. This was never documented or guaranteed; it was a coincidental side effect. And yet, people came in the habit of creating views with a TOP 100 PERCENT clause and an ORDER BY. expecting a SELECT from that view to guarantee results in the "correct" order.
In SQL Server 2005, the optimizer got smarter. It realized that TOP (100 PERCENT) is a no-op, and simply disregarded both the TOP operator and the accompanying ORDER BY. And people started complaining (and even reporting bugs) because their "ordered views" suddenly stopped working. Most people then changed their code, moving the ORDER BY to the SELECT that returns dat to the client. But there were also the stubborn people, who found out thet the undocumented behaviour persists if you use TOP (99.999999 PERCENT), or TOP (2147483647) [that is the maximum integer - I have not seen Paul's suggestion of TOP (9223372036854775807) before, but that is of course the same idea].
These suggestion do indeed seem to work - for now. And only if the amount of rows that qualify the WHERE clause does not exceed the specified number, or the number where 0.0000001% is 1 or more.
But is this safe? Should we not expect that the SQL Server developers, who constantly try to improve the optimizer, will one day replace the current "simple" logic to remove a TOP (100 PERCENT) with a more advanced logic that, at run time, removes any TOP clause that does not remove any rows? It may never happen. Or it may. It's definitely not a dependency I am willing to take for my code.
I have already written before in thsis discussion that Paul's safety check is a good improvement. It does not really make the method any more reliable (as it still relies on undocumented and unguaranteed functionality), but it does at least promise to throw an error when incorrect results would be generated without having to do a seperate check (which, I am sure, many people simply won't do after some testing time in a dev environment).
Whether or not the combination of your TOP trick with Paul's safety check makes this method more reliable is something I'll leave for others to judge. My point merely is that without Paul's safety check, this method (with or without TOP) is unsafe and therefor not suited for production code.
November 18, 2010 at 2:17 am
All,
Thanks for taking the time to explain things to me - this is certainly a tricky subject and I appreciate your help.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 15 posts - 211 through 225 (of 308 total)
You must be logged in to reply to this topic. Login to reply