??Bug in ROW_NUMBER() + Partition BY in CTE

Question

Post reply

??Bug in ROW_NUMBER() + Partition BY in CTE

Ninja's_RGR'us

SSC Guru

Points: 294069
More actions
November 10, 2011 at 7:37 am

#262640

Still working on a update script to save & backsave historical price data.
I can't find any bugs regarding this on connect. Did you guys read anything about that issue?
Microsoft SQL Server 2005 - 9.00.3042.00 (Intel X86) Feb 9 2007 22:47:07 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)
I haven't been able to do a repro script yet so bare with me.
Here are the screenshots with query, results and execution plans attached.
My problem is that is seems that the windowing function is not calculated at the same step in the procedure in the bug version, resetting the counters when they souldn't! Now I know how to get around this issue, but I'm wondering if I need to pursue this and post a bug on connect or ignore the issue because it's been fixed already.
Exected results :
Bad results when changing the where clause (old reflex, makes almost no sense whatsoever for this case)

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Brandie Tarvin SSC Guru Points: 173119 More actions · Answer 1

Remi,

Silly question. Why are you turning an OUTER JOIN into an INNER JOIN by adding C2.NoArticle to your WHERE clause?

Would that have anything to do with your issue?

EDIT: I'm not sure what to put as the second argument here, but you could try:

ISNULL(C2.NoArticle,C1.NoArticle) = '10A-057-0'

See how that works.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 2

Brandie Tarvin (11/10/2011)
Remi,
Silly question. Why are you turning an OUTER JOIN into an INNER JOIN by adding C2.NoArticle to your WHERE clause?
Would that have anything to do with your issue?

I was playing with a few things in that script to see what I really needed to do to cover all the bases.

My reflex is to put the filter on a few if not all relevant columns to help the optimiser. However in this case it seems to have backfired into a wrong simplification and incorrect windowing . This is where I found the incorrect results.

I'm fully aware that this is a super edge case, but a bug's a bug!

In this case I wanted to work the query on only 1 SKU & 10 rows while I was locking the logic down.

The real table I'm working on has 208K rows & 12K SKUS.

I'm trying to repro with a table with only 10 rows and no luck so far and I can't waste time on that for now... should be able to tomorrow.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

Brandie Tarvin (11/10/2011)
Remi,
EDIT: I'm not sure what to put as the second argument here, but you could try:
ISNULL(C2.NoArticle,C1.NoArticle) = '10A-057-0'

Not what I need in the final query but for the fun of documenting, it returns the correct results.

Brandie Tarvin SSC Guru Points: 173119 More actions · Answer 4

When you have time...

What exactly is the data supposed to be? Product data? What do the dates mean for the data? (I'm looking for the English explanation here so I can repro something on my own).

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Brandie Tarvin (11/10/2011)
When you have time...
What exactly is the data supposed to be? Product data? What do the dates mean for the data? (I'm looking for the English explanation here so I can repro something on my own).

I save and daily snapshot of 10 different prices / costs for each sku.

I only save when there's a change. Hence the begin / end date pair. 9999-12-31 is the "null" end date.

What I'm doing now is backsaving historical data for 2 new columns. At line 300 and something in the code I've already inserted the data in the table. Everything is correct except the end date.

So in the CTE and build a row number to make the self join and boom!

Next step would be to do the update but I hit the snag before.

TIA if you want to try for a repro script.

GSquared SSC Guru Points: 260824 More actions · Answer 6

Try moving the Where clause inside the CTE.

What it looks like to me is that you're getting a Join Math issue, caused by the fact that the CTE is actually "run" twice.

It looks like the Where clause is being applied before the join math. Not uncommon. In a declarative language, I'd call it a bug.

As for it being fixed, try it in a later version of SQL, see if it does what you need. I don't have the underlying tables and data, so can't test it for you.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

GSquared (11/10/2011)
Try moving the Where clause inside the CTE..

That fixes it too.

GSquared (11/10/2011)
What it looks like to me is that you're getting a Join Math issue, caused by the fact that the CTE is actually "run" twice.
It looks like the Where clause is being applied before the join math. Not uncommon. In a declarative language, I'd call it a bug.

That's what I'm guessing from the plan. There operators are not in the same sequence. My guess is that since the where filter nullifies part of the join it messes up the logic or wrongly simplifies the plan. And if the row_number(partition by sku) is applied to NULL instead of the real SKU, then it explains the repeating 1 value. That makes sense in my head but I have a hard time proving that in the plan.

Can you spot anything wrong in the plan?

GSquared (11/10/2011)
As for it being fixed, try it in a later version of SQL, see if it does what you need. I don't have the underlying tables and data, so can't test it for you

This is our full costing / pricing data for profit margins so I can't just hand it out :hehe:.

That's why I said I'd have to spend some time on the repro script to share with anyone.

But since nobody's been able to find me a connect item or KB link, it looks like this is what I'll be doing tomorrow.

Brandie Tarvin SSC Guru Points: 173119 More actions · Answer 8

Would you actually be using a WHERE clause IRL?

Or does the WHERE clause exist only to limit your row values at the moment?

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 9

Brandie Tarvin (11/10/2011)
When you have time...
What exactly is the data supposed to be? Product data? What do the dates mean for the data? (I'm looking for the English explanation here so I can repro something on my own).

SKU, <20 columns of prices, costs, descriptions...I didn't show>

Begin date => start date where those prices are valid

End Date => last day where those prices are valid.

I didn't want to save 12K rows / day that's why I have the begin / end date pair. That way I save only 1100 items / days in that table. Hell of a difference over a couple years.

I have a meeting starting soon so I'll have time for the repro script later.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

Brandie Tarvin (11/10/2011)
Would you actually be using a WHERE clause IRL?
Or does the WHERE clause exist only to limit your row values at the moment?

No where clause in the final script. I wanted to prove my script on 1 sku, then validate the whole 200K dataset, then run the update. Then run in prod.

I am first looking at => Is this a known bug, google & connect havn't found anything useful for me so far.

If this is "new", I'll make a repro script so I can let you guys test and send to connect or css.

Paul White SSC Guru Points: 150468 More actions · Answer 11

It is a bug, and one I have seen before. In the bad plan, the top branch index scan is unordered (it has the Ordered:False attribute). The required sort is after the Sequence Project doing the row number. You are quite likely to encounter already-fixed bugs unless you upgrade to a supported version of SQL Server. Mainstream support for 2005 SP2 ended on 12 January 2010. Extended support is not available - you need to update to at least SP3 to get that. SP4 is the current Service Pack for 2005.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 12

SQL Kiwi (11/10/2011)
It is a bug, and one I have seen before. In the bad plan, the top branch index scan is unordered (it has the Ordered:False attribute). The required sort is after the Sequence Project doing the row number.

Right on the money. Exactly what's happening.

SQL Kiwi (11/10/2011)
You are quite likely to encounter already-fixed bugs unless you upgrade to a supported version of SQL Server. Mainstream support for 2005 SP2 ended on 12 January 2010. Extended support is not available - you need to update to at least SP3 to get that. SP4 is the current Service Pack for 2005.

We don't have the ressources to retest the whole ERP. It's the 3rd time I have a situation lile this. And certainly not the last!

Just to give you a ballpark, I just got my first desk after 2.5 years consulting here!

Thanks for the heads up Kiwi.

Paul White SSC Guru Points: 150468 More actions · Answer 13

Ninja's_RGR'us (11/10/2011)
We don't have the ressources to retest the whole ERP. It's the 3rd time I have a situation lile this. And certainly not the last!

OK (though the risk going from SP2 -> SP4 is pretty small, and some would assess staying on SP2 forever as a greater risk). So anyway, you'll just have to work around the bugs as you encounter them - perhaps use a temporary table to materialize the CTE here. Anyway, being on an unsupported version means there's no point submitting a Connect item.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 14

SQL Kiwi (11/10/2011)
Ninja's_RGR'us (11/10/2011)
We don't have the ressources to retest the whole ERP. It's the 3rd time I have a situation lile this. And certainly not the last!
OK (though the risk going from SP2 -> SP4 is pretty small, and some would assess staying on SP2 forever as a greater risk). So anyway, you'll just have to work around the bugs as you encounter them - perhaps use a temporary table to materialize the CTE here. Anyway, being on an unsupported version means there's no point submitting a Connect item.

... unless you prove them on supported version. Which I did on the corrupt db issue :-D.

We have an upgrade for something else on the prod server with full vm backups and the likes. Maybe I'll push for doing both.

Overall we're looking into upgrade the whole ERP & go to 2008 R2 next winter. Hence ressource planning on the current system is really minimal.