|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
Garadin (1/25/2009) OK, I'll bite. What the heck is goin on there? I've looked at it a few times, and it's not making any more sense than the first time I read it. Tried to run it and see, but get errors creating the sample data table. I'm sure it's quite the trick, and definitely not one I've seen before.
(Msg 102, Level 15, State 1, Line 7 Incorrect syntax near 'Num'. Msg 156, Level 15, State 1, Line 16 Incorrect syntax near the keyword 'Group'.) Arrgh! My bad, sorry. I've been writing my queries on SQL 2008 lately to get more used to it and I forgot to test it on SQL 2005. Hang on, I'll get a corrected version...
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
Actually, I think that it was just a typo from a left over diagnostic column.
Anyway, here is a slightly reqwritten version of the Load script:
Drop Table TestPCursors go CREATE TABLE TestPCursors (OrderNo INT NOT NULL, SeqNo INT NOT NULL, Type CHAR(2)) ALTER TABLE TestPCursors ADD PRIMARY KEY CLUSTERED (OrderNo, SeqNo) go
;With cteNumbers as (Select TOP 300 ROW_NUMBER() Over(Order by object_id) as Num From master.sys.system_columns) , cteCol300 as (Select Object_id , Num , column_id , name , Num+object_id as OrderNo , Num+object_id+column_id as ShiftStart1 , object_id+column_id as ShiftStart2 From master.sys.system_columns Cross Join cteNumbers) INSERT into TestPCursors Select OrderNo , column_id , MAX(Case When column_id=1 Then Left(name,2) When ShiftStart2%5=0 Then Left(name,2) When (ShiftStart1%7)=0 Then Left(name,2) Else '' End) From cteCol300 Group By OrderNo, column_id
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
Here is the vastly simpler non-optimized version:
DECLARE @PrevType CHAR(2) Declare @buffer varchar(MAX) Select @buffer = ''
--====== Load the buffer with all of the output values Select TOP 10000--00 @buffer = @buffer + Cast(OrderNo as CHAR(11)) + Cast(SeqNo as CHAR(3)) + Cast(Case When [Type]=' ' Then @PrevType Else [Type] End as CHAR(2)) , @PrevType = CASE When [Type]=' ' Then @PrevType Else [Type] End From (Select *, (ROW_NUMBER() Over(Order By OrderNo, SeqNo)) as RowNum From TestPCursors) A Order By OrderNo, SeqNo
--====== Now extract and display the output values for every row ;With cteNumbers as (Select TOP 10000--00 ROW_NUMBER() Over(Order By c1.object_id) as RowNum From master.sys.system_columns c1, master.sys.system_columns c2) Select Cast(SUBSTRING(@buffer, (RowNum-1)*16+1, 11) as int) as OrderNo , Cast(SUBSTRING(@buffer, (RowNum-1)*16+12, 3) as int) as SeqNo , Cast(SUBSTRING(@buffer, (RowNum-1)*16+15, 2) as CHAR(2)) as [Type] From cteNumbers
Note that because it is so slow, i have restricted it to just 10,000 rows.
Logically, this is the same algorithm as the optimized version, so hopefully you can see where this one is going...
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
OK, how it works. First the unoptimized (simple) version:
The idea here is, just like an Update pseudocursor, use a Select pseudocursor to accumulate state across rows, to determine the handling of the output columns within each row. Two important differences, however. First, because SELECT cannot write to bot variables and output columns, I use a single string variable (@buffer) to hold all of my output columns and rows. The second difference is that because I am using ORDER BY on the outer select to control my order, this technique is completely supported.
Now, once this massive string (apprx. 16 MB) is created, then I use a Numbers/Tally table to enumerate all of the output rows that I want, and then extract the columns values from the giant string buffer, convert them to the correct data type and output them.
So, why is it so slow? Because simple/naive string accumulation is essentially a Triangular operation. Consider this:
row 001, add 'A': @b = '' + 'A'; cost of this row is 1 row 002, add 'B': @b = 'A' + 'B'; cost of this row is 2 row 003, add 'C': @b = 'AB' + 'C'; cost of this row is 3 row 004, add 'D': @b = 'ABC' + 'D'; cost of this row is 4 ...
and the cost of the millionth rows is 1,000,000. Add up the cost of every row to get the total cost and you get 5,000,005,000,000, which is a very big number indeed (five trillion, five million). And this was the motivation for the much more complicated stuff that you see in the optimized version.
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
The second performance problem that the unoptimized version has is that the second select is much slower than I thought it would be. Because I know that the conversion from character string to INT is a high overhead operation, I figured to circumvent this problem by using binary strings instead of character strings.
Converting to or from binary is really just a memcopy operation with no conditions or calculations necessary, with a different datatype applied to the output, so it should be much more efficient. And that is why the optimized version uses binary strings everywhere. This does help with the efficiency of that step, but not nearly as much as I had hoped.
And this still does nothing for the Triangular nature of string accumulation...
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Wednesday, January 25, 2012 2:29 PM
Points: 1,524,
Visits: 3,981
|
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
Garadin (1/25/2009) I get it now. Great explanation. So you've also found that it's quicker to dump repeatedly into the slightly bigger varbinary's than just dumping into one big one whenever you pass a certain number? Exactly Seth. Consider the following simplified examples, first unoptimized:
Row Step Row Step Total Cost Cost Cost 1 a 1 1 @b =''+'A' 1 2 a 2 2 @b='A'+'B' 3 3 a 3 3 @b='AB'+'C' 6 4 a 4 4 @b='ABC'+'D' 10 5 a 5 5 @b='ABCD'+'E' 15 6 a 6 6 @b='ABCDE'+'F' 21 7 a 7 7 @b='ABCDEF'+'G' 28 8 a 8 8 @b='ABCDEFG'+'H' 36
Now a slightly optimized version:
Row Step Row Step Total Cost Cost Cost 1 a 1 1 @b =''+'A' 1 2 a 2 2 @b='A'+'B' 3 3 a 3 3 @b='AB'+'C' 6 4 a 4 @b='ABC'+'D' b 4 @b2=''+'ABCD' c 8 0 @b='' 14 5 a 1 1 @b=''+'E' 15 6 a 2 2 @b='E'+'F' 17 7 a 3 3 @b='EF'+'G' 20 8 a 4 4 @b='EFG'+'H' b 8 @b2='ABCD'+'EFGH' c 12 0 @b='' 32
Now this improvement may seem small (10%), but this is greatly magnified by larger numbers, and then multiplied by stacking them up. The optimizations that I implemented for this improved the overall string performance by 100 to 1000x.
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Thursday, November 17, 2011 4:09 PM
Points: 9,359,
Visits: 8,864
|
|
I should also add, that my previous numbers were incorrect (left a debugging limiter in), the total time is 30 seconds, compared to Jeffs 7+ seconds.
Only 8 seconds are in the string accumulator, the other 22 seconds are (suprisingly) in the row enumerator SELECT. Using varbinary helped some, but not nearly enough and at this point, I cannot figure out what is taking it so long.
Any ideas, Jeff?
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung Proactive Performance Solutions, Inc. "Performance is our middle name."
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Wednesday, January 25, 2012 2:29 PM
Points: 1,524,
Visits: 3,981
|
|
This is basically gap finding, and it seems like you should be able to use a variation of Jeff's identity gap finding technique for this. I've been trying to adapt it to fit this, but keep having to make too many references to the table to do it. I feel like I'm overlooking something basic.
Seth Phelabaum Consistency is only a virtue if you're not a screwup. 
Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
|
|
|
|