Help writing a SQL

Question

Help writing a SQL

Viewing 15 posts - 16 through 30 (of 70 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 1

Greg Snidow (9/17/2009)
Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.

SELECT operates differently than UPDATE does and you cannot trust any natural or index order with SELECT. If you want to guarantee SELECT will return something in the correctly, you MUST use ORDER BY even if the rows were originally inserted using an ORDER BY.

The 3 part Quirky Update, when properly formed with the necessary clustered index and a couple of other things including the prevention of parallelism, will always do an update in the same order as the clustered index even if you fool around with CTE's to try to force an actual ORDER BY.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Greg Snidow SSCoach Points: 16106 More actions · Answer 2

Jeff Moden (9/17/2009)
Another of those rules is that it must be a single table update. By that I mean, no joins... period.

That was going to be my next question. I tried doing it without the temp table, and failed miserably. Now I know. Thanks, Jeff.

Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 3

Heh... I'm actually worse... I just don't let people put CLR's in my systems. It's not that I'm against them... it's that they're not worth having to go through two different languages and maintaining source control for both the source code and the DLL especially since the T-SQL solutions frequently beat them for speed.

Yes, there are a few places where a CLR will beat T-SQL... RegEx replace is one of them and, if done correctly, splitting strings is another. But the differences in performance are so relatively small that I just can't justify the use of CLR's.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dave Ballantyne SSC-Dedicated Points: 33566 More actions · Answer 4

Garadin (9/17/2009)
If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.

How about a recursive CTE ? I think this has the correct output , if not let me know.

Also i would be very interested in how this performs against a large record set

using this index

CREATE CLUSTERED INDEX idxmultirecs ON multrecs (acct,ndt)

I see index seeks all the way through so it *should* be pretty damn fast

with cteFirstDates(acct,ndt)

as(

Select acct,min(ndt)

from multrecs

group by acct

)

,

CteRecurs(acct,ndt)

as

(

Select multrecs.acct,multrecs.ndt

from cteFirstDates

join multrecs

on multrecs.acct = cteFirstDates.acct

and multrecs.ndt = cteFirstDates.ndt

union all

Select acct,ndt

from (

Select multrecs.acct,multrecs.ndt,rown =row_number() over (partition by multrecs.acct order by multrecs.ndt)

from CteRecurs

join multrecs

on multrecs.acct = CteRecurs.acct

and multrecs.ndt >= CteRecurs.ndt + 38

) as x

where rown = 1

)

select * from CteRecurs

order by acct,ndt

Clear Sky SQL
My Blog[/url]

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 5

That works but be careful... recursive CTE's are as slow as a While Loop in most cases and are actually a form of RBAR.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dave Ballantyne SSC-Dedicated Points: 33566 More actions · Answer 6

Jeff Moden (9/18/2009)
That works but be careful... recursive CTE's are as slow as a While Loop in most cases and are actually a form of RBAR.

1st off apologies to the OP only just noticed what forum this is in :doze:

I agree that it isnt a perfect set based solution , but any set where you are wanting to include / exclude rows ( and to a certain extent the row data)

based upon data from previous rows will have to be RBAR somewhere.

My 'gut' feeling is it should be light on the system than a quirky update though.

Clear Sky SQL
My Blog[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 7

Dave Ballantyne (9/18/2009)
My 'gut' feeling is it should be light on the system than a quirky update though.

My gut feeling is that you're wrong :-D. But that's what we have testing for. I'd test it, but first I have to make up a couple million rows of suitable test data. Got any scripts for that lying about by chance?

Also, thank you for posting another method for this. Although it is a 2005+ only method, and is technically RBAR, it's one more thing to keep in mind when coming to these iterative logic problems, and is another use(although whether or not it is a 'good' use remains to be seen) for a recursive CTE.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Dave Ballantyne SSC-Dedicated Points: 33566 More actions · Answer 8

Your wish is my command 🙂

Happy to be proved wrong , evidence either way is good in my book.

You just need to manually update the acct value ,

and keep inserting values till happy 😀

also note that this will produce increments of between 1 and 11 days

so adjust your update to be +4 (or something else rather than the specified 38)

Create View vwnewId

as

Select new_id = newId()

go

drop table #datelist

go

Create table #datelist

(

numskip integer,

date datetime

)

go

delete from #datelist

;with ctenums(number)

as(

select number from

master..spt_values

where

TYPE='p' and number between 0 and 400

)

insert into #datelist(numskip)

Select abs(checksum(new_id))%10

from ctenums

cross join vwnewid

go

declare @Date datetime

Select @Date = '01jan1970'

update #datelist

set Date=@Date,

@Date = @Date+numskip+1

delete from multrecs where acct = 3

insert into multrecs (acct,cdt,val,ndt)

select 3,'01jan1970','test',date

from #datelist

Clear Sky SQL
My Blog[/url]

Dave Ballantyne SSC-Dedicated Points: 33566 More actions · Answer 9

Also, with the quirk update , although you have specified index(0) are the rows guaranteed to come back in the required order ? . If you joined out to another table , things could go wrong.

(No criticism , just observation) 🙂

Clear Sky SQL
My Blog[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 10

Dave Ballantyne (9/18/2009)
Also, with the quirk update , although you have specified index(0) are the rows guaranteed to come back in the required order ? . If you joined out to another table , things could go wrong.
(No criticism , just observation) 🙂

There's a few posts earlier in the thread from Jeff directly explaining the index(0) thing and why we won't use it anymore, but the gist of it is that you are guaranteed the row order (assuming you've used the clustered index to create this order) on a single table.

You're absolutely right that Joins present a problem, and AFAIK, you cannot use joins at all. I don't believe you can use a WHERE clause either, but I'm not sure on that. The logic for these updates should be in the case statement.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 11

Dave Ballantyne (9/18/2009)
Your wish is my command 🙂
Happy to be proved wrong , evidence either way is good in my book.
You just need to manually update the acct value ,
and keep inserting values till happy 😀
also note that this will produce increments of between 1 and 11 days
so adjust your update to be +4 (or something else rather than the specified 38)
Create View vwnewId
as
Select new_id = newId()
go
drop table #datelist
go
Create table #datelist
(
numskip integer,
date datetime
)
go
delete from #datelist
;with ctenums(number)
as(
select number from
master..spt_values
where
TYPE='p' and number between 0 and 400
)
insert into #datelist(numskip)
Select abs(checksum(new_id))%10
from ctenums
cross join vwnewid
go
declare @Date datetime
Select @Date = '01jan1970'
update #datelist
set Date=@Date,
@Date = @Date+numskip+1
delete from multrecs where acct = 3
insert into multrecs (acct,cdt,val,ndt)
select 3,'01jan1970','test',date
from #datelist

Awesome, I appreciate the code. Let the battle of the guts commence!

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 12

And the results are in. It probably wasn't a fair contest to begin with. You were going up against a gut accustomed to eating 2 day old Taco Bell burritos without flinching.

Quirky Method No DOP set, Messages:

(2000000 row(s) affected)

(1 row(s) affected)

Update Done. Time Elapsed: 28 seconds

(64202 row(s) affected)

(1 row(s) affected)

Select to Table Done. Total Time Elapsed: 30 seconds

(1 row(s) affected)

Final Row Count, Quirky Method: 64202

Quirky Method MAXDOP 1, Messages:

(2000000 row(s) affected)

(1 row(s) affected)

Update Done. Time Elapsed: 12 seconds

(64202 row(s) affected)

(1 row(s) affected)

Select to Table Done. Total Time Elapsed: 15 seconds

(1 row(s) affected)

Final Row Count, Quirky Method:64202

CTE Method Messages:

(71705 row(s) affected)

(1 row(s) affected)

Select to Table Done. Total Time Elapsed: 170 seconds

(1 row(s) affected)

Final Row Count, CTE Method: 71705

Notes:

1. I simplified your test data script a bit. I created 2 Million rows of test data.

2. In the event that there are two items with exactly the same date that meet the criteria, the Recursive CTE solution includes both, where as the Quirky update does not.

3. I picked a dumb name for the Clustered index (Quirky), because it's used by both methods. Because I'm including the .sqlplans, I didn't rename it post testing.

4. I select these into tables to take screen display times out of the equation.

5. I haven't completely researched the difference in rows returned. I'm sure some is due to the duplicates, but I'm not sure it all is.

6. It's possible that both methods could be optimized further. The Quirky update actually took a lot longer than I expected it to, which probably means I did something wrong.

7. I forgot the MAXDOP the first time around. The second time I added it and re-ran. 15 seconds instead of 30.

Test Data Setup

IF OBJECT_ID('multrecs','u') IS NOT NULL DROP TABLE multrecs

CREATE TABLE multrecs(

acctint,

ndtdatetime

)

DECLARE @i int,

@Acct int

SET @i = 0

WHILE @i 400 records due to duplication.

INSERT INTO multrecs(acct,ndt)

SELECT @Acct, DATEADD(D,ABS(CHECKSUM(NEWID())%10000),'01/01/1970') -- Generates a number 0-10000 and adds this number of days to 1970. This is unsequenced data.

FROM Tally

WHERE N <=400

SET @i = @i + 1

CONTINUE

END

ALTER TABLE multrecs ADD display bit

CREATE CLUSTERED INDEX Quirky ON multrecs (acct, ndt) -- Silly Name for the index.

Quirky Update Code Used:

-- Stop and Start SQL Server Service to completely clear memory.

DECLARE @PrevAcctbigint,

@LDNDT datetime, -- Last Displayed NDT

@Display bit, -- Even though you don't use it, this method won't function without the 3 part update

@timerdatetime,

@FRCint

SET @timer = GETDATE()

UPDATE multrecs

SET @Display = Display = CASE WHEN Acct = @PrevAcct

AND DATEDIFF(d,@LDNDT,NDT) < 150

THEN 0

ELSE 1

END,

@LDNDT =CASE WHEN Acct = @PrevAcct

AND DATEDIFF(d,@LDNDT,NDT) = CteRecurs.ndt + 150

) as x

where rown = 1

)

select *

INTO CTE

from CteRecurs

order by acct,ndt

PRINT 'Select to Table Done. Total Time Elapsed: ' + CAST(DATEDIFF(s,@Timer,GETDATE()) AS varchar(10)) + ' seconds'

SET @FRC = (SELECT COUNT(*) FROM CTE)

PRINT 'Final Row Count, CTE Method:' + CAST(@FRC AS varchar(10))

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Garadin One Orange Chip Points: 29613 More actions · Answer 13

Taking this one step further, if anyone wants to supply the script for a CLR function(and possibly explain to me how to use it) or any other alternate method they think might beat this, I'd love to test them.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 14

Turn off the execution plan and see what you get.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Garadin One Orange Chip Points: 29613 More actions · Answer 15

Jeff Moden (9/18/2009)
Turn off the execution plan and see what you get.

Without the execution plan, 12 seconds for quirky with MAXDOP 1, 151 for the RCTE. I ran them a few times each with similar results. I also ran them without stopping/starting the server, which actually made almost no difference, probably because this is a workstation with only like 2 gigs of memory.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]