Comparing multiple values with column value

Question

Post reply

Comparing multiple values with column value

PiMané

SSChampion

Points: 13965
More actions
September 2, 2009 at 11:45 am

#214080

Hi,
I'm trying to make a store to return some row's data given some public ids (string).
I've come across 3 methods to do so but I'm having some trouble finding out witch is the best.
declare @ads nvarchar(100) = ',1nohFsEuMcy,1nohFsEvaJq,1nohFsEvBnk,'
declare @adsx xml = '1nohFsEuMcy1nohFsEvaJq1nohFsEvBnk'
declare @i int
select id, pubKey from xpt where charindex(','+pubkey+',', @ads) > 0
select id, pubKey from xpt where exists (select 1 FROM @adsx.nodes('/r/ad') t(n) where pubkey = t.n.value('(.)[1]', 'NVARCHAR(20)') )
exec sp_xml_preparedocument @i output, @adsx
select id, pubKey from xpt where exists (select 1 from (select pk from openxml(@i, '/r/ad', 3) with (pk nvarchar(20) '.')) t where pubKey=t.pk )
exec sp_xml_removedocument @i
Can any one help me choose witch method is best?
Thanks,
Pedro

If you need to work better, try working less...

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

I'd go with this:

select id, pubKey

from xpt

inner join

(select ADSX.Nodes.value('(.)[1]','varchar(100)') as ADSXValue

from @adsx.nodes('/R/AD') ADSX(Nodes)) Sub

on xpt.pubKey = Sub.ADSXValue;

Edit: Actually, I'd probably do the derived table as a CTE, since those are easier to read, but my test ended up formatted this way.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

PiMané SSChampion Points: 13965 More actions · Answer 2

I've also noticed, that since I know the maximum number of rows I'll return, that using the TOP N the SQL has less logical reads since it stops after having N records and doesn't continue searching.

But to do this I'd have to use the nodes() function twice... Once to make a COUNT() into a variable to use with the SELECT TOP (@variable) and another in the SELECT.

Would it be better, to read the values the first time into a TABLE VARIABLE and use that TABLE variable instead of the nodes()?

set statistics io on

declare @adsx xml = '1nohFsEuMcy1nohFsEvaJq1nohFsEvBnk'

declare @ids table (pkid nvarchar(20))

insert into @ids select t.n.value('(.)[1]', 'NVARCHAR(20)') FROM @adsx.nodes('/r/ad') t(n)

declare @cnt int = (select COUNT(1) from @ids)

select top (@cnt) id, pubKey from xpt inner join @ids on pubkey = pkid

If you need to work better, try working less...

GSquared SSC Guru Points: 260824 More actions · Answer 3

Adding top to this just adds processor cycles without accomplishing anything.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

PiMané SSChampion Points: 13965 More actions · Answer 4

I made some tests and notice that... since is made a JOIN with a select and not used the IN clause the number of logical reads is lower.

select idad, publickey from adsmain inner join (select t.n.value('(.)[1]', 'NVARCHAR(20)') pk FROM @adsx.nodes('/r/ad') t(n)) t on publickey = pk

Table 'AdsMain'. Scan count 3, logical reads 9, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

select idad, publickey from adsmain where publickey in (select t.n.value('(.)[1]', 'NVARCHAR(20)') pk FROM @adsx.nodes('/r/ad') t(n))

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'AdsMain'. Scan count 5, logical reads 1538, physical reads 13, read-ahead reads 821, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

much lower 😀

I executed the following code and asked for the Execution Plan:

declare @ads nvarchar(100) = ',1nohFsEuMcy,1nohFsEvaJq,1nohFsEvBnk,'

declare @adsx xml = '1nohFsEuMcy1nohFsEvaJq1nohFsEvBnk'

select idad, publickey from adsmain where charindex(','+publickey+',', @ads) > 0

select idad, publickey from adsmain inner join (select t.n.value('(.)[1]', 'NVARCHAR(20)') pk FROM @adsx.nodes('/R/AD') t(n)) t on publickey = pk

Do you know why SQL Server says that the 1st query only cost 1% of the query batch and the 2nd 99%?

The 2nd one is best and still SQL Server gives this numbers out!! Is the nodes() function still a mistery to the tunning engine?!

Thanks in advance,

Pedro

If you need to work better, try working less...

GSquared SSC Guru Points: 260824 More actions · Answer 5

Execution plans costs and percentages are misleading. They are quite often just plain wrong. Personally, I wish MS would get rid of them, because they cause more problems than they solve.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon