Optimizing This Query

  • Hi, I was hoping a T-SQL guru could make a recommendation for optimizing this query:

    SELECT top 50

    CA1.ConfigID, CA1.DownloadTime, CA1.Config,

    N.NodeID, N.AgentIP, N.ReverseDNS, N.SysName, N.SysLocation,

    CA2.ConfigID, CA2.DownloadTime, CA2.Config

    FROM [OrionNCM_ConfigMgmt].[dbo].[ConfigArchive] CA1

    inner join Nodes N on N.NodeID = substring(CA1.NodeID,2,len(CA1.NodeID)-2)

    inner join ConfigArchive CA2 on CA2.NodeID = CA1.NodeID and CA2.ConfigID =

    (select top 1 CA3.ConfigID

    from ConfigArchive CA3

    where CA3.NodeID = CA1.NodeID and CA3.DownloadTime < CA1.DownloadTime and CA3.ConfigType != 'Snippet'

    order by CA3.DownloadTime desc)

    where CA1.ConfigType != 'Snippet' and CA2.ConfigType != 'Snippet'

    and not (cast(CA1.Config as nvarchar(max)) = cast(CA2.Config as nvarchar(max)))

    order by CA1.DownloadTime desc, N.ReverseDNS asc

    It joins on the same table three times, and although the ConfigArchive table has only 4,000 rows it is already highly inefficient.

    This table monitors changes to the configuration of a Node (represented by NodeID) and in one select statement it returns the current config (CA1) and the config prior to the change (CA2).

    The table schema for ConfigArchive is as follows. I haven't bothered with the schema for the Nodes table as it's not relevant. Also, please ignore the SUBSTRING in the join above; the NodeID does not match in both tables, hence some tweaking is required.

    CREATE TABLE [dbo].[ConfigArchive](

    [ConfigID] [uniqueidentifier] NOT NULL,

    [BaseConfigID] [varchar](100) NOT NULL,

    [NodeID] [varchar](60) NOT NULL,

    [ConfigTitle] [varchar](200) NOT NULL,

    [DownloadTime] [datetime] NOT NULL,

    [AttemptedDownloadTime] [datetime] NULL,

    [ModifiedTime] [datetime] NOT NULL,

    [ConfigType] [varchar](50) NOT NULL,

    [Config] [text] NULL,

    [Comments] [text] NULL,

    [Baseline] [bit] NOT NULL,

    CONSTRAINT [PK_ConfigArchive] PRIMARY KEY CLUSTERED

    (

    [ConfigID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [dta_IDX_NodeID_DownloadTime_ConfigID_ConfigType] ON [dbo].[ConfigArchive]

    (

    [NodeID] ASC,

    [DownloadTime] ASC,

    [ConfigID] ASC,

    [ConfigType] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    I have tried to use a CTE to break the query down, but I am failing miserably (and the weekend is upon us, so my brain's stopped functioning!). I can upload the execution plan, but I think no matter what I do with indexes, the major improvement would be gained by re-engineering the query.

    Your input would be greatly appreciated. Thanks in advance.

  • A couple of things I can see offhand:

    dibbydibby (12/10/2010)


    inner join Nodes N on N.NodeID = substring(CA1.NodeID,2,len(CA1.NodeID)-2)

    This destroys the sargability against ConfigArchive1, forcing a scan and scalar before anything is done.

    inner join ConfigArchive CA2 on CA2.NodeID = CA1.NodeID and CA2.ConfigID =

    (select top 1 CA3.ConfigID

    from ConfigArchive CA3

    where CA3.NodeID = CA1.NodeID and CA3.DownloadTime < CA1.DownloadTime and CA3.ConfigType != 'Snippet'

    order by CA3.DownloadTime desc)

    This is an execution plan question, but this, I believe, runs on a per-row basis, from some intuition on the data. Painful, at the least.
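For what it's worth, that correlated TOP (1) lookup can also be expressed as a CROSS APPLY (SQL Server 2005+), which states the same intent without burying a subquery in the join predicate. A sketch only, untested, with the Nodes join and the SUBSTRING left out for clarity:

```sql
SELECT TOP (50)
    CA1.ConfigID, CA1.DownloadTime,
    prev.ConfigID AS PrevConfigID, prev.DownloadTime AS PrevDownloadTime
FROM dbo.ConfigArchive AS CA1
CROSS APPLY (
    -- The most recent non-snippet config preceding CA1 for the same node.
    SELECT TOP (1) CA3.ConfigID, CA3.DownloadTime, CA3.Config
    FROM dbo.ConfigArchive AS CA3
    WHERE CA3.NodeID = CA1.NodeID
      AND CA3.DownloadTime < CA1.DownloadTime
      AND CA3.ConfigType <> 'Snippet'
    ORDER BY CA3.DownloadTime DESC
) AS prev
WHERE CA1.ConfigType <> 'Snippet'
ORDER BY CA1.DownloadTime DESC;
```

Whether the optimizer actually does better with this than with the subquery-in-join form is something you'd have to check in the execution plan.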

    where

    CA1.ConfigType != 'Snippet' and CA2.ConfigType != 'Snippet'

    What's your percentage of Snippet to everything else?

    and not (cast(CA1.Config as nvarchar(max)) = cast(CA2.Config as nvarchar(max)))

    Splat, there went ANY chance of SARGability... and why the varchar(max)? Are these fields truly that large? You might be best off storing a hashed value of the column and comparing the hash, simply for speed.

    order by CA1.DownloadTime desc, N.ReverseDNS asc


    No indexing will help here with the number of disconnects from the indexes that are done, so this is just going to be stuck as a cost of doing business.
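To make the hashing idea concrete, here's a sketch. The ConfigHash column is hypothetical (it would have to be added to the vendor table, which may not be an option), and CHECKSUM can collide, so equal hashes only mean "probably unchanged":

```sql
-- Hypothetical: ConfigHash is NOT part of the vendor schema.
ALTER TABLE dbo.ConfigArchive ADD ConfigHash int NULL;

-- Populate once, then keep it current from the load process or a trigger.
UPDATE dbo.ConfigArchive
SET ConfigHash = CHECKSUM(CAST(Config AS nvarchar(max)));

-- The expensive text comparison then becomes a cheap integer compare:
--   ... and CA1.ConfigHash <> CA2.ConfigHash
-- Unequal hashes prove a difference; equal hashes should be verified with a
-- full compare if false negatives matter.
```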

    I have tried to use a CTE to break the query down, but I am failing miserably (and the weekend is upon us, so my brain's stopped functioning!). I can upload the execution plan, but I think no matter what I do with indexes, the major improvement would be gained by re-engineering the query.

    Your input would be greatly appreciated. Thanks in advance.

    A CTE only creates a neat subquery, it does no preprocessing for you. For that you need a #tmp or @Tmp.

    A re-engineering can be looked into, but we'd need full DDL for the tables, and that .sqlplan is handy at all times. If you've got it handy, please post it.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Sighs.

    CELKO (12/10/2010)


    There is a lot of dialect in this code. SQL uses <> instead of <>,

    Typo, he meant <> instead of !=, and he's right.

    TOP(n) is proprietary, and you have BIT flags like assembly language, and UNIQUEIDENTIFIER instead of valid key.

    Fancy that. We're on a SQL Server board, not ANSI-sql board. Imagine someone using something proprietary. We're just pathetic.

    You never did get back to me about why BIT flags matter, other than that you've got a personal vendetta against them. Everything indicates BITs are a better performer than an INT for each flag. Explain, with code and examples, what removing them benefits.

    UNIQUEIDENTIFIER is valid, if not required, in a few instances, and is just as valid as a surrogate key instead of using a multicolumn key.

    Why would anyone use NVARCHAR(MAX) in production code? You cannot optimize with it and someone might fill it up.

    Because the information you're getting from said config files is over 8000 characters? Yep, someone might fill it up, that's kinda the point.

    Why would hide a data element inside a string, then have to pull it out? A node_id should be a node_id everywhere and represented the same way.

    Crappy vendor software, Microsoft inconsistencies, and any number of other options. Surrogate keys or not surrogate keys. You can't have it both ways, Celko. Though in this case I agree with a single key to rule them all and a table to lookup the key based on where it's coming from during inclusion to the system.

    Should there be FK to the Nodes table from ConfigArchive, so you know every node in ConfigArchive has to be in

    (continued)... in the other tables... Yes. After perhaps some data modifications to the process in general, that might be useful.

    You use TOP(n).. ORDER BY because your mindset is still in sorted sequential files; SQL programmers would use MAX()/MIN(), an abstract set property not dependent on sorting.

    Or you might use it because MIN/MAX can't directly skip to the 50th item (skip the first 49), and you'd get a variable number of rows against an expected pagination even if you did come up with a nice clean workaround.

    Although, if you don't mind, I'd love to see how to grab only the top 50 rows via a MAX statement without 49 nested maxes inside of it. ROW_NUMBER() or RANK() could be used to produce a similar style result, with a where clause against that instead.
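A quick sketch of that ROW_NUMBER() approach (SQL Server 2005+; columns trimmed for illustration):

```sql
;WITH Ordered AS (
    SELECT ConfigID, NodeID, DownloadTime,
           rn = ROW_NUMBER() OVER (ORDER BY DownloadTime DESC)
    FROM dbo.ConfigArchive
)
SELECT ConfigID, NodeID, DownloadTime
FROM Ordered
WHERE rn <= 50               -- same rows as TOP (50) ... ORDER BY
-- WHERE rn BETWEEN 51 AND 100  -- or any later "page"
ORDER BY rn;
```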

    Why did you write "NOT (<exper1> = <exper2>) instead of the more natural <>?

    Can't pick on Celko for this one, I'm kinda curious myself.

    UNIQUEIDENTIFIER cannot be key by definition. My guess is that (base_config_id, config_id) is the real key. Assuming they really are identifiers.

    :blink: Wha? Primary Key: Unique row identifier for a table. UniqueIdentifier: A unique Identifier. Produced uniquely for each row... it works just fine as the key.

    <Snip: A useful table statement unassociated with your immediate issue, until you care about time between config changes or point in time systemwide checks>

    Experienced SQL programmers do not use the infixed notation -- that is mostly ACCESS people or newbies who cannot think in sets yet. Again, this is scratch paper work, but your query should have a shape more like this:

    http://en.wikipedia.org/wiki/Infix_notation Well, at least I learn one thing today.

    Care to show me where the math using a 2+2 structure is in this query, instead of + 2 2 or 2 2 +?

    Dunno about you, but I use infix pretty much every time I have to do math. Though, hm, I may not count as experienced to you.

    SELECT CA1.config_id, CA1.download_time, CA1.config, N.node_id, N.agent_ip,

    N.reverse_dns, N.sys_name, N.sys_location,

    CA2.config_id, CA2.download_time, CA2.config, CA3.config_type

    FROM Nodes AS N,

    ConfigArchive AS CA1,

    ConfigArchive AS CA2,

    ConfigArchive AS CA3

    Because the ON clause is the enemy and we LOVE to hide the join method vs. data restrictors... Sigh...

    WHERE 'snippet' NOT IN (CA1.config_type, CA2.config_type, CA3.config_type)

    <SNIP>

    Notice that separate, sequential binary tests are replaced by an n-way "'snippet' NOT IN (CA1.config_type, CA2.config_type, CA3.config_type)" instead. That is thinking in sets and using the principle of similarity.

    Now, this was interesting. Even though the readability improves, as Celko mentioned, I had expected this to change the index accessing. However, the engine is smarter than I thought here when evaluating the execution plan. Check out this test code if you're curious; I stripped the question down to specifically testing the NOT IN ():

    /*

    CREATE TABLE CelkoTest1 ( KeyField INT PRIMARY KEY IDENTITY( 1, 1), Config_Type VARCHAR(20), Attribute1 VARCHAR(500))

    CREATE TABLE CelkoTest2 ( KeyField INT PRIMARY KEY IDENTITY( 1, 1), Config_Type VARCHAR(20), Attribute1 VARCHAR(500))

    CREATE TABLE CelkoTest3 ( KeyField INT PRIMARY KEY IDENTITY( 1, 1), Config_Type VARCHAR(20), Attribute1 VARCHAR(500))

    INSERT INTO CelkoTest1 (Config_Type, Attribute1)

    SELECT CASE WHEN t1.N%10 = 0 THEN 'Snippet' ELSE 'NotASnippet' END AS Config_Type,

    NEWID()

    FROM

    tempdb..Tally AS t1,

    tempdb..Tally AS t2

    WHERE

    t1.N <= 25000 AND

    t2.N <= 20

    INSERT INTO CelkoTest2 (Config_Type, Attribute1)

    SELECT CASE WHEN t1.N%30 = 0 THEN 'Snippet' ELSE 'NotASnippet' END AS Config_Type,

    NEWID()

    FROM

    tempdb..Tally AS t1,

    tempdb..Tally AS t2

    WHERE

    t1.N <= 25000 AND

    t2.N <= 20

    INSERT INTO CelkoTest3 (Config_Type, Attribute1)

    SELECT CASE WHEN t1.N%5 = 0 THEN 'Snippet' ELSE 'NotASnippet' END AS Config_Type,

    NEWID()

    FROM

    tempdb..Tally AS t1,

    tempdb..Tally AS t2

    WHERE

    t1.N <= 25000 AND

    t2.N <= 20

    CREATE NONCLUSTERED INDEX idx_1 ON CelkoTest1 ( Config_Type) INCLUDE (Attribute1)

    CREATE NONCLUSTERED INDEX idx_1 ON CelkoTest2 ( Config_Type) INCLUDE (Attribute1)

    CREATE NONCLUSTERED INDEX idx_1 ON CelkoTest3 ( Config_Type) INCLUDE (Attribute1)

    */

    SELECT Attribute1 FROM CelkoTest1 WHERE Config_Type = 'Snippet'

    SELECT Attribute1 FROM CelkoTest2 WHERE Config_Type = 'Snippet'

    SELECT Attribute1 FROM CelkoTest3 WHERE Config_Type = 'Snippet'

    SELECT Attribute1 FROM CelkoTest1 WHERE Config_Type <> 'Snippet'

    SELECT Attribute1 FROM CelkoTest2 WHERE Config_Type <> 'Snippet'

    SELECT Attribute1 FROM CelkoTest3 WHERE Config_Type <> 'Snippet'

    SELECT

    c1.Attribute1,

    c2.Attribute1,

    c3.Attribute1

    FROM

    CelkoTest1 AS c1

    JOIN

    CelkoTest2 AS c2

    ON c1.KeyField = c2.KeyField

    JOIN

    CelkoTest3 AS c3

    ON c2.KeyField = c3.KeyField

    WHERE

    c1.Config_Type <> 'Snippet'

    AND c2.Config_Type <> 'Snippet'

    AND c3.Config_Type <> 'Snippet'

    SELECT

    c1.Attribute1,

    c2.Attribute1,

    c3.Attribute1

    FROM

    CelkoTest1 AS c1

    JOIN

    CelkoTest2 AS c2

    ON c1.KeyField = c2.KeyField

    JOIN

    CelkoTest3 AS c3

    ON c2.KeyField = c3.KeyField

    WHERE

    'Snippet' NOT IN ( c1.Config_Type, c2.Config_Type, c3.Config_Type)


    - Craig Farrell


  • The source table is small (at this point, assume it will grow) so I'd do some preprocessing into a temp table then query from that, something like:

    SELECT seq = ROW_NUMBER() OVER (PARTITION BY NodeID ORDER BY DownloadTime DESC),

    ConfigID,

    DownloadTime,

    Config = cast(Config as nvarchar(max)),

    N_NodeID = substring(NodeID,2,len(NodeID)-2)

    INTO #ConfigArchive

    FROM [OrionNCM_ConfigMgmt].[dbo].[ConfigArchive]

    WHERE ConfigType <> 'Snippet'

    SELECT top 50

    CA1.ConfigID,

    CA1.DownloadTime,

    CA1.Config,

    N.NodeID,

    N.AgentIP,

    N.ReverseDNS,

    N.SysName,

    N.SysLocation,

    CA2.ConfigID,

    CA2.DownloadTime,

    CA2.Config

    FROM #ConfigArchive CA1

    inner join Nodes N on N.NodeID = CA1.N_NodeID

    inner join #ConfigArchive CA2 on CA2.NodeID = CA1.NodeID and CA2.seq = 2

    where CA1.Config <> CA2.Config

    order by CA1.DownloadTime desc, N.ReverseDNS asc
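A variation on the same idea, sketched and untested: number each node's configs by time and join row n to row n+1, so every consecutive pair of configs is compared rather than only the two most recent. This assumes no node has two rows with the same DownloadTime, and leaves out the Nodes join for brevity:

```sql
;WITH Seq AS (
    SELECT ConfigID, NodeID, DownloadTime,
           Config = CAST(Config AS nvarchar(max)),
           rn = ROW_NUMBER() OVER (PARTITION BY NodeID ORDER BY DownloadTime DESC)
    FROM [OrionNCM_ConfigMgmt].[dbo].[ConfigArchive]
    WHERE ConfigType <> 'Snippet'
)
SELECT TOP 50
    cur.ConfigID, cur.DownloadTime,
    prev.ConfigID AS PrevConfigID, prev.DownloadTime AS PrevDownloadTime
FROM Seq AS cur
INNER JOIN Seq AS prev
    ON prev.NodeID = cur.NodeID
   AND prev.rn = cur.rn + 1    -- the config immediately before this one
WHERE cur.Config <> prev.Config
ORDER BY cur.DownloadTime DESC;
```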


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.

    Understanding and using APPLY, (I) and (II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

  • Thanks for your responses so far. Here's my response to some of the queries.

    Why would hide a data element inside a string, then have to pull it out? A node_id should be a node_id everywhere and represented the same way.

    ------

    Crappy vendor software, .....

    * That's correct; it's third-party code, so I can't do anything with the design, unfortunately.

    Why did you write "NOT (<exper1> = <exper2>) instead of the more natural <>?

    * Point taken, though I'm seeing little difference.

    Why would anyone use NVARCHAR(MAX) in production code? You cannot optimize with it and someone might fill it up.

    ------

    Because the information you're getting from said config files is over 8000 characters? Yep, someone might fill it up, that's kinda the point.

    * That's correct, it's a massive amount of text data.

    What's your percentage of Snippet to everything else?

    * Very low: three records out of 4,500.

    inner join ConfigArchive CA2 on CA2.NodeID = CA1.NodeID and CA2.ConfigID =

    (select top 1 CA3.ConfigID

    from ConfigArchive CA3

    where CA3.NodeID = CA1.NodeID and CA3.DownloadTime < CA1.DownloadTime and CA3.ConfigType != 'Snippet'

    order by CA3.DownloadTime desc)

    * This bit of code above is the killer.

    and not (cast(CA1.Config as nvarchar(max)) = cast(CA2.Config as nvarchar(max)))

    ------

    Splat, there went ANY chance of SARGability... and why the varchar(max)? Are these fields truly that large? You might be best off storing a hashed value of the column and comparing the hash, simply for speed

    I removed all the CASTs and replaced the comparison with CA1.Config = CA2.Config, and I got the following error. Can you tell me how I can store a hashed value?

    Msg 402, Level 16, State 1, Line 13

    The data types text and text are incompatible in the not equal to operator.

    ChrisM@home, I tried your query but it didn't return the same resultset. Also, it was reading the whole table's contents into a temp table, and that in itself is quite costly, especially if the table grows very large. I also tried a CTE to partition on NodeID and then select the top 2 by date to display before and after changes. The problem was that in the last 50 changes there could be more than two changes to the same NodeID, so I was missing any changes after the first two. If I could populate a very filtered-down temp table and then query off that, it would be great.

    As requested here are the other DDLs (just for the Nodes tables as other are given in my first post)

    CREATE TABLE [dbo].[Nodes](

    [NodeID] [uniqueidentifier] NOT NULL,

    [NodeCaption] [varchar](255) NOT NULL,

    [NodeGroup] [varchar](200) NOT NULL,

    [AgentIP] [varchar](50) NOT NULL,

    [AgentIPSort] [float] NOT NULL,

    [ReverseDNS] [varchar](50) NOT NULL,

    [ResponseTime] [int] NOT NULL,

    [ResponseError] [varchar](100) NOT NULL,

    [Status] [tinyint] NOT NULL,

    [Community] [varchar](150) NOT NULL,

    [CommunityReadWrite] [varchar](150) NOT NULL,

    [SNMPLevel] [tinyint] NOT NULL,

    [SysName] [varchar](255) NOT NULL,

    [SysDescr] [text] NOT NULL,

    [SysContact] [varchar](255) NOT NULL,

    [SysLocation] [varchar](255) NOT NULL,

    [SystemOID] [varchar](255) NOT NULL,

    [Vendor] [varchar](255) NOT NULL,

    [VendorIcon] [varchar](50) NOT NULL,

    [MachineType] [varchar](255) NOT NULL,

    [LastBoot] [datetime] NOT NULL,

    [OSImage] [varchar](255) NOT NULL,

    [OSVersion] [varchar](255) NOT NULL,

    [ConfigTypes] [varchar](255) NOT NULL,

    [NodeComments] [text] NOT NULL,

    [NextDiscovery] [datetime] NOT NULL,

    [NextPoll] [datetime] NOT NULL,

    [Username] [varchar](250) NOT NULL,

    [Password] [varchar](250) NOT NULL,

    [EnableLevel] [varchar](50) NOT NULL,

    [EnablePassword] [varchar](250) NOT NULL,

    [DeviceTemplate] [varchar](100) NOT NULL,

    [ExecProtocol] [varchar](50) NOT NULL,

    [CommandProtocol] [varchar](50) NOT NULL,

    [TransferProtocol] [varchar](50) NOT NULL,

    [LoginStatus] [varchar](200) NOT NULL,

    [UseHTTPS] [bit] NOT NULL,

    [LastUpdateTime] [datetime] NOT NULL,

    [LastRediscoveryTime] [datetime] NOT NULL,

    [UseUserDeviceCredentials] [bit] NOT NULL,

    [LastInventory] [datetime] NOT NULL,

    [SNMPContext] [varchar](250) NOT NULL,

    [SNMPUsername] [varchar](250) NOT NULL,

    [TelnetPort] [varchar](50) NOT NULL,

    [SSHPort] [varchar](50) NOT NULL,

    [SNMPPort] [varchar](50) NOT NULL,

    [SNMPAuthType] [varchar](50) NOT NULL,

    [SNMPAuthPass] [varchar](250) NOT NULL,

    [SNMPEncryptType] [varchar](50) NOT NULL,

    [SNMPEncryptPass] [varchar](250) NOT NULL,

    [SNMPStatus] [varchar](250) NOT NULL,

    [AllowIntermediary] [bit] NOT NULL,

    [EnableOrionImport] [bit] NOT NULL,

    [AgentIPv6] [varchar](50) NOT NULL,

    [ManagedProtocol] [tinyint] NOT NULL,

    [LastDiscovery] [datetime] NULL,

    [FirstDiscovery] [datetime] NULL,

    [Missing] [bit] NULL,

    [GroupingLocation] [varchar](200) NULL,

    CONSTRAINT [PK_Nodes] PRIMARY KEY CLUSTERED

    (

    [NodeID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    ALTER TABLE [dbo].[Nodes] ADD DEFAULT (newid()) FOR [NodeID]

    I have attached the execution plan. I had to zip it, as company security policy was preventing me from uploading the .SQL.

    Thanks All

  • You won't be able to use a checksum on text.

    Almost certainly, the biggest performance hit in this is the inequality comparison between two text columns. (Should have called them "fields" just to see if I could get a reaction from Joe.) Converting them inline to varchar(max) is necessary for that, but is probably also killing performance all by itself.

    What kind of data goes into that column? Is there ANY other column in the table that could be used to solve the inequality check (a file name or something like that maybe)?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • CELKO (12/13/2010)


    >> Fancy that. We're on a SQL Server board, not ANSI-SQL Board. Imagine someone using something proprietary. We're just pathetic. <<

    Why is it a good idea to use proprietary syntax when the product supports ANSI/ISO syntax? Which will be deprecated in the future? The only reasons I can see for needlessly proprietary code are ignorance and job security.

    Which brings us back to:

    >> Or you might use it [TOP 1] because MIN/MAX doesn't directly go to the MAX (skip the first 49 items) .. <<

    MIN and MAX are often part of the stats on a column, or easy to search in an index.

    Stats, yes. Search for, yes. Find the 50th MAX item. No. Thus, proprietary perhaps, but necessary.

    I hope you know that UNIQUEIDENTIFIER cannot be a key because it is not an attribute of an entity. You might want to look up Dr. Codd's definition of a surrogate key. It is hidden from the users and totally maintained by the DB engine. But you have to create and maintain the UNIQUEIDENTIFIER by hand.

    I assume you're familiar with the NEWID() function that removes that necessity? Though, I assume it's not ANSI/ISO (which I'm not dropping $700 on a copy to check), so maybe you're not familiar. It produces a GUID for consumption in, and by, the database... without user interference, knowledge, or interaction.


    - Craig Farrell


  • As you said, this appears to be a problem. Wouldn't it issue this correlated subquery once per row retrieved?

    inner join ConfigArchive CA2 on CA2.NodeID = CA1.NodeID and CA2.ConfigID =

    (select top 1 CA3.ConfigID

    from ConfigArchive CA3

    where CA3.NodeID = CA1.NodeID and CA3.DownloadTime < CA1.DownloadTime and CA3.ConfigType != 'Snippet'

    order by CA3.DownloadTime desc)

    If you can restructure it so that this is only executed once per query, instead of once per row, you should get much better performance.

    It appears you want to join to the records with the latest DownloadTime for ca2.NodeID and ca2.ConfigID pairs, ignoring all others in the join?

    Would something along this line work? (This is not tested, just a conceptual approach I would follow to find the right statement. Might have to discard it, too!)

    I'm assuming a given NodeId can't have two records with the same DownloadTime. If that's a false assumption, my approach would be different.

    select

    ...

    from ConfigArchive ca1

    inner join

    (select ca3.NodeId, max(ca3.DownloadTime) MaxDownloadTime

    from ConfigArchive ca3

    group by ca3.NodeId

    ) ca2

    on ca1.NodeId = ca2.NodeId

    inner join ConfigArchive ca4

    on ca2.NodeId = ca4.NodeId

    and ca2.MaxDownloadTime = ca4.DownloadTime

  • GSquared (12/13/2010)


    You won't be able to use a checksum on text.

    Hmm. Guess I shoulda double checked the older format. It will work for VARCHAR(MAX), however. Yet another reason to convert it over, besides the inline conversion.

    create table texttester (ttID INT IDENTITY( 1, 1), ttText TEXT, ttVCM VARCHAR(MAX))

    INSERT INTO textTester ( ttVCM) VALUES ('sdlfjasdlkfjasldfjlsajfd;lksafkljashdfkljsahdfkljsakl;dfjasl;j;')

    UPDATE TextTester SET ttText = ttVCM

    --This fails

    SELECT

    CHECKSUM( ttText)

    FROM

    TextTester

    --This works

    SELECT

    CHECKSUM( ttVCM)

    FROM

    TextTester


    - Craig Farrell


  • Craig Farrell (12/13/2010)


    GSquared (12/13/2010)


    You won't be able to use a checksum on text.

    Hmm. Guess I shoulda double checked the older format. It will work for VARCHAR(MAX), however. Yet another reason to convert it over, besides the inline conversion.

    create table texttester (ttID INT IDENTITY( 1, 1), ttText TEXT, ttVCM VARCHAR(MAX))

    INSERT INTO textTester ( ttVCM) VALUES ('sdlfjasdlkfjasldfjlsajfd;lksafkljashdfkljsahdfkljsakl;dfjasl;j;')

    UPDATE TextTester SET ttText = ttVCM

    --This fails

    SELECT

    CHECKSUM( ttText)

    FROM

    TextTester

    --This works

    SELECT

    CHECKSUM( ttVCM)

    FROM

    TextTester

    Yep. The documentation for Checksum specifically mentions that it can't be used on the noncomparable data types, which are text, ntext, and image. So, can't checksum text for the same reason you can't check equality on it.

    But, since the table can't be refactored, per one of the posts, converting to varchar(max) at that level, which is what would be needed, isn't an option. Unless the table can be refactored to use the more current data type (won't work if the application uses the older text modification methods). I'd check with the vendor on that point.

    All of that is why I asked if any other column or combination of columns could be used to identify the rows more effectively, for that inequality test. If there's any other way to do that besides converting text to varchar(max) and then doing a direct inequality test, it will almost certainly speed this whole thing up substantially.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC

  • dibbydibby (12/13/2010)


    ChrisM@home, I tried your query but it didn't return the same resultset. Also, it was reading the whole table's contents into a temp table, and that in itself is quite costly, especially if the table grows very large.

    Not necessarily. Here at work we have a process which aggregates ~50k rows (selected from a table with ~70m) down to ~50 rows, and it does it in about a second - using a temp table to harvest and preaggregate the keys and final aggregate columns. The most expensive part in your process would be the text columns - which you have to read in order to convert.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

