SQLServerCentral
108,000,000 row table - check for matches



Author
Message
duncanburtenshaw
SSC Rookie (49 reputation)

Group: General Forum Members
Points: 49 Visits: 646
Good afternoon,

Long time reader - first time poster here!

I am currently working on a 108m row table where some of the ids (TOID column) have inadvertently been shortened - they should all be 16 characters in length, e.g. 0001000000076579
0001000000104566
0001000000159225
0001000000250870
0001000000263722
0001000000303305
0001000000303309

but some are 13 in length. I think that some are missing a 100 at the beginning and some are missing a 500. I have an update table, and I have to find matches between this update table and the big daddy table based on this id (called TOID).

I need to find out which rows in the update table don't match rows in the big daddy table. I can join on the direct matches, no problem, but when it comes to searching for matches that are a concatenation of '100' or '500' and the TOID, the query runs very, very slowly.

My code is quite simple:


UPDATE DUNCANTEMP
SET DOESEXIST = 1
WHERE EXISTS (SELECT TOID
              FROM [OS_MasterMap].dbo.OS_MM_AREA OMA WITH (NOLOCK)
              WHERE ('500' + OMA.Toid = DUNCANTEMP.TOID
                     OR '100' + OMA.Toid = DUNCANTEMP.TOID)
                AND LEN(OMA.Toid) = 13)

Where DUNCANTEMP consists of:

CREATE TABLE dbo.DUNCANTEMP(
TOID VARCHAR(50) NULL,
DOESEXIST INT NULL
) ON [PRIMARY]

The tables are static and in a dev environment.

And [OS_MasterMap].dbo.OS_MM_AREA has loads of columns, but TOID is the only one we need to look at:

CREATE TABLE dbo.OS_MM_AREA(
OBJECTID INT IDENTITY,
SHAPE GEOMETRY NOT NULL,
Toid NVARCHAR(20) NOT NULL,
Version INT NULL,
VerDate DATETIME NULL,
FeatCode INT NULL,
Theme NVARCHAR(80) NULL,
CalcArea NUMERIC(38, 8) NULL,
Change NVARCHAR(80) NULL,
DescGroup NVARCHAR(150) NULL,
DescTerm NVARCHAR(150) NULL,
Make NVARCHAR(20) NULL,
PhysLevel INT NULL,
PhysPres NVARCHAR(20) NULL,
Broken SMALLINT NULL,
LoadDate DATETIME NULL,
CONSTRAINT PK__OS_MM_AR__F4B70D851D5924B0 PRIMARY KEY (OBJECTID),
CONSTRAINT UQ__OS_MM_AR__FF378992EC7F35C5 UNIQUE (Toid)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

There is a unique constraint on OS_MM_AREA (TOID) and a non-clustered index.

There is a clustered index on TOID on the DUNCANTEMP table (but not unique, as there may be some duplicate TOIDs from multiple update rows).

My question: my code has been running for about 14 hours now. There are 33,934 rows in DUNCANTEMP and 107,913,558 rows in the OS_MM_AREA table, so my code will be scanning the big table for matches for each row.

Can anyone think of a better and quicker way of finding those rows in DUNCANTEMP which don't match any records in OS_MM_AREA, given either a direct match or a '100'/'500' + TOID match?

I welcome suggestions please.

(If it doesn't complete over the weekend, I will add 2 more columns to the big daddy table with '500' + TOID and '100' + TOID, index them, and run the code using a join, but I was interested in whether I had fundamentally missed something first.)

thank you kindly

D.
SrcName
SSC-Enthusiastic (122 reputation)

Group: General Forum Members
Points: 122 Visits: 394
Read this fantastic article carefully; it may help you:
http://www.sqlservercentral.com/articles/T-SQL/62867/
duncanburtenshaw
SSC Rookie (49 reputation)

Group: General Forum Members
Points: 49 Visits: 646
SrcName (9/20/2013)
Read this fantastic article carefully; it may help you:
http://www.sqlservercentral.com/articles/T-SQL/62867/


SrcName - thank you for the link. I have read this before and am familiar with the numbers/tally table. I'm not sure, though, how it helps me here; doesn't a non-SARGable WHERE clause cause a scan whatever I do?

Sorry, I haven't used the tally table enough to see its application here.

I would be very grateful if you could elaborate please.

thank you

D.
TomThomson
SSChampion (10K reputation)

Group: General Forum Members
Points: 10704 Visits: 12001
The thing to do is try to avoid the scan of the big table; give it something simple to look for.
I don't offer any guarantee, but something like this might work:
UPDATE DUNCANTEMP
SET DOESEXIST = 1
WHERE (TOID LIKE '100%' OR TOID LIKE '500%')
  AND EXISTS (SELECT 1
              FROM OS_MM_AREA OMA WITH (NOLOCK)
              WHERE OMA.Toid = RIGHT(DUNCANTEMP.TOID, 13))


That avoids applying a concatenation to OMA.Toid, so it should be able to use the index for the existence test, which avoids scanning the big table. It may of course do a scan on DUNCANTEMP, but that's unavoidable.

Anyway, for future reference: please remember to post an actual execution plan (if possible; estimated plan otherwise) with queries about improving performance.

Tom

Erland Sommarskog
SSC Eights! (931 reputation)

Group: General Forum Members
Points: 931 Visits: 866
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is insert all 13-character keys into a temp table; that way he only needs to scan the big table once. That still takes a long time, but it should not take the entire weekend.
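That temp-table approach might look something like this (a hedged sketch; the temp table and index names are illustrative, not from the thread):

```sql
-- Pull the short (13-character) keys out of the big table once,
-- so the 108M-row table is scanned a single time.
SELECT Toid
INTO #ShortToids
FROM [OS_MasterMap].dbo.OS_MM_AREA
WHERE LEN(Toid) = 13;

CREATE CLUSTERED INDEX IX_ShortToids ON #ShortToids (Toid);

-- Match the small table against the much smaller extract.
UPDATE D
SET DOESEXIST = 1
FROM dbo.DUNCANTEMP D
WHERE EXISTS (SELECT 1
              FROM #ShortToids S
              WHERE '100' + S.Toid = D.TOID
                 OR '500' + S.Toid = D.TOID);
```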

Erland Sommarskog, SQL Server MVP, www.sommarskog.se
duncanburtenshaw
SSC Rookie (49 reputation)

Group: General Forum Members
Points: 49 Visits: 646
Erland Sommarskog (9/21/2013)
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.


Thank you all for your input. I have failed to explain the problem correctly: it is the large table which has a mix of 13- and 16-character TOIDs. These are the ones I am looking to find by adding '100' + TOID and '500' + TOID to my search and seeing if any records in the smaller table match. I will add two more columns to the bigger table, populate them with '100' + TOID and '500' + TOID, create an index, and see if I get any matches.

I think that any other option will inevitably produce a table scan on the large table for each row of the smaller table, so this is the way I am going now.

thanks anyway..

D.
TomThomson
SSChampion (10K reputation)

Group: General Forum Members
Points: 10704 Visits: 12001
duncanburtenshaw (9/23/2013)
Erland Sommarskog (9/21/2013)
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.


Thank you all for your input. I have failed to explain the problem correctly: it is the large table which has a mix of 13- and 16-character TOIDs. These are the ones I am looking to find by adding '100' + TOID and '500' + TOID to my search and seeing if any records in the smaller table match. I will add two more columns to the bigger table, populate them with '100' + TOID and '500' + TOID, create an index, and see if I get any matches.

I think that any other option will inevitably produce a table scan on the large table for each row of the smaller table, so this is the way I am going now.

thanks anyway..

D.

I'd be interested to know how many rows there have to be in the small table before the code I suggested produces a scan of the large table. In theory the optimiser should only do this when it thinks the number of index lookups would be sufficiently large for a scan to be advantageous; I find it hard to believe that something as low as 0.031% of the rows implies a scan rather than individual lookups, when what is being looked up is an exact match on a 13-character string.

Of course, it might be better to eliminate the 16-character matches first and build a new test table from whatever wasn't a 16-character match, with the first three characters discarded (and the row eliminated if they were not one of the two interesting 3-character strings), provided with a clustered index on the string column. The check for those values is then very obviously a check for matches in the index, with no messing about with appending prefixes. That would certainly take a lot less time than adding two new indexes to the big table and then checking all the possible matches: eliminating the 16-character matches and truncating what is left will be a lot cheaper than creating two new indexes on the big table (about 2000 times fewer logical writes), and once you've done that you have only one index (the one you already have on the big table) to match against instead of three.
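Sketched out, that elimination-first idea could look like this (hedged; the candidate-table name and the assumption that unmatched rows have DOESEXIST NULL are mine, not from the thread):

```sql
-- 1. Flag the direct 16-character matches first, using the unique index on Toid.
UPDATE D
SET DOESEXIST = 1
FROM dbo.DUNCANTEMP D
WHERE EXISTS (SELECT 1
              FROM [OS_MasterMap].dbo.OS_MM_AREA OMA
              WHERE OMA.Toid = D.TOID);

-- 2. Build a small candidate table from the leftovers, keeping only rows
--    whose prefix is one of the two interesting values, with the prefix stripped.
SELECT RIGHT(TOID, 13) AS Toid13
INTO #Candidates
FROM dbo.DUNCANTEMP
WHERE DOESEXIST IS NULL
  AND LEFT(TOID, 3) IN ('100', '500');

CREATE CLUSTERED INDEX IX_Candidates ON #Candidates (Toid13);

-- 3. Probe the big table's existing Toid index with the truncated keys.
UPDATE D
SET DOESEXIST = 1
FROM dbo.DUNCANTEMP D
JOIN #Candidates C ON C.Toid13 = RIGHT(D.TOID, 13)
WHERE EXISTS (SELECT 1
              FROM [OS_MasterMap].dbo.OS_MM_AREA OMA
              WHERE OMA.Toid = C.Toid13);
```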

Tom

duncanburtenshaw
SSC Rookie (49 reputation)

Group: General Forum Members
Points: 49 Visits: 646
L' Eomot Inversé (9/23/2013)
duncanburtenshaw (9/23/2013)
Erland Sommarskog (9/21/2013)
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.


Thank you all for your input. I have failed to explain the problem correctly: it is the large table which has a mix of 13- and 16-character TOIDs. These are the ones I am looking to find by adding '100' + TOID and '500' + TOID to my search and seeing if any records in the smaller table match. I will add two more columns to the bigger table, populate them with '100' + TOID and '500' + TOID, create an index, and see if I get any matches.

I think that any other option will inevitably produce a table scan on the large table for each row of the smaller table, so this is the way I am going now.

thanks anyway..

D.

I'd be interested to know how many rows there have to be in the small table before the code I suggested produces a scan of the large table. In theory the optimiser should only do this when it thinks the number of index lookups would be sufficiently large for a scan to be advantageous; I find it hard to believe that something as low as 0.031% of the rows implies a scan rather than individual lookups, when what is being looked up is an exact match on a 13-character string.

Of course, it might be better to eliminate the 16-character matches first and build a new test table from whatever wasn't a 16-character match, with the first three characters discarded (and the row eliminated if they were not one of the two interesting 3-character strings), provided with a clustered index on the string column. The check for those values is then very obviously a check for matches in the index, with no messing about with appending prefixes. That would certainly take a lot less time than adding two new indexes to the big table and then checking all the possible matches: eliminating the 16-character matches and truncating what is left will be a lot cheaper than creating two new indexes on the big table (about 2000 times fewer logical writes), and once you've done that you have only one index (the one you already have on the big table) to match against instead of three.


Thank you for your comments. Perhaps I am completely missing the point. Let me reiterate:

1. I have a large table (let's call it BIGa).
2. I have a smaller table (let's call it Smallb).
3. The BIGa table has 108,000,000 rows. Out of these, approx. 27,000,000 rows have a TOID (unique number) which is 13 chars in length instead of 16 (which all of the other rows are).
4. I have a small update table (Smallb) with 370,000-odd rows. None of the rows in Smallb are 13 chars long; they are all the correct length of 16 chars.
5. I am trying to match any rows from Smallb with a matching row in BIGa, based on any TOIDs in BIGa which are 13 chars long and where the row matches on '100' + TOID or '500' + TOID.

Based on the assumptions above, surely all rows in the big table with LEN(TOID) = 13 (27,000,000 rows) will be scanned?

This was my original question: apart from adding a couple of columns persisting '100' + TOID and '500' + TOID and indexing them, then looking for a match, was there a better way of seeing if there are any matches between Smallb and BIGa than the code I originally posted?

I have, quite obviously, failed completely to explain this problem satisfactorily and for that I apologise.

I have added new columns on BIGa now with '500' + TOID and '100' + TOID, and will check for any direct matches after this has completed. I just wanted to know if I was missing anything which would have saved me the time of creating these columns and would allow me to detect possible matches.
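As an aside, those two helper columns could also be declared as persisted computed columns, which SQL Server keeps in sync automatically and which can be indexed directly (the column and index names below are made up for illustration):

```sql
-- Hypothetical names; PERSISTED stores the computed value in the row,
-- so the index build reads it directly.
ALTER TABLE dbo.OS_MM_AREA ADD Toid100 AS ('100' + Toid) PERSISTED;
ALTER TABLE dbo.OS_MM_AREA ADD Toid500 AS ('500' + Toid) PERSISTED;

CREATE NONCLUSTERED INDEX IX_OS_MM_AREA_Toid100 ON dbo.OS_MM_AREA (Toid100);
CREATE NONCLUSTERED INDEX IX_OS_MM_AREA_Toid500 ON dbo.OS_MM_AREA (Toid500);
```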

thank you for all your suggestions - next time I will attach execution plans as well.

D.
bleroy
SSC-Enthusiastic (172 reputation)

Group: General Forum Members
Points: 172 Visits: 739
What about adding a column to the small table that would contain RIGHT(smallTable.toid, 13), creating an index on that, and then doing your join?

B
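That suggestion might be sketched like this (hedged; the new column and index names are assumptions, not from the thread):

```sql
-- Add a helper column to the small (cheap-to-modify) table.
ALTER TABLE dbo.DUNCANTEMP ADD Toid13 VARCHAR(50) NULL;

UPDATE dbo.DUNCANTEMP SET Toid13 = RIGHT(TOID, 13);

CREATE NONCLUSTERED INDEX IX_DUNCANTEMP_Toid13 ON dbo.DUNCANTEMP (Toid13);

-- The join is now a plain equality, so the existing unique index
-- on OS_MM_AREA (Toid) can satisfy it with seeks.
UPDATE D
SET DOESEXIST = 1
FROM dbo.DUNCANTEMP D
JOIN [OS_MasterMap].dbo.OS_MM_AREA OMA
    ON OMA.Toid = D.Toid13
WHERE LEN(OMA.Toid) = 13;
```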
duncanburtenshaw
SSC Rookie (49 reputation)

Group: General Forum Members
Points: 49 Visits: 646
bleroy (9/23/2013)
What about adding 1 column to the small table that would contain RIGHT(smallTable.toid,13), create an index on that and then do your join?

B


ooooh.. good idea! :-)

Exactly what I was looking for. Of course: I could just take 13 chars from the little table and see if there are any matches in the big table based on this.

Can't see the wood for the trees! I will try this; thank you for the suggestion. It seems so obvious now.

D.