Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

108,000,000 row table - check for matches Expand / Collapse
Author
Message
Posted Friday, September 20, 2013 10:00 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:29 AM
Points: 30, Visits: 463
Good afternoon,

Long time reader - first time poster here!

I am currently working on a 108m row table where some of the ids (TOID column) have inadvertently been shortened - they should all be 16chars in length, i.e. 0001000000076579
0001000000104566
0001000000159225
0001000000250870
0001000000263722
0001000000303305
0001000000303309

but some are 13 in length. I think that some are missing a 100 at the beginning and some are missing 500. I have an update table (where I have to find matches between this table and the big daddy table based on this id (called TOID))

I need to find out which rows in the update table don't match rows in the big daddy table - I can join on the matches, no problem, but when it comes to searching for those matches which are a concatenation of 100/500<TOID> this makes the query run very very slowly.

my code is quite simply:


UPDATE DUNCANTEMP
SET
DOESEXIST = 1
WHERE
EXISTS (SELECT TOID
FROM
[OS_MasterMap].dbo.OS_MM_AREA OMA WITH (NOLOCK)
WHERE
('500' + OMA.Toid = DUNCANTEMP.TOID
OR
'100' + OMA.Toid = DUNCANTEMP.TOID)
AND
len(OMA.Toid) = 13)

Where DUNCANTEMP consists of:

CREATE TABLE dbo.DUNCANTEMP(
TOID VARCHAR(50) NULL,
DOESEXIST INT NULL
) ON [PRIMARY]

The tables area static and in a dev environment.

And [OS_MasterMap].dbo.OS_MM_AREA has loads of columns, but TOID is the only one we need to look at:

CREATE TABLE dbo.OS_MM_AREA(
OBJECTID INT IDENTITY,
SHAPE GEOMETRY NOT NULL,
Toid NVARCHAR(20) NOT NULL,
Version INT NULL,
VerDate DATETIME NULL,
FeatCode INT NULL,
Theme NVARCHAR(80) NULL,
CalcArea NUMERIC(38, 8) NULL,
Change NVARCHAR(80) NULL,
DescGroup NVARCHAR(150) NULL,
DescTerm NVARCHAR(150) NULL,
Make NVARCHAR(20) NULL,
PhysLevel INT NULL,
PhysPres NVARCHAR(20) NULL,
Broken SMALLINT NULL,
LoadDate DATETIME NULL,
CONSTRAINT PK__OS_MM_AR__F4B70D851D5924B0 PRIMARY KEY (OBJECTID),
CONSTRAINT UQ__OS_MM_AR__FF378992EC7F35C5 UNIQUE (Toid)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

There is a unique constraint on OS_MM_AREA (TOID) and a non-clustered index.

there is a clustered index on TOID on the DUNCANTEMP table (but not unique as there may be some duplicate TOIDs based on multiple update rows).

My question is: my code has been running for about 14hrs now.. there are 33934 rows in Duncantemp and 107913558 rows in the OS_MM_AREA table - my code will be scanning the big table for matches for each row..

can anyone think of a better and quicker way of finding those rows in DUNCANTEMP which don't match any records in OS_MM_AREA given a direct match or +100/+500<TOID> match as well?

I welcome suggestions please.

(if it doesn't complete over the weekend, I will add 2 more columns to the big daddy table with +500 and +100<TOID>, index them, and run the code using a join.. but I was interested if I had fundamentally missed something first

thank you kindly

D.
Post #1496946
Posted Friday, September 20, 2013 10:12 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Sunday, August 10, 2014 2:47 PM
Points: 94, Visits: 330
read this fantastic article carefully. This may help you
http://www.sqlservercentral.com/articles/T-SQL/62867/
Post #1496951
Posted Saturday, September 21, 2013 1:46 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:29 AM
Points: 30, Visits: 463
SrcName (9/20/2013)
read this fantastic article carefully. This may help you
http://www.sqlservercentral.com/articles/T-SQL/62867/


SrcName - thank you for the link, I have read this before and am familiar with the numbers/tally table. I'm not sure though how it helps me here, does not having a SARGABLE where clause cause a scan whatever I do?

Sorry, I haven't used the tally table enough to see its application here.

I would be very grateful if you could elaborate please.

thank you

D.
Post #1497120
Posted Saturday, September 21, 2013 10:08 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 8:31 AM
Points: 8,707, Visits: 9,255
The thing to do is try to avoid the scan of the big table; give it something simple to look for.
I don't offer any guarantee, but something like this might work:
update DUNCANTEMP set DOESEXIST = 1
where TOID like '100%' or TOID like '500%
and exists (select 1 from OS_MM_AREA OMA with (NOLOCK)
where OMA.TOID = RIGHT(DUNCANTEMP.TOID,13))

That avoids applying a concatenation to OMA.TOID, so it should be able to use the index for the existence test, which avoids scanning teh big table. It may of course do a scan on DUNCANTEMP, but that's unavoidable.

Anyway, for future reference: please remember to post an actual execution plan (if possible; estimated plan otherwise) with queries about improving performance.


Tom
Post #1497144
Posted Saturday, September 21, 2013 4:20 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, August 21, 2014 3:04 PM
Points: 801, Visits: 715
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1497170
Posted Monday, September 23, 2013 12:59 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:29 AM
Points: 30, Visits: 463
Erland Sommarskog (9/21/2013)
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.


Thank you all for your inputs - I have failed to explain the problem correctly, it is the large table which has a mix of 13 and 16 char TOIDs, these are the ones I am looking to find by adding '100'+toid and '500'+toid to my search and seeing if any records in the smaller table match - I will add two more columns to the bigger table, add in the '100'+toid and '500'+toid to these, create an index and see if I get any matches

I think that any other option will inevitably produce a table scan on the large table for each row of the smaller table, so this is the way I am going now.

thanks anyway..

D.
Post #1497279
Posted Monday, September 23, 2013 6:36 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 8:31 AM
Points: 8,707, Visits: 9,255
duncanburtenshaw (9/23/2013)
Erland Sommarskog (9/21/2013)
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.


Thank you all for your inputs - I have failed to explain the problem correctly, it is the large table which has a mix of 13 and 16 char TOIDs, these are the ones I am looking to find by adding '100'+toid and '500'+toid to my search and seeing if any records in the smaller table match - I will add two more columns to the bigger table, add in the '100'+toid and '500'+toid to these, create an index and see if I get any matches

I think that any other option will inevitably produce a table scan on the large table for each row of the smaller table, so this is the way I am going now.

thanks anyway..

D.

I'd be interested to know how many rows there have to be in the small table before the code i used produces a scan of the large table. In theory the optimiser should only do this when it thinks that the number of index lokups would be sufficientl large for it to be advantageous touse a scan; I find it hard to believe that something as low as .000324% of the rows implies a scan rather than individual lookups when what is being looked up is an exact match of the 13 character string.

Of course it might be better if the 16 character matches were elimnated first and a new test table constructed from what wasn't a 16-character match, with the first three characters discarded (and the row eliminated if they were not one of the two interesting 3-character strings) and provided with a clustered index on the string column, because then the check for those values is very obviously a check for matches in the two indexes with no mesing about with appending prefixes - that would certainly take a lot less time than adding two new indexes to the big table and then checking all the possible matches, as eliminating the 16 character matches and truncating what is left will be a lot cheaper than creating two new indexes on the big table (about 2000 times fewer logical writes) and once you've done that you have only 1 index (the one you already have on the big table) to match against instead of 3.


Tom
Post #1497353
Posted Monday, September 23, 2013 6:52 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:29 AM
Points: 30, Visits: 463
L' Eomot Inversé (9/23/2013)
duncanburtenshaw (9/23/2013)
Erland Sommarskog (9/21/2013)
Before we start talking about sargable arguments, maybe we should see if there is anything to sarg for. From what Duncan says, there does not seem to be any.

We have keys that should be 1001231231231 or 1007897897897, but in fact are only 1231231231 and 7897897897. Unless there is a pattern for the beginning of these short keys, there is nothing to sarg for.

What Duncan could do is to insert all 13-character keys into a temp table, in this way he only needs to scan the big table once. Which still takes a long time, but it should not take the entire weekend.


Thank you all for your inputs - I have failed to explain the problem correctly, it is the large table which has a mix of 13 and 16 char TOIDs, these are the ones I am looking to find by adding '100'+toid and '500'+toid to my search and seeing if any records in the smaller table match - I will add two more columns to the bigger table, add in the '100'+toid and '500'+toid to these, create an index and see if I get any matches

I think that any other option will inevitably produce a table scan on the large table for each row of the smaller table, so this is the way I am going now.

thanks anyway..

D.

I'd be interested to know how many rows there have to be in the small table before the code i used produces a scan of the large table. In theory the optimiser should only do this when it thinks that the number of index lokups would be sufficientl large for it to be advantageous touse a scan; I find it hard to believe that something as low as .000324% of the rows implies a scan rather than individual lookups when what is being looked up is an exact match of the 13 character string.

Of course it might be better if the 16 character matches were elimnated first and a new test table constructed from what wasn't a 16-character match, with the first three characters discarded (and the row eliminated if they were not one of the two interesting 3-character strings) and provided with a clustered index on the string column, because then the check for those values is very obviously a check for matches in the two indexes with no mesing about with appending prefixes - that would certainly take a lot less time than adding two new indexes to the big table and then checking all the possible matches, as eliminating the 16 character matches and truncating what is left will be a lot cheaper than creating two new indexes on the big table (about 2000 times fewer logical writes) and once you've done that you have only 1 index (the one you already have on the big table) to match against instead of 3.


Thank you for your comments. Perhaps I am completely missing the point. Let me reiterate:

1. I have a large table (lets call it BIGa)
2. I have a smaller table, (lets call it Smallb)
3. The BIGa table has 108,000,000 rows. Out of these, approx 27,000,000 rows have a TOID (unique number) which is 13 chars in length instead of 16 (which all of the other rows are).
4. I have a small update table (Smallb) with 370,000 odd rows. None of the rows in Smallb are 13 chars long, they are all correct length of 16 chars.
5. I am trying to match any rows from Smallb with a matching row in BIGa based on any TOIDs in BIGa which are a) 13 chars long, and where the row matches based on '100'+TOID OR '500'+toid

Based on the assumption above, surely all rows in the big table (27,000,000 rows based on len(TOID=13)) will be scanned?

This was my original question - apart from adding a couple of columns persisting the '100'+TOID and '500'+TOID /indexing them, and then looking for a match - was there a better way of seeing if there are any matches between Smallb and BIGa that the code I originally quoted.

I have, quite obviously, failed completely to explain this problem satisfactorily and for that I apologise.

I have added new columns on Biga now with '500' and '100'+TOID, and will check for any direct matches after this has completed - I just wanted to know if I was missing anything which would have saved me the time in creating these columns and would allow me to detect possible matches.

thank you for all your suggestions - next time I will attach execution plans as well.

D.
Post #1497362
Posted Monday, September 23, 2013 8:56 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 18, 2014 4:12 AM
Points: 165, Visits: 681
What about adding 1 column to the small table that would contain RIGHT(smallTable.toid,13), create an index on that and then do your join?

B
Post #1497430
Posted Monday, September 23, 2013 9:06 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:29 AM
Points: 30, Visits: 463
bleroy (9/23/2013)
What about adding 1 column to the small table that would contain RIGHT(smallTable.toid,13), create an index on that and then do your join?

B


ooooh.. good idea!

Exactly what I was looking for.. of course, I could just get 13 chars from the little table and see if there are any matches in the big table based on this.

can't see the wood for the trees.. I will try this, thank you for the suggestion, seems so obvious now.

D.
Post #1497435
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse