Indexes, Query plans and optimization

Question

Post reply

Indexes, Query plans and optimization

ianderson 63159

Valued Member

Points: 55
More actions
September 14, 2012 at 9:22 am

#261670

Greetings! First time posting here. Great forum and site though, kudos!
here is my scenario:
I have one table for each year 1998-2010 containing arbitrary data (13 tables). Naming convention is "tYYYY15" where YYYY is the year. Each table has around 7 million records. There is a clustered index on year (varchar(4)) and rownumber (INT) (left to right).
I have one table for each year 1998-2010 also containing arbitrary data that relates to the first group using a rownumber (13 tables). Naming convention is "tYYYY15a". Each table has around 50 million records. There is a clustered index on year, rownumber, and one more column called theIndex (left to right).
I am taking a subset of data from each of the "15" tables and stacking up into one table, t15_all. So this single table has all years but not all rowsfrom each year.
Now, I need to stack the "15a" rows into one table (t15a_all) but only the rows whose year and rownumber correspond to a year and rownumber contained in the t15_all table. Fairly straightforward.
I have done some testing and believe that the following set of queries should be optimal for inserting into the t15a_all table:
INSERT INTO t15a_All SELECT * FROM t199815a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '1998')
INSERT INTO t15a_All SELECT * FROM t199915a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '1999')
INSERT INTO t15a_All SELECT * FROM t200015a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2000')
INSERT INTO t15a_All SELECT * FROM t200115a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2001')
INSERT INTO t15a_All SELECT * FROM t200215a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2002')
INSERT INTO t15a_All SELECT * FROM t200315a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2003')
INSERT INTO t15a_All SELECT * FROM t200415a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2004')
INSERT INTO t15a_All SELECT * FROM t200515a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2005')
INSERT INTO t15a_All SELECT * FROM t200615a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2006')
INSERT INTO t15a_All SELECT * FROM t200715a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2007')
INSERT INTO t15a_All SELECT * FROM t200815a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2008')
INSERT INTO t15a_All SELECT * FROM t200915a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2009')
INSERT INTO t15a_All SELECT * FROM t201015a WHERE file_year+rownum IN (SELECT file_year+rownum FROM t15_all WHERE file_year = '2010')
So, what happens? The queries for 1998-2002 run very well. In testing where I am only dealing with 200,000 rows at a time (instead of 7 million), it takes less than 2 minutes per insert. Get to 2003 and BANG, the server starts "spinning" - just sitting there with nothing happening apparently - for 4 hours plus before I kill the query. I try 2003 on its own a few times and always kill it at about 20 minutes which would be 10x longer than i think it should take.
I run 2004 to see what happens. Less than 2 minutes.
I run 2005-2020 in a batch. 2005-2009 take less than 2 minutes a piece. 2010... I got to work this morning and the query was not finished after 16 hours. I killed it...
HELP! Why is this happening?
I generate an estimated execution plan of the above code for years 2005-2010. The plan for each year 2005-2009 each have a cost relative to the batch of about 20%. They all do a clustered index seek (Cost: 0%) on the t15_all table (this is good, gets the appropriate subset of rownumbers based on the year). The biggest cost to each query is getting the "15a" data at 99%.
2010 has a relative cost of 0%. That doesn't seem right! It also has a clustered index seek but the cost is 5% - different from all the rest - and getting the 15a data is only a 10% cost. I don't know if I'm getting to terse here or am even on the right track.
Does anybody have any suggestions, please? I would be very greatful!

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

Would you please post the DDL (CREATE TABLE) statements and include the indexes defined.

ianderson 63159 Valued Member Points: 55 More actions · Answer 2

for the new tables:

CREATE TABLE t15_all (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t15_all (file_year, rownum)

CREATE TABLE t15a_all (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t15a_all (file_year, rownum, theIndex)

for the base tables:

CREATE TABLE t199815 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t199815 (file_year, rownum)

CREATE TABLE t199815a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t199815a (file_year, rownum, theIndex)

CREATE TABLE t199915 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t199915 (file_year, rownum)

CREATE TABLE t199915a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t199915a (file_year, rownum, theIndex)

CREATE TABLE t200015 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200015 (file_year, rownum)

CREATE TABLE t200015a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200015a (file_year, rownum, theIndex)

CREATE TABLE t200115 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200115 (file_year, rownum)

CREATE TABLE t200115a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200115a (file_year, rownum, theIndex)

CREATE TABLE t200215 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200215 (file_year, rownum)

CREATE TABLE t200215a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200215a (file_year, rownum, theIndex)

CREATE TABLE t200315 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200315 (file_year, rownum)

CREATE TABLE t200315a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200315a (file_year, rownum, theIndex)

CREATE TABLE t200415 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200415 (file_year, rownum)

CREATE TABLE t200415a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200415a (file_year, rownum, theIndex)

CREATE TABLE t200515 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200515 (file_year, rownum)

CREATE TABLE t200515a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200515a (file_year, rownum, theIndex)

CREATE TABLE t200615 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200615 (file_year, rownum)

CREATE TABLE t200615a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200615a (file_year, rownum, theIndex)

CREATE TABLE t200715 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200715 (file_year, rownum)

CREATE TABLE t200715a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200715a (file_year, rownum, theIndex)

CREATE TABLE t200815 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200815 (file_year, rownum)

CREATE TABLE t200815a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200815a (file_year, rownum, theIndex)

CREATE TABLE t200915 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200915 (file_year, rownum)

CREATE TABLE t200915a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t200915a (file_year, rownum, theIndex)

CREATE TABLE t201015 (file_year varchar(4), rownum INT IDENTITY(1,1), somedata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t201015 (file_year, rownum)

CREATE TABLE t201015a (file_year varchar(4), rownum INT IDENTITY(1,1), theIndex INT, someotherdata varchar(55))

CREATE CLUSTERED INDEX idx_clstr ON t201015a (file_year, rownum, theIndex)

please let me know if this isn't adequate!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

I think I would go with something more like this:

INSERT INTO t15a_All

SELECT

t.*

FROM

t199815a t

inner join t15_All ta

on (t.file_year = ta.file_year and

t.rownum = ta.rownum)

WHERE

ta.file_year = '1998';

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

Gut feel, you've run into this: http://sqlinthewild.co.za/index.php/2011/03/22/statistics-row-estimations-and-the-ascending-date-column/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ianderson 63159 Valued Member Points: 55 More actions · Answer 5

Thanks Lynn! In my first test, that is running great! Your suggestion seems to perform about the same for processing time but didn't experience the hiccups...

I had tried something similar but it took twice as long to execute as what I am doing in the original post of this thread. Precisely, i tried:

INSERT INTO t15a_All

SELECT

t.*

FROM

t199815a t

inner join (SELECT * t15_All FROM WHERE orig_file_year='1998') AS ta

on t.file_year+t.rownum = ta.file_year+ta.rownum

Can you help me understand why your suggestion differs in performance from this one? The differences I see are that you put 1998 in the WHERE clause instead of a subquery and that our ON clauses for the join differ.

I would love to understand better what mistake I was making and WHY it was a mistake! Thanks again!

ianderson 63159 Valued Member Points: 55 More actions · Answer 6

Thanks Gail, too. You may be right and I think I've run into that before. I am such a hack that I wasn't sure how to update stastics tho. Just seeing the blog say "update statistics" was a pretty good clue!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

ianderson 63159 (9/14/2012)
Thanks Lynn! In my first test, that is running great! Your suggestion seems to perform about the same for processing time but didn't experience the hiccups...
I had tried something similar but it took twice as long to execute as what I am doing in the original post of this thread. Precisely, i tried:
INSERT INTO t15a_All
SELECT
t.*
FROM
t199815a t
inner join (SELECT * t15_All FROM WHERE orig_file_year='1998') AS ta
on t.file_year+t.rownum = ta.file_year+ta.rownum
Can you help me understand why your suggestion differs in performance from this one? The differences I see are that you put 1998 in the WHERE clause instead of a subquery and that our ON clauses for the join differ.
I would love to understand better what mistake I was making and WHY it was a mistake! Thanks again!

Simple, why are you concatenating the two columns when the tables are indexed on those columns? By concatentating them you force SQL Server to use a table (or clustered index) scan.

ianderson 63159 Valued Member Points: 55 More actions · Answer 8

Just a faulty manner of thinking about the index on my part. Thank you for helping me clear that up!

Simple, why are you concatenating the two columns when the tables are indexed on those columns? By concatentating them you force SQL Server to use a table (or clustered index) scan.