Performance tuning of query for PDW

  • Hi All,

    I have to tune the below query so that it gets executed in the most minimal amount of time even when there is billions of records flowing in:

    -- please note distribution on all the tables is on id column

    create table #temptable1

    (id int, name varchar(20))

    with (location = user_db);

    -- INSERT1

    insert into #temptable1

    select

    m.id, m.name (blah blahh)

    from tableA m

    inner join tableB n on m.id = n.id

    -- INSERT2: insert and join into the same table

    insert into tableC

    select

    x.id, y.name

    from #temptable1 x

    left join tableC y on x.id = y.id

    INSERT1:

    I want to understand that in case of INSERT1, we are joining two different tables tableA and tableB and are populating a third table #temptable so internally first the engine must be making a copy of both tableA and tableB and then joining the two tables on the basis of the predicate clause and then starts inserting. is this correct?

    INSERT2:

    So in INSERT2, tableC and #temptable is joined on the basis of predicate clause and then inserting into tableC how will this hapen? Will there be locks affecting the performance??

    I have another method to fix this issue, but I would like to learn if this methodology of joining and inserting in the same table is expensive?

    Regards,

    A newbie πŸ˜€

  • sql1512 (9/5/2015)


    Hi All,

    I have to tune the below query so that it gets executed in the most minimal amount of time even when there is billions of records flowing in:

    There's no certain way to guarantee any particular level of performance with billions of rows without massive amounts of planning around the database design and the system architecture, especially since, from the examples, we're talking about extremely simplistic queries and they don't seem to have a WHERE clause, meaning, no possibility for index use. These will always be scans. That makes them completely dependent on the architecture of your disks.

    -- please note distribution on all the tables is on id column

    create table #temptable1

    (id int, name varchar(20))

    with (location = user_db);

    -- INSERT1

    insert into #temptable1

    select

    m.id, m.name (blah blahh)

    from tableA m

    inner join tableB n on m.id = n.id

    -- INSERT2: insert and join into the same table

    insert into tableC

    select

    x.id, y.name

    from #temptable1 x

    left join tableC y on x.id = y.id

    INSERT1:

    I want to understand that in case of INSERT1, we are joining two different tables tableA and tableB and are populating a third table #temptable so internally first the engine must be making a copy of both tableA and tableB and then joining the two tables on the basis of the predicate clause and then starts inserting. is this correct?

    No. It won't make a copy of tableA and tableB. It reads from them to create the combined data set. How it chooses to combine them depends on the volume of data and the indexes available. It will be one of three joins, a loops join, a hash join or a merge join. For large data sets the most efficient of these is a merge join, but, it requires that the data be in order, so you'd have to have a good index on each table that will support the merge. Otherwise, for large data sets, it's going to use a hash join. It'll build a hash table, populate it with values from one table and then find the matching values from the second. The last choice for large data sets is the loops join where it effectively has a cursor that looks across the sets. But, with the exception of the hash table in the join, at no point is it duplicating the data and then inserting it. It's just reading the data except when it builds a hash table, but even that is not a straight up duplication and it's only against one data set.

    INSERT2:

    So in INSERT2, tableC and #temptable is joined on the basis of predicate clause and then inserting into tableC how will this hapen? Will there be locks affecting the performance??

    Absolutely there will be locks. You're inserting data. It has to take out locks. The join processes are again going to be based on the volume of data. For small data sets, the most efficient is the loops join and for larger data sets, either the hash or merge, depending on indexing.

    I have another method to fix this issue, but I would like to learn if this methodology of joining and inserting in the same table is expensive?

    Regards,

    A newbie πŸ˜€

    I don't know what "issue" you're referring to. You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware. Also, why are you bothering with loading everything into a temporary table and then using that load the third table? Why not just use the query against the first two tables to load the third? It's going to create just as much locking and blocking across all three tables but it eliminates the need for maintaining the temporary table, its data and its statistics.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    We need a BT punch. Go get one.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (9/6/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    We need a BT punch. Go get one.

    πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff/Grant,

    I wonder if you guys noticed that the question is not about SQL Server, but rather APS? If you did not would your answers be any different?

    Anthony

  • Anthony Perkins (10/7/2015)


    Jeff/Grant,

    I wonder if you guys noticed that the question is not about SQL Server, but rather APS? If you did not would your answers be any different?

    Anthony

    Not really. The underlying technology is still SQL Server. Lots, but not all, the behavior is basically the same. Even in PDW, SELECT * from billions of rows is going to be about hardware configuration and design. There's not an index that makes selecting billions of rows faster. PDW offers a lot more hardware configuration options though, so... there's that. In fact, that part is where my skill set falls off a cliff.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    We need a BT punch. Go get one.

    πŸ˜€

    Does that mean I don't actually need to change the air in my tires?

  • Jason A. Long (10/7/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    We need a BT punch. Go get one.

    πŸ˜€

    Does that mean I don't actually need to change the air in my tires?

    No. You still need to do that. Use a Jack of all Trades to lift the wheel, first. And make sure that you don't break the knutten-valve by accidental use of a Klein-Schmidt. πŸ˜‰ When you're done, put the sawdust pump back where you got it, and don't break the greekurn in the process. If you can't find a Klein-Schmidt, a hammerfor will do. Don't forget the relative-bearing grease on the gyro.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/7/2015)


    Jason A. Long (10/7/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    We need a BT punch. Go get one.

    πŸ˜€

    Does that mean I don't actually need to change the air in my tires?

    No. You still need to do that. Use a Jack of all Trades to lift the wheel, first. And make sure that you don't break the knutten-valve by accidental use of a Klein-Schmidt. πŸ˜‰ When you're done, put the sawdust pump back where you got it, and don't break the greekurn in the process. If you can't find a Klein-Schmidt, a hammerfor will do. Don't forget the relative-bearing grease on the gyro.

    Hey! It's all ball bearings nowadays. Now you prepare that Fetzer valve with some 3-in-1 oil and some gauze pads. And I'm gonna need 'bout ten quarts of anti-freeze, preferably Prestone. No, no make that Quaker State.

  • Jeff Moden (10/7/2015)


    Jason A. Long (10/7/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    We need a BT punch. Go get one.

    πŸ˜€

    Does that mean I don't actually need to change the air in my tires?

    No. You still need to do that. Use a Jack of all Trades to lift the wheel, first. And make sure that you don't break the knutten-valve by accidental use of a Klein-Schmidt. πŸ˜‰ When you're done, put the sawdust pump back where you got it, and don't break the greekurn in the process. If you can't find a Klein-Schmidt, a hammerfor will do. Don't forget the relative-bearing grease on the gyro.

    Don't forget the Portuguese hand-pump.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (10/8/2015)


    Jeff Moden (10/7/2015)


    Jason A. Long (10/7/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    Jeff Moden (9/6/2015)


    Grant Fritchey (9/6/2015)


    You've shown very simplistic queries that are hitting entire data sets. Your tuning opportunities here are limited to what you can do with hardware.

    +1000... unless maybe it's a "go fetch me a yard of chowline, bucket o' steam, and a left handed crescent wrench" question for a newbie like what used to happen on the boats.

    We need a BT punch. Go get one.

    πŸ˜€

    Does that mean I don't actually need to change the air in my tires?

    No. You still need to do that. Use a Jack of all Trades to lift the wheel, first. And make sure that you don't break the knutten-valve by accidental use of a Klein-Schmidt. πŸ˜‰ When you're done, put the sawdust pump back where you got it, and don't break the greekurn in the process. If you can't find a Klein-Schmidt, a hammerfor will do. Don't forget the relative-bearing grease on the gyro.

    Don't forget the Portuguese hand-pump.

    I used to work in a print shop that actually did have long weights and short weights. That led to fun with new people arguing that they'd heard that one before until they were shown that we really did use long weights.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    β€”Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply