Possible Query Refinement & SSIS Package Locking Source Table?

  • Hi,

    I wrote an SSIS package to generate a report. The basic flow of the data flow is this:

    - Declare table variable

    - Fill the table variable with a union of two tables; this yields about 700,000 records

    - Select statement joining nine tables, the table variable included

    - Put results in csv

    First off, the entire process takes about 40 minutes, which seems suspect to me. Originally, the select statement included the union directly in the FROM clause and took two hours. I changed it to use the table variable instead, which cut the time down significantly, but I still feel it could be brought down to seconds. None of the tables contains more than 2,000,000 records, and the csv only contains 15k records. The estimated execution plan tells me the DECLARE of the table variable and the INSERT into it take up 100% of the query cost relative to the batch.

    Here it is:

    DECLARE @TransGuests TABLE (
        sale_no NUMERIC(17, 0)
        ,guest_no NUMERIC(17, 0)
    )

    INSERT INTO @TransGuests (
        sale_no
        ,guest_no
    )
    SELECT
        sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION
    SELECT
        sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale) AS guest_no
    FROM sale_hdr sh
    WHERE mastersale != 0

    The main operators in that plan:

    - Clustered Index Scan on sale_hdr: 18% of cost

    - Sort: 19%

    - Clustered Index Scan on tr_info: 16%

    - Union: 24%

    - Table Insert: 18%

    When I run this segment of code alone, it takes about 9 seconds. I'm wondering: when the entire query runs, that is, this DECLARE/INSERT plus the SELECT portion, is it running the DECLARE/INSERT for each record?

    Second, when I run the process inside the package, it locks the tables. The client programs all crashed, saying they couldn't connect to the database. But when I tested the process outside of a package, on both the training server and the live server, no such locks took place.

    What's going on here? Obviously I'm not a SQL expert. There's probably an easy solution to both issues that I'm just unaware of. Any and all help would be much obliged.

    Thank you.

  • Hello and welcome to the forum.

    Presumably you are doing this all in a dataflow task?

    I see no reason whatsoever for the table variable. I suggest that you change the data flow source to include the query only:

    SELECT sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION
    SELECT sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale) AS guest_no
    FROM sale_hdr sh
    WHERE mastersale != 0

    Also, please remember that UNION returns DISTINCT rows. If you want all rows (and faster processing) use UNION ALL.
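    If the report can tolerate duplicate rows between the two branches, the same source query can be written with UNION ALL, which avoids the implicit distinct step (quite possibly the Sort operator showing 19% in your plan). A sketch:

    SELECT sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION ALL
    SELECT sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale) AS guest_no
    FROM sale_hdr sh
    WHERE mastersale != 0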

    It may be worth experimenting with two data flows, one after the other: one for query 1, and a second (which appends to the CSV file) for query 2, assuming you don't need the DISTINCT, of course. Just in case the performance is better.


  • Hi,

    Thanks for the advice and the welcome.

    I was unclear in my post. That DECLARE/INSERT statement I posted constitutes only a portion of the whole query. Following that snippet there's a SELECT statement that uses the table variable in its FROM clause, along with seven other tables. I posted the snippet because the Estimated Execution Plan says that's where 100% of the processing takes place relative to the entire query/batch.

  • abelwingnut (3/10/2016)


    Hi,

    Thanks for the advice and the welcome.

    I was unclear in my post. That DECLARE/INSERT statement I posted constitutes only a portion of the whole query. Following that snippet there's a SELECT statement that uses the table variable in its FROM clause, along with seven other tables. I posted the snippet because the Estimated Execution Plan says that's where 100% of the processing takes place relative to the entire query/batch.

    OK. Then I would suggest that you wrap the whole thing up in a proc and use that as your data flow source. Embedding complex SQL code in a package is not good practice.
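    Something along these lines, for example (the procedure name here is illustrative, not from your system), with the final SELECT becoming the data flow source:

    CREATE PROCEDURE dbo.usp_GuestSalesReport
    AS
    BEGIN
        SET NOCOUNT ON; -- stops rowcount messages interfering with the SSIS source

        -- staging step (the table-variable or temp-table load) goes here

        -- the final nine-table SELECT goes here; its result set
        -- is what the data flow source component will consume
    END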

    Table variables are usually not a good idea in cases where more than a few rows are involved. Try modifying your code to use temp tables instead.
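    A sketch of that staging step with a temp table in place of the table variable (the index at the end is an assumption about what the later nine-table join needs; adjust it to your actual join columns):

    CREATE TABLE #TransGuests (
        sale_no NUMERIC(17, 0)
        ,guest_no NUMERIC(17, 0)
    );

    INSERT INTO #TransGuests (sale_no, guest_no)
    SELECT sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION
    SELECT sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale)
    FROM sale_hdr
    WHERE mastersale != 0;

    -- unlike a table variable, a temp table gets statistics and can be indexed,
    -- which gives the optimiser much better information for the big join
    CREATE CLUSTERED INDEX IX_TransGuests ON #TransGuests (sale_no, guest_no);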

    Optimise your query in SSMS first, and then move on to the SSIS bit.

