Possible Query Refinement & SSIS Package Locking Source Table?

  • Hi,

    I wrote an SSIS package to generate a report. The basic flow of the data flow is this:

    - Declare table variable

    - Fill the table variable with a union of two tables; this yields about 700,000 records

    - Select statement joining nine tables, the table variable included

    - Put results in csv

    First off, the entire process takes about 40 minutes, which seems suspect to me. Originally, the select statement included the union directly in the FROM clause and took two hours. I changed it to use the table variable instead, which cut the time down significantly, but I still feel it could be brought down to seconds. None of the tables contains more than 2,000,000 records, and the csv only contains 15k records. The estimated execution plan tells me the DECLARE of the table variable and the INSERT into it take up 100% of the query cost relative to the batch.

    Here it is:

    DECLARE @TransGuests TABLE (
        sale_no NUMERIC(17, 0)
        ,guest_no NUMERIC(17, 0)
    )

    INSERT INTO @TransGuests (
        sale_no
        ,guest_no
    )
    SELECT
        sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION
    SELECT
        sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale) AS guest_no
    FROM sale_hdr sh
    WHERE mastersale != 0

    The main operators in that plan:

    - Clustered Index Scan on sale_hdr: 18% of cost

    - Sort: 19%

    - Clustered Index Scan on tr_info: 16%

    - Union: 24%

    - Table Insert: 18%

    When I run this segment of code alone, it takes about 9 seconds. I'm wondering: when the entire query runs, that is, this DECLARE/INSERT plus the SELECT portion, is it running the DECLARE/INSERT for each record?

    Second, when I run the process inside the package, it locks the tables. The client programs all crashed, saying they couldn't connect to the database. But when I tested the process outside of a package, on both the training server and the live server, no such locks took place.

    What's going on here? Obviously I'm not a SQL expert. There's probably an easy solution to both issues that I'm just unaware of. Any and all help would be much obliged.

    Thank you.

  • Hello and welcome to the forum.

    Presumably you are doing this all in a dataflow task?

    I see no reason whatsoever for the table variable. I suggest that you change the data flow source to include the query only:

    SELECT sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION
    SELECT sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale) AS guest_no
    FROM sale_hdr sh
    WHERE mastersale != 0

    Also, please remember that UNION returns DISTINCT rows. If you want all rows (and faster processing) use UNION ALL.
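    If the report can tolerate duplicate rows between the two branches, the same source query can be written with UNION ALL, which avoids the implicit distinct step (quite possibly the Sort operator showing 19% in your plan). A sketch:

    SELECT sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION ALL
    SELECT sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale) AS guest_no
    FROM sale_hdr sh
    WHERE mastersale != 0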

    It may be worth experimenting with two data flows, one after the other: one for query 1, and a second (which appends to the CSV file) for query 2, assuming you don't need the DISTINCT, of course. Just in case the performance is better.


  • Hi,

    Thanks for the advice and the welcome.

    I was unclear in my post. That DECLARE/INSERT statement I posted constitutes only a portion of the whole query. Following that snippet there's a SELECT statement that uses the table variable in its FROM clause, along with seven other tables. I posted the snippet because the Estimated Execution Plan says that's where 100% of the processing takes place relative to the entire query/batch.

  • abelwingnut (3/10/2016)


    Hi,

    Thanks for the advice and the welcome.

    I was unclear in my post. That DECLARE/INSERT statement I posted constitutes only a portion of the whole query. Following that snippet there's a SELECT statement that uses the table variable in its FROM clause, along with seven other tables. I posted the snippet because the Estimated Execution Plan says that's where 100% of the processing takes place relative to the entire query/batch.

    OK. Then I would suggest that you wrap the whole thing up in a proc and use that as your data flow source. Embedding complex SQL code in a package is not good practice.
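    Something along these lines, for example (the procedure name here is illustrative, not from your system), with the final SELECT becoming the data flow source:

    CREATE PROCEDURE dbo.usp_GuestSalesReport
    AS
    BEGIN
        SET NOCOUNT ON; -- stops rowcount messages interfering with the SSIS source

        -- staging step (the table-variable or temp-table load) goes here

        -- the final nine-table SELECT goes here; its result set
        -- is what the data flow source component will consume
    END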

    Table variables are usually not a good idea in cases where more than a few rows are involved. Try modifying your code to use temp tables instead.
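    A sketch of that staging step with a temp table in place of the table variable (the index at the end is an assumption about what the later nine-table join needs; adjust it to your actual join columns):

    CREATE TABLE #TransGuests (
        sale_no NUMERIC(17, 0)
        ,guest_no NUMERIC(17, 0)
    );

    INSERT INTO #TransGuests (sale_no, guest_no)
    SELECT sale_no
        ,guest_no
    FROM tr_info
    WHERE guest_no != 0
    UNION
    SELECT sale_no
        ,MAX(guest_no) OVER (PARTITION BY mastersale)
    FROM sale_hdr
    WHERE mastersale != 0;

    -- unlike a table variable, a temp table gets statistics and can be indexed,
    -- which gives the optimiser much better information for the big join
    CREATE CLUSTERED INDEX IX_TransGuests ON #TransGuests (sale_no, guest_no);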

    Optimise your query in SSMS first, and then move on to the SSIS bit.

