Inefficient Join involving linked server

  • One of my clients is running a select statement as part of a report. The problem is that is takes 5 minutes to run despite only returning 500 rows.

    Investigating I found that the view (vFP_Customer_Business_Types) that forms one of the "tables" in the select is itself a select from a linked server, and this select has to pull all rows from three remote tables (100000 rows). The problem is that sometimes the select on the remote server is shown by SQL Profiler to have run 15 times (returning the same dataset), whereas it ought to run only once.

    Why is it running 15 times, and is there a way to force it to run only once?

    One workaround is to do a select from the remote server into a temp table, and join on that. But my client says his reporting software cannot do this and needs a single query.

    Thanks

    SELECT

    Customer.company_name, Credit_Note.tax_point_date,

    Credit_Note_Item.product_code, Credit_Note_Item.department_id,

    Credit_Note_Item.description,

    Credit_Note_Item.narrative, Customer.customer_num,

    Credit_Note.credit_note_num,

    Credit_Note_Item.value_commission_hc,

    Credit_Note_Item.value_credit_note_hc,

    Credit_Note_Item.value_discount_hc,

    Credit_Note_Item.value_credit_note_vat_hc,

    Credit_Note_Item.nominal_account_number,

    Security_User_Name.User_Id,

    Transaction_Header.note,

    Transaction_Header.note2,

    Transaction_Header.note3,

    Reason_Type.description,

    Credit_Note.value_credit_note_vat_hc,

    vTMG_Sales_Person.sales_person,

    Business_Subset.description,

    vFP_Customer_Business_Types.Business_Types

    FROM

    Customer Customer INNER JOIN

    Credit_Note Credit_Note ON Customer.customer_num=Credit_Note.customer_num INNER JOIN

    Security_User_Name Security_User_Name ON Customer.business_subset_id=Security_User_Name.connected_business_subset_id INNER JOIN

    vTMG_Sales_Person vTMG_Sales_Person ON Customer.customer_num=vTMG_Sales_Person.customer_num LEFT OUTER JOIN

    vFP_Customer_Business_Types vFP_Customer_Business_Types ON Customer.customer_id=vFP_Customer_Business_Types.Account_number INNER JOIN

    Credit_Note_Item Credit_Note_Item ON Credit_Note.credit_note_num=Credit_Note_Item.credit_note_num LEFT OUTER JOIN

    Transaction_Header Transaction_Header ON Credit_Note.credit_note_id=Transaction_Header.transaction_num INNER JOIN

    Reason_Type Reason_Type ON Credit_Note_Item.reason_code=Reason_Type.reason_code INNER JOIN

    Business_Subset Business_Subset ON Security_User_Name.connected_business_subset_id=Business_Subset.business_subset_id

    WHERE

    Credit_Note.tax_point_date>={ts '2009-01-01 00:00:00'} and

    Credit_Note.tax_point_date<{ts '2009-01-09 00:00:00'} and

    Security_User_Name.User_Id='Jason.Parker'

    ORDER BY

    Credit_Note.credit_note_num

    ALTER VIEW [dbo].[vFP_Customer_Business_Types]

    AS

    SELECT *

    FROM OPENQUERY(LINKEDFP,

    'SELECT

    Customer_Account.Account_number,

    Customer.URN_number,

    Business_Types

    FROM

    Customer INNER JOIN

    Customer_Account ON Customer.Customer_Account_ID = Customer_Account.Customer_Account_ID INNER JOIN

    vTMG_Customer_Business_Types ON Customer.Customer_ID = vTMG_Customer_Business_Types.Customer_ID')

    http://90.212.51.111 domain

  • Create a table on the server and use it on a single Query, that makes this faster(This is not the ideal way of doing it ;)) , this again depends on the table updates, and how accurate your data should be. 🙂

  • neil (1/19/2009)


    One of my clients is running a select statement as part of a report. The problem is that is takes 5 minutes to run despite only returning 500 rows.

    Investigating I found that the view (vFP_Customer_Business_Types) that forms one of the "tables" in the select is itself a select from a linked server, and this select has to pull all rows from three remote tables (100000 rows). The problem is that sometimes the select on the remote server is shown by SQL Profiler to have run 15 times (returning the same dataset), whereas it ought to run only once.

    Why is it running 15 times, and is there a way to force it to run only once?

    One workaround is to do a select from the remote server into a temp table, and join on that. But my client says his reporting software cannot do this and needs a single query.

    Thanks

    SELECT

    Customer.company_name, Credit_Note.tax_point_date,

    Credit_Note_Item.product_code, Credit_Note_Item.department_id,

    Credit_Note_Item.description,

    Credit_Note_Item.narrative, Customer.customer_num,

    Credit_Note.credit_note_num,

    Credit_Note_Item.value_commission_hc,

    Credit_Note_Item.value_credit_note_hc,

    Credit_Note_Item.value_discount_hc,

    Credit_Note_Item.value_credit_note_vat_hc,

    Credit_Note_Item.nominal_account_number,

    Security_User_Name.User_Id,

    Transaction_Header.note,

    Transaction_Header.note2,

    Transaction_Header.note3,

    Reason_Type.description,

    Credit_Note.value_credit_note_vat_hc,

    vTMG_Sales_Person.sales_person,

    Business_Subset.description,

    vFP_Customer_Business_Types.Business_Types

    FROM

    Customer Customer INNER JOIN

    Credit_Note Credit_Note ON Customer.customer_num=Credit_Note.customer_num INNER JOIN

    Security_User_Name Security_User_Name ON Customer.business_subset_id=Security_User_Name.connected_business_subset_id INNER JOIN

    vTMG_Sales_Person vTMG_Sales_Person ON Customer.customer_num=vTMG_Sales_Person.customer_num LEFT OUTER JOIN

    vFP_Customer_Business_Types vFP_Customer_Business_Types ON Customer.customer_id=vFP_Customer_Business_Types.Account_number INNER JOIN

    Credit_Note_Item Credit_Note_Item ON Credit_Note.credit_note_num=Credit_Note_Item.credit_note_num LEFT OUTER JOIN

    Transaction_Header Transaction_Header ON Credit_Note.credit_note_id=Transaction_Header.transaction_num INNER JOIN

    Reason_Type Reason_Type ON Credit_Note_Item.reason_code=Reason_Type.reason_code INNER JOIN

    Business_Subset Business_Subset ON Security_User_Name.connected_business_subset_id=Business_Subset.business_subset_id

    WHERE

    Credit_Note.tax_point_date>={ts '2009-01-01 00:00:00'} and

    Credit_Note.tax_point_date<{ts '2009-01-09 00:00:00'} and

    Security_User_Name.User_Id='Jason.Parker'

    ORDER BY

    Credit_Note.credit_note_num

    ALTER VIEW [dbo].[vFP_Customer_Business_Types]

    AS

    SELECT *

    FROM OPENQUERY(LINKEDFP,

    'SELECT

    Customer_Account.Account_number,

    Customer.URN_number,

    Business_Types

    FROM

    Customer INNER JOIN

    Customer_Account ON Customer.Customer_Account_ID = Customer_Account.Customer_Account_ID INNER JOIN

    vTMG_Customer_Business_Types ON Customer.Customer_ID = vTMG_Customer_Business_Types.Customer_ID')

    Don't you just love the way linked table can get you in troubles.

    Using linked tables will get you in troubles, sooner or later.

    Q: What kind of isolation level do you thing it is using to query the data at the linked server ?

    Also keep in mind linked server queries are IO constrained as well by their originating instance as well as their referencing instance, including all locking mechanisms, ...

    If it is another sqlserver instance it is referencing, did you declare the linked server collation compatible (if possible) ? It will generate overhead not doing so if you can.

    Did you try to re-style the view to using your linked server like:

    ALTER VIEW [dbo].[vFP_Customer_Business_Types]

    AS

    SELECT

    CA..Account_number,

    C.URN_number,

    B.Business_Types

    FROM

    LINKEDFP.thedb.theschema.Customer C

    INNER JOIN

    LINKEDFP.thedb.theschema.Customer_Account CA

    ON C.Customer_Account_ID = CA.Customer_Account_ID

    INNER JOIN

    LINKEDFP.thedb.theschema.vTMG_Customer_Business_Types B

    ON C.Customer_ID = B.Customer_ID

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I had this problem too some time ago. In my case the view was responsible because it seems like views that join linked tables are extremely inefficient. I suspect that the views cannot use indexes on the linked tables. I my case after I rewrote my queries that they did not use the views anymore, the performance was okay.

  • The problem is that there is no good way for a linked server to know how to optimize queries that refer to remote tables. So what always seems to happen is all of the remote data is pulled into the local server... regardless of where clauses or join statements, and then all of the join and filtering criteria happens on the local server. Stored procedures or user defined functions on the remote side don't suffer from this however, since they are compiled and executed on the actual remote server.

    So, knowing this, one way I've solved this in the past, is to create a stored procedure on the remote server that performs the remote portion of the query.

    Then, on the local side, create a stored procedure that your report uses that creates a temporary table or a table variable (depending on the performance) and dumps the data from the remote stored procedure into that table.

    Then the stored procedure would just join the temporary/table variable (which ever you chose) together and remote on it happily.

    If you setup a primary key on the table variable, or add an index to your temporary table (again whichever you choose) then that also can increase the performance further depending on the query you are doing.

    Something like so....

    remote side...

    create procedure stp_RemoteDataGatheringStoredProc

    as

    Select A.Field1, B.Field2, C.Field3, ....

    From dbo.Table1

    inner join ....

    ...

    go

    and on the local server side....

    create procedure stp_ReportingStoredProcedure

    as

    Declare @RemoteData (Field1 varchar(30), Field2 varchar(30),....)

    insert into @RemoteData

    Exec REMOTESERVER.RemoteDB.dbo.stp_RemoteDataGatheringStoredProc

    select *

    from LocalTable1 as A

    Inner Join LocalTable2 as B on (A.Field1 = B.Field1)

    Inner join @RemoteData as C on (B.Field2 = C.Field2)

    Go

    Hopefully this makes sense.

    Good Luck

  • Hi there,

    I work for the same company as the original poster (Neil) and wanted to make an addtional point on the problem we are having...

    The original query is ineffecient depending on the date period selected..i.e.

    If I see the issue if I change the start and end date to recent dates, this will run fine

    Credit_Note.tax_point_date>={ts '2009-01-01 00:00:00'} and

    Credit_Note.tax_point_date<{ts '2009-01-14 00:00:00'}

    But this takes minutes

    Credit_Note.tax_point_date>={ts '2009-01-10 00:00:00'} and

    Credit_Note.tax_point_date<{ts '2009-01-14 00:00:00'}

    It seems that SQL Server is handling the 2 queries differently, in the first query it retrieves the data from FP in one chunk, but in the second query it is running multiple queries against FP and returning the data in multiple chunks.

    This usually happens whereby the latest week range takes a long time but it a longer period is selected then it runs quicker!

    any ideaS?

    My DBA Ramblings - SQL Server | Oracle | MySQL | MongoDB | Access[/url]

  • Issues with linked servers can be:

    - locking at linked server side

    - network

    - how and where does sqlserver solve the query to be executed at the linked server (full at linked server side, or partially at the local server)

    - It may even pull over the whole table's worth of data to handle the filtering locally !

    - cotention of temptb (as well locally as at linked server side)

    - Physical IO at the linked server level

    - memory pressure at linked server side

    .....

    Are indexes and statistics updated recently at both servers ?

    Figure out how much data it needs to pull over to your server to give you the results you need.

    Figure out how it could solve it in the most optimal case.

    Then you may find out why it has this "unstable" behavior.

    Figure out what kind of objects it is accessing at linked server side (and how they are organized (index/statistics).

    So to solve this issue, you have to provide sqlserver as much info as you can !!

    At query level as wel as at linked server definition level !

    first: solve the issue(s) at the local server (imo manily definition of LS)

    then trace it at the linked server (LS).

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ok - ive tried smaller and longer date ranges and the smaller one definately chooses a more inefficient path (attached)

    Key lookups appearing on the smaller date range plan, as well as different seek and scan usage. My initial thoughts are that the optimiser is deciding the plan based on the date range and suggesting the the longer date range plan is not effecient for the smaller date range?

    I appreciate that remote lookups take time - however the real issue is that this report works well on longer date ranges but suffers on a smaller date range - IF the smaller data range is the current week.

    What do you think?

    Thanks

    My DBA Ramblings - SQL Server | Oracle | MySQL | MongoDB | Access[/url]

  • First notice:

    [Customer_Business_Subset] starts with a full table scan in stead of an index seek !

    ([CMS3000_Reports].[dbo].[Customer_Business_Subset].[PK_Customer_Business_Subset])

    152802 rows of 49bytes are expected

    It wants so resolve [date_deleted] is null

    It may be a statistics issue, run sp_updatestatitics if you can (even if autoupdate stats is on)

    Maybe an index on that column may help out (if it makes sence).

    The join with the table [credit_note] results into 990.000 rows (of 51 bytes) !

    (in stead of the 200.000 (of 47 Bytes) at that point in the shorter running query)

    So it start of very different.

    You'll have to figure out why.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply