ETL Tool job is slow

  • Hi Experts,

    I need some piece of advice on how to about this below scenario.

    In our environment we have ETL solution called BODS to load data into a datawarehouse.

    Application team says, one of the ETL job is running slow.

    Basically, it is not a SQL Agent Job. ETL tool has its own job creation and scheduling capabilities. When they run BODS ETL job, they connect various data sources, apply some data massaging logic using BODS inbuilt data flow transformation and finally it gets loaded into SQL Server tables. All the queries are generated from the GUI tool itself in the form of transformations.

    Now, the app team needs some help from Database admin team to fix long running ETL job.

    They say the job runs fine, sometimes long enough and sometimes it fails with below error.

    [LIBODBCHDB DLL][HDBODBC] Connection not open;-10807 Connection down: [89006] System call 'send' failed, rc=10054:An existing connection was forcibly closed by the remote host {10.xxx.xxx.x:portno} {ClientPort:51555}>..Error Context :Dataflow DF_Load_CUSTOMER_ACCOUNT_INFO_DUPLICATES

    In such scenarios how to go about troubleshooting such kind of SQL problems. They don't know what is exact SQL stmt which is causing the problem. In such situations, being a SQL DBA, what tuning efforts we can put in on such tricky sql tuning issues? what questions we might probably ask to drive such issues to a common solution.

    We suggested them to involve the networking team to see if there is any issues to force the sql connection to get killed.

    Looking for suggestions to better handle such tool generated issues? Do we need ask them to open a ticket with the respective ETL vendor to get some recommendations and what kind of help can a SQL DBA can provide value in these cases?

    Because we dont understand their tool and they dont understand what actually we are asking for ? (eg: exact sql query which is causing the issue).

    Best Regards,

    Bob

  • I would reach out to the vendor of the BODS tool myself.  I expect that the tool has a timeout (possibly the default) for keeping the connection open to SQL.  If it is using a hard-coded timeout (ie nothing you can configure), then they would need to update their tool to fix this.  But they will be able to provide the best direction for support.

    Since you have no ability to change how the tool works (ie change the SQL behind the tool), the best you can hope to do is to check what the tool is doing and make sure the queries are completing efficiently and if they are not, do what you can to improve that performance (archive data, indexes, etc).

    To check the TSQL being issued, I'd be looking at either extended events (the preferred method) or profiler (if it is an old, unsupported SQL instance).  This will allow you to see the exact query being issued.

     

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thank you for your guidance Brain.

  • bobrooney.81 wrote:

    Thank you for your guidance Brain.

    He's called Brian! And the advice he has provided is sound: if the ETL solution is provided by a third party and there are issues with it, they should be the ones providing support.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • BODS is SAP Business Object Data Services. HDBODBC is a SAP Hana based ODBC driver. If I understand correctly, BODS is writing data to a warehouse on SQL Server, and the connection is occasionally closed unexpectedly. We frequently move data between SAP Hana and SQL Server and I've never seen this issue. It looks like a potential network issue, or there's a timeout value set too low somewhere.

  • BODS is similar to SSIS. Initially you say the problem is slowness, but later it's connection failing. Are there 2 different problems ?

    Is there a problem with the source or target server's network connection ?

     

    If you go into BODS Administrator Console, maybe some of the "Error" or "Trace" options will give more information.

    • This reply was modified 3 years, 3 months ago by  homebrew01.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply