Remote query gets killed after ten minutes

  • I looked at that link when the issue started, and after lots of testing we haven't found anything that leads us to believe this is caused by any of that. The logs show basically nothing as well.
    At this point I am almost sure that it is a network setting that we haven't found yet.

  • Can you replicate the problem on another computer?
    If you can replicate the problem, it makes me think the problem is not related to the end user's computer.
    If you cannot replicate it, it is likely a setting on the end user's machine.

    And when you say the logs show nothing, do you mean the log gives no indication that it is disconnecting the end user?  If it does show the end user being disconnected, could you copy-paste that into this thread?  The exact error numbers in the log may be helpful.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • It is only happening on the computers of users inside one particular network (outside mine).

  • To me, it sounds like audrey.abbey is likely correct about it being some weird network-related issue on that particular network.

    You could probably run an extended events session or profiler or a trace to figure out what is going on from the database side of things, but if it is a configuration thing on that specific network, you may not see anything strange in the trace/EE session.

    Did you see this post:
    https://blogs.msdn.microsoft.com/sql_protocols/2008/04/08/understanding-connection-forcibly-closed-by-remote-host-errors-caused-by-toechimney/

    That may fix the issue.  Essentially, if you see the error you are indicating, there are no corresponding network-related error messages in the SQL Server instance’s ERRORLOGs, and nobody is running "KILL" on the user's SPID, you could try turning off the TOE/Chimney by issuing the following command from an elevated command prompt (no reboot required): netsh int ip set chimney DISABLED

    Failing that, you may need to run some network diagnostic tools on the end user's machine.

    Out of curiosity, if the end users run "ping -n 100 <IP of SQL instance>", is it a 100% success?  Doing 100 pings will take a few minutes to complete, but if it shows more than 0% loss, you may have network hardware issues (or possibly software issues if you use some VPN software to connect the 2 networks).

    It is strange that the timing is so consistent, though, at exactly 10 minutes.  Might not hurt to run a trace/EE session to see if you can catch something interesting.
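For the ping check suggested above, the loss figure can be pulled out of the summary automatically if the test is repeated on several machines. This is only a minimal Python sketch: it assumes the English-language Windows `ping` summary format, and the sample text and IP address are made up for illustration, not taken from this thread.

```python
import re


def parse_ping_loss(ping_output: str) -> int:
    """Return the packet-loss percentage from a Windows `ping` summary."""
    match = re.search(r"\((\d+)% loss\)", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return int(match.group(1))


# Hypothetical sample summary, for illustration only.
sample = """Ping statistics for 10.0.0.5:
    Packets: Sent = 100, Received = 97, Lost = 3 (3% loss),"""

loss = parse_ping_loss(sample)
if loss > 0:
    print(f"{loss}% loss: worth checking the network path")
else:
    print("no packet loss seen")
```

Anything above 0% would point toward the network path rather than SQL Server, which is the distinction the ping test is meant to draw.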

  • Thank you bmg002, we will give that a try and post back the results.

  • bmg002 - Wednesday, September 6, 2017 8:18 AM

    To me, it sounds like audrey.abbey is likely correct about it being some weird network-related issue on that particular network.

    You could probably run an extended events session or profiler or a trace to figure out what is going on from the database side of things, but if it is a configuration thing on that specific network, you may not see anything strange in the trace/EE session.

    Did you see this post:
    https://blogs.msdn.microsoft.com/sql_protocols/2008/04/08/understanding-connection-forcibly-closed-by-remote-host-errors-caused-by-toechimney/

    That may fix the issue.  Essentially, if you see the error you are indicating, there are no corresponding network-related error messages in the SQL Server instance’s ERRORLOGs, and nobody is running "KILL" on the user's SPID, you could try turning off the TOE/Chimney by issuing the following command from an elevated command prompt (no reboot required): netsh int ip set chimney DISABLED

    Failing that, you may need to run some network diagnostic tools on the end user's machine.

    Out of curiosity, if the end users run "ping -n 100 <IP of SQL instance>", is it a 100% success?  Doing 100 pings will take a few minutes to complete, but if it shows more than 0% loss, you may have network hardware issues (or possibly software issues if you use some VPN software to connect the 2 networks).

    It is strange that the timing is so consistent, though, at exactly 10 minutes.  Might not hurt to run a trace/EE session to see if you can catch something interesting.

    Heh... easy fix, right?  Just reboot the box every 9 minutes and write faster code. 😀😛:hehe::Whistling:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, September 6, 2017 1:07 PM

    Heh... easy fix, right?  Just reboot the box every 9 minutes and write faster code. 😀😛:hehe::Whistling:

    I didn't suggest rebooting the machine or writing faster code.  I was suggesting disabling the TOE/Chimney on the TCP/IP stack, and I commented that this change doesn't require a reboot.
    I was also asking about the output of a larger-than-default number of pings to see if the network connection between the 2 machines is breaking.
    And I suggested running a trace or an extended events session against the instance to see if SQL is force closing the connection or if it is something else.  Since nothing interesting is in the log, it makes me think it is likely something with the connection between the remote machine(s) and the SQL instance.
    And while I agree writing faster code is a good idea and that it applies to pretty much any problem (SQL or otherwise), I don't think that will correct this problem.  Plus, who would recommend writing slower code? 😛

    I am leaning towards this being a network problem because the SQL instance doesn't have auto-close on that database and the timeouts are set to 0 on both the client (SSMS) and the server (SQL Instance).  But I've been wrong on things before.  Could be someone was feeding the gremlins after midnight...

  • bmg002 - Wednesday, September 6, 2017 1:34 PM


    I am leaning towards this being a network problem because the SQL instance doesn't have auto-close on that database and the timeouts are set to 0 on both the client (SSMS) and the server (SQL Instance).  But I've been wrong on things before.  Could be someone was feeding the gremlins after midnight...

    I was chatting with my husband about this last night. 
    He is a network security engineer. 

    He had a similar issue with an Oracle query recently. 
    Turns out, there was a timeout setting on a firewall managed by a third party that connected the two networks!

    He also said that there is another sneaky thing that can happen. I don't know the jargon, so apologies if I get anything wrong. 
    There is a "Keep Alive" signal that network devices use to know whether to keep a connection open. 
    Sometimes, this keep alive signal uses a different network protocol than the main connection. 
    So the connection for the query itself may be fine...but if the Keep Alive signal is getting blocked, the remote network will kill it, even though the query is actively running and obviously "Alive". 

    Hope that helps.
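To illustrate the keep-alive idea described above at the TCP level, here is a minimal Python sketch. It assumes the connection is being dropped by an intermediate device that discards idle connections, which nobody in this thread has confirmed. The function name and the 300-second idle value are illustrative; the value is chosen only because it sits below the ten minutes observed here. SQL client tools manage their own sockets, so this shows the mechanism rather than a setting you can apply to SSMS.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_seconds: int = 300,
                     interval_seconds: int = 30) -> None:
    """Ask the OS to send TCP keepalive probes on an otherwise idle connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning options are platform-specific, so each one is guarded.
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_seconds * 1000, interval_seconds * 1000))
    else:
        if hasattr(socket, "TCP_KEEPIDLE"):
            # Idle time in seconds before the first probe is sent.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
        if hasattr(socket, "TCP_KEEPINTVL"):
            # Seconds between successive probes.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keepalive enabled:", keepalive_on)
sock.close()
```

If probes like these never reach the other side, or a device in between ignores them, the connection can still be dropped as idle while the query is running, which a network trace would show.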

  • audrey.abbey - Wednesday, September 6, 2017 2:29 PM

    I was chatting with my husband about this last night. 
    He is a network security engineer. 

    He had a similar issue with an Oracle query recently. 
    Turns out, there was a timeout setting on a firewall managed by a third party that connected the two networks!

    He also said that there is another sneaky thing that can happen. I don't know the jargon, so apologies if I get anything wrong. 
    There is a "Keep Alive" signal that network devices use to know whether to keep a connection open. 
    Sometimes, this keep alive signal uses a different network protocol than the main connection. 
    So the connection for the query itself may be fine...but if the Keep Alive signal is getting blocked, the remote network will kill it, even though the query is actively running and obviously "Alive". 

    Hope that helps.

    You used the right wording! I've been following and was wondering if the Keep Alive setting is a part of this as well.
    I'd do a network trace as those would show up. Either way, I agree with both of you and think it's something network related.

    Sue

  • We won't be able to test with the people on the other network until tonight, so I will post our findings tomorrow.

  • bmg002 - Wednesday, September 6, 2017 1:34 PM

    I didn't suggest rebooting the machine or writing faster code.  I was suggesting disabling the TOE/Chimney on the TCP/IP stack, and I commented that this change doesn't require a reboot.
    I was also asking about the output of a larger-than-default number of pings to see if the network connection between the 2 machines is breaking.
    And I suggested running a trace or an extended events session against the instance to see if SQL is force closing the connection or if it is something else.  Since nothing interesting is in the log, it makes me think it is likely something with the connection between the remote machine(s) and the SQL instance.
    And while I agree writing faster code is a good idea and that it applies to pretty much any problem (SQL or otherwise), I don't think that will correct this problem.  Plus, who would recommend writing slower code? 😛

    I am leaning towards this being a network problem because the SQL instance doesn't have auto-close on that database and the timeouts are set to 0 on both the client (SSMS) and the server (SQL Instance).  But I've been wrong on things before.  Could be someone was feeding the gremlins after midnight...

    Sorry... it didn't come across right.  I was absolutely and completely tongue-in-cheek joking (and had 4 smilies trying to indicate that).  I was laughing out loud as I typed it because I do know some people who would actually have made a suggestion as crazy as my joke. There's no doubt in my mind that you're not one of those people.

    My apologies for the confusion.

    --Jeff Moden


  • Hum.

    Have you looked at the query timeout property of the linked server itself? If this one is set, it will be used instead of the global setting.
    -- query_timeout and connect_timeout are in seconds
    SELECT name, connect_timeout, query_timeout
    FROM sys.servers;
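How the two settings combine can be sketched in a few lines of Python. This assumes the documented behavior that a `query_timeout` of 0 in `sys.servers` falls back to the server-wide `remote query timeout` option from `sp_configure`, whose default is 600 seconds, and that 0 at the server-wide level means no timeout. The function name is made up for illustration.

```python
from typing import Optional


def effective_remote_query_timeout(linked_server_timeout: int,
                                   server_wide_timeout: int = 600) -> Optional[int]:
    """Return the timeout in seconds that applies to a remote query, or None for no limit."""
    # A non-zero linked-server value wins; 0 defers to the server-wide option.
    chosen = linked_server_timeout if linked_server_timeout > 0 else server_wide_timeout
    # 0 at the server-wide level means wait indefinitely.
    return chosen if chosen > 0 else None


print(effective_remote_query_timeout(0))       # 600
print(effective_remote_query_timeout(120))     # 120
print(effective_remote_query_timeout(0, 0))    # None
```

The 600-second default is exactly ten minutes, so it is worth confirming that both values really are 0 on the instance that owns the linked server.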

  • ricardo_chicas - Wednesday, September 6, 2017 3:00 PM

    We won't be able to test with the people on the other network until tonight, so I will post our findings tomorrow.

    I'm still curious about this.

    Did anything happen with this yet? Did you get a chance to test things or find anything with the network?

    Sue
