Remote query gets killed after ten minutes

Question


ricardo_chicas SSCertifiable Points: 5024 More actions · Answer 1

I looked at that link when the issue started. After a lot of testing, we haven't found anything to suggest that this is caused by any of that, and the logs show basically nothing either.
At this point I am almost sure it is a network setting that we haven't spotted so far.

Mr. Brian Gale SSC-Insane Points: 24989 More actions · Answer 2

Can you replicate the problem on another computer?
If you can replicate the problem, it makes me think the problem is not related to the end user's computer.
If you cannot replicate it, it is likely a setting on the end user's machine.

And when you say the logs show nothing, do you mean the log gives no indication at all that the end user is being disconnected? If it does show the end user being disconnected, could you copy-paste that into this thread? The exact error numbers in the log may be helpful.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

ricardo_chicas SSCertifiable Points: 5024 More actions · Answer 3

That is only happening on the computers of users inside a certain network (outside mine).

Mr. Brian Gale SSC-Insane Points: 24989 More actions · Answer 4

To me, it sounds like audrey.abbey is likely correct that this is some weird network-related issue on that particular network.

You could probably run an extended events session or profiler or a trace to figure out what is going on from the database side of things, but if it is a configuration thing on that specific network, you may not see anything strange in the trace/EE session.

Did you see this post:
https://blogs.msdn.microsoft.com/sql_protocols/2008/04/08/understanding-connection-forcibly-closed-by-remote-host-errors-caused-by-toechimney/

That may fix the issue. Essentially, if you see the error you are describing, there are no corresponding network-related error messages in the SQL Server instance’s ERRORLOGs, and nobody is running "KILL" on the user's SPID, you could try turning off TOE/Chimney by issuing the following command from an elevated command prompt (no reboot required): netsh int ip set chimney DISABLED

Failing that, you may need to run some network diagnostic tools on the end user's machine.

Out of curiosity, if the end users run "ping -n 100 <IP of SQL instance>", is it 100% successful? Doing 100 pings will take a few minutes to complete, but if there is more than 0% loss, you may have network hardware issues (or possibly software issues, if you use VPN software to connect the 2 networks).
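If ICMP is blocked between the two networks, ping alone may not tell you much. A rough complement is to probe the TCP port itself. The sketch below is only an illustration: the function name tcp_probe is made up for this example, and the host and port you pass in are assumptions you would replace with your own instance's address.

```python
import socket
import time


def tcp_probe(host, port, attempts=100, timeout=2.0, pause=0.0):
    """Open a TCP connection `attempts` times and return the failure fraction.

    0.0 means every handshake succeeded, 1.0 means none did.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures = 0
    for _ in range(attempts):
        try:
            # create_connection completes the TCP handshake; the connection
            # is closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            failures += 1
        time.sleep(pause)
    return failures / attempts
```

For example, tcp_probe("192.0.2.10", 1433, attempts=100, pause=1.0) would make 100 attempts one second apart against a placeholder address. This only tests whether new connections can be opened; it says nothing about an established connection being cut after ten minutes.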

It is strange that it is so easy to time, though, since it is exactly 10 minutes. It might not hurt to run a trace/EE session to see if you can catch something interesting.


ricardo_chicas SSCertifiable Points: 5024 More actions · Answer 5

Thank you bmg002, we will give that a try and post back the results.

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 6

bmg002 - Wednesday, September 6, 2017 8:18 AM

Heh... easy fix, right? Just reboot the box every 9 minutes and write faster code. 😀😛:hehe::Whistling:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Mr. Brian Gale SSC-Insane Points: 24989 More actions · Answer 7

Jeff Moden - Wednesday, September 6, 2017 1:07 PM

I didn't suggest rebooting the machine or writing faster code. I was suggesting disabling TOE/Chimney on the TCP/IP stack, and I noted that the change doesn't require a reboot.
I was also asking about the output of a larger-than-default number of pings, to see whether the network connection between the 2 machines is breaking.
And I suggested running a trace or an extended events session against the instance to see whether SQL Server is forcibly closing the connection or whether it is something else. Since nothing interesting is in the log, it makes me think it is likely something with the connection between the remote machine(s) and the SQL instance.
And while I agree writing faster code is a good idea and that it applies to pretty much any problem (SQL or otherwise), I don't think that will correct this problem. Plus, who would recommend writing slower code? 😛

I am leaning towards this being a network problem because the SQL instance doesn't have auto-close on that database and the timeouts are set to 0 on both the client (SSMS) and the server (SQL instance). But I've been wrong on things before. Could be someone was feeding the gremlins after midnight...


audrey.abbey SSC Veteran Points: 211 More actions · Answer 8

bmg002 - Wednesday, September 6, 2017 1:34 PM


I was chatting with my husband about this last night.
He is a network security engineer.

He had a similar issue with an Oracle query recently.
Turns out, there was a timeout setting on a firewall managed by a third party that connected the two networks!

He also said that there is another sneaky thing that can happen. I don't know the jargon, so apologies if I get anything wrong.
There is a "Keep Alive" signal that network devices use to know whether to keep a connection open.
Sometimes, this keep alive signal uses a different network protocol than the main connection.
So the connection for the query itself may be fine...but if the Keep Alive signal is getting blocked, the remote network will kill it, even though the query is actively running and obviously "Alive".
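For anyone curious what a keep-alive looks like from the client side, here is a minimal sketch using plain TCP keep-alive on a socket. This is an assumption about the kind of keep-alive being described: with TCP keep-alive the probes are sent on the same connection, so it may not be the exact mechanism my husband meant. The helper name enable_keepalive and the timer values are invented for illustration, and this is not how SSMS or the SQL Server drivers are configured.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive for `sock` and tune its timers where possible.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is considered dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform dependent, so each one is
    # applied only if this Python build defines it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

The idea is that if the idle value is shorter than the firewall's idle timeout, the probes give the firewall traffic to see during a long-running query. If the firewall drops those probes, the connection can still be cut.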

Hope that helps.

Sue_H SSC Guru Points: 90860 More actions · Answer 9

audrey.abbey - Wednesday, September 6, 2017 2:29 PM

You used the right wording! I've been following and was wondering if the Keep Alive setting is a part of this as well.
I'd do a network trace as those would show up. Either way, I agree with both of you and think it's something network related.

Sue

ricardo_chicas SSCertifiable Points: 5024 More actions · Answer 10

We won't be able to test with the people at the other network until tonight, so I will post our findings tomorrow.

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 11

bmg002 - Wednesday, September 6, 2017 1:34 PM

Sorry... it didn't come across right. I was absolutely and completely tongue-in-cheek joking (and had 4 smilies trying to indicate that). I was laughing out loud as I typed it because I do know some people who would actually have made a suggestion as crazy as my joke. There's no doubt in my mind that you're not one of those people.

My apologies for the confusion.

--Jeff Moden


frederico_fonseca SSCoach Points: 16294 More actions · Answer 12

Hmm.

Have you looked at the query timeout property of the linked server itself? If it is set, it will be used instead of the global setting.
-- query_timeout = 0 means the sp_configure default is used
SELECT name, query_timeout, connect_timeout
FROM sys.servers;

Sue_H SSC Guru Points: 90860 More actions · Answer 13

ricardo_chicas - Wednesday, September 6, 2017 3:00 PM

I'm still curious about this.

Did anything happen with this yet? Did you get a chance to test things or find anything with the network?

Sue