After a server move to the failover Data Centre, a file copy between the two sites slowed down by a factor of six. I ran the copy a few times to confirm the slowdown.
The file is part of a SQL Server ETL process. The ETL process transforms the data, updates the data warehouse, commits some SQL Server statistics and prepares for a daily reconciliation.
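Repeating the copy and recording each duration can be sketched as follows. This is a minimal illustration, not the script used at the time: the function names, the number of runs and the idea of comparing against a known baseline duration are all assumptions.

```python
import shutil
import statistics
import time


def time_copy(src, dst, runs=3):
    """Copy src to dst `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        shutil.copyfile(src, dst)  # overwrites dst on every run
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_factor(baseline_seconds, current_durations):
    """Ratio of the median current duration to a known baseline duration."""
    return statistics.median(current_durations) / baseline_seconds
```

The median is used so that a single unusually fast or slow run does not dominate the comparison. File caching on either server can affect repeated runs, so the timings are best treated as indicative.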
As a DBA, it can be difficult to prove that a process is slow because of network issues, particularly if not everyone experiences the issue. The Network team can be intransigent. The network is critical to overall systems performance, and application owners must have confidence in it.
My approach in working with the Network team is to supply as much evidence as possible, so that they can investigate the issue.
The first step was to present some evidence to the Network team. The traceroute in one direction was:
Tracing route to server1.my.domain.net [10.140.57.27] over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 10.140.144.254
2 <1 ms <1 ms <1 ms 10.140.34.21
3 <1 ms <1 ms <1 ms 10.140.34.9
4 <1 ms <1 ms <1 ms 10.140.57.253
5 <1 ms 1 ms 1 ms 10.130.85.252
6 2 ms 1 ms 2 ms server1.my.domain.net [10.140.57.27]
The traceroute in the other direction was:
Tracing route to server2.my.domain.net [10.140.150.36] over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 10.140.27.254
2 <1 ms <1 ms <1 ms 10.122.34.30
3 2 ms 2 ms 2 ms 10.140.34.65
4 3 ms 1 ms 1 ms 10.140.34.69
5 1 ms <1 ms <1 ms 10.140.34.22
6 1 ms <1 ms <1 ms server2.my.domain.net [10.140.150.36]
The traces show that the traffic took a different path in each direction. One way, the traffic was routed efficiently; the other way, it was routed inefficiently.
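When there are many traces to compare, the hop lines can be parsed rather than read by eye. The sketch below is an illustration under assumptions: it expects Windows tracert-style lines like the ones above, it counts "<1 ms" as 1 ms (its upper bound), and the function names and threshold are hypothetical.

```python
import re

# A tracert hop line: hop number, three probe results, then the host.
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s*ms|\*)\s+){3})(\S.*)$")
IP_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

# The trace to server2, copied from the output above.
TRACE_TO_SERVER2 = """\
Tracing route to server2.my.domain.net [10.140.150.36] over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms 10.140.27.254
2 <1 ms <1 ms <1 ms 10.122.34.30
3 2 ms 2 ms 2 ms 10.140.34.65
4 3 ms 1 ms 1 ms 10.140.34.69
5 1 ms <1 ms <1 ms 10.140.34.22
6 1 ms <1 ms <1 ms server2.my.domain.net [10.140.150.36]
"""


def parse_hops(trace_text):
    """Return a list of (hop_number, address, worst_ms) tuples."""
    hops = []
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if match is None:
            continue  # header or blank line
        probes = re.findall(r"\d+(?=\s*ms)", match.group(2))
        # Assumption: "<1 ms" is counted as 1 ms; timed-out hops get None.
        worst_ms = max(map(int, probes)) if probes else None
        addresses = IP_RE.findall(match.group(3))
        address = addresses[-1] if addresses else None
        hops.append((int(match.group(1)), address, worst_ms))
    return hops


def slow_hops(hops, threshold_ms):
    """Hop numbers whose worst probe time is at or above threshold_ms."""
    return [n for n, _, worst in hops if worst is not None and worst >= threshold_ms]


print(slow_hops(parse_hops(TRACE_TO_SERVER2), 2))  # prints [3, 4]
```

A list of hop numbers and addresses like this is easy to hand to the Network team alongside the raw traces. Probe times of a few milliseconds are not proof of a fault on their own; they only show where to start looking.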
Working with the Network team, we noticed that the link-state database on physically adjacent routers had not been updated. We issued the show ip ospf neighbor command on the Cisco routers. The command revealed nothing.
After some investigation, the Network team discovered that OSPF had not been enabled on the neighbouring router. The show ip ospf interface command verified that OSPF was not enabled.
Once OSPF was enabled on the neighbouring router, the link-state updates finished and the file copy returned to its usual, faster duration.
A good find, based on evidence and teamwork!