[SERVER-21501] Restart of secondary results in additional NetworkInterfaceASIO logging Created: 17/Nov/15  Updated: 01/Dec/15  Resolved: 01/Dec/15

Status: Closed
Project: Core Server
Component/s: Logging, Networking, Replication
Affects Version/s: 3.2.0-rc3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: James Wahlin Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-21249 Replica set reports down member is he... Closed
Operating System: ALL
Steps To Reproduce:
  1. Start a 3 member replica set
  2. Shut down one of the secondaries
  3. Start the down secondary back up
  4. The secondary that was restarted will start printing NetworkInterfaceASIO connection messages.
Sprint: Repl C (11/20/15), Repl D (12/11/15)
Participants:

 Description   

When a 3 member replica set is started at log level 0, there are initial connection messages printed but then nothing regularly after that.

Restart one of the secondaries and you will see NetworkInterfaceASIO connection messages (such as the following) printed regularly. In my testing they are printed anywhere between 2 and 20 seconds apart. Not sure if this is just a logging issue or something deeper.

2015-11-17T10:56:01.980-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27017
2015-11-17T10:56:01.981-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27019
2015-11-17T10:56:03.986-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27017
2015-11-17T10:56:03.987-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27019
2015-11-17T10:56:05.994-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27017
2015-11-17T10:56:05.995-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27019
2015-11-17T10:56:14.012-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27017
2015-11-17T10:56:14.013-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27019
2015-11-17T10:56:18.019-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27017
2015-11-17T10:56:18.020-0500 I ASIO     [NetworkInterfaceASIO] Successfully connected to James-MacBook-Pro.local:27019



 Comments   
Comment by Matt Dannenberg [ 01/Dec/15 ]

The fix for SERVER- 21249 stopped the constant restarting of heartbeats when no sync source is present, which was the cause of this flood of log messages.

Comment by Matt Dannenberg [ 18/Nov/15 ]

It looks like heartbeats keep being cancelled:

2015-11-18T08:46:40.949-0500 D ASIO     [ReplicationExecutor] startCommand: RemoteCommand -- target:dannenstation:27019 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false }
2015-11-18T08:46:40.949-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Starting asynchronous command on host dannenstation:27019
2015-11-18T08:46:40.949-0500 D ASIO     [ReplicationExecutor] startCommand: RemoteCommand -- target:dannenstation:27017 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false }
2015-11-18T08:46:40.949-0500 D ASIO     [ReplicationExecutor] Canceling operation; original request was: RemoteCommand -- target:dannenstation:27019 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false }
2015-11-18T08:46:40.949-0500 D ASIO     [ReplicationExecutor] Canceling operation; original request was: RemoteCommand -- target:dannenstation:27017 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false }
2015-11-18T08:46:40.949-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Starting asynchronous command on host dannenstation:27017
2015-11-18T08:46:40.949-0500 D REPL     [ReplicationExecutor] Scheduling heartbeat to dannenstation:27017 at 2015-11-18T13:46:40.949Z
2015-11-18T08:46:40.949-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Failed to execute command: RemoteCommand -- target:dannenstation:27017 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false } reason: CallbackCanceled Callback canceled
2015-11-18T08:46:40.949-0500 D REPL     [ReplicationExecutor] Scheduling heartbeat to dannenstation:27019 at 2015-11-18T13:46:40.949Z
2015-11-18T08:46:40.949-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Failed to execute command: RemoteCommand -- target:dannenstation:27019 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false } reason: CallbackCanceled Callback canceled
2015-11-18T08:46:40.949-0500 D ASIO     [ReplicationExecutor] startCommand: RemoteCommand -- target:dannenstation:27017 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false }
2015-11-18T08:46:40.950-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Connecting to dannenstation:27017
2015-11-18T08:46:40.950-0500 D ASIO     [ReplicationExecutor] startCommand: RemoteCommand -- target:dannenstation:27019 db:admin expDate:2015-11-18T08:46:50.949-0500 cmd:{ replSetHeartbeat: "rs0", pv: 1, v: 1, from: "dannenstation:27018", fromId: 1, checkEmpty: false }
2015-11-18T08:46:40.950-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Connecting to dannenstation:27019
2015-11-18T08:46:40.952-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Starting asynchronous command on host dannenstation:27017
2015-11-18T08:46:40.953-0500 I ASIO     [NetworkInterfaceASIO-Replication-1] Successfully connected to dannenstation:27017
2015-11-18T08:46:40.953-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Starting asynchronous command on host dannenstation:27017
2015-11-18T08:46:40.953-0500 D REPL     [ReplicationExecutor] Scheduling heartbeat to dannenstation:27017 at 2015-11-18T13:46:42.953Z
2015-11-18T08:46:40.954-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Starting asynchronous command on host dannenstation:27019
2015-11-18T08:46:40.954-0500 I ASIO     [NetworkInterfaceASIO-Replication-1] Successfully connected to dannenstation:27019
2015-11-18T08:46:40.954-0500 D ASIO     [NetworkInterfaceASIO-Replication-1] Starting asynchronous command on host dannenstation:27019

Comment by Adam Midvidy [ 18/Nov/15 ]

I have an idea for a small logging change that will make the cause of these reconnections clearer. kaloian.manassiev discovered that we may not be logging failed commands in the event that they were canceled.

Comment by James Wahlin [ 18/Nov/15 ]

I am running this again and the additional logging has been constant now for 30 minutes. I think it is safe to assume it will continue indefinitely barring any events that interrupt.

Comment by James Wahlin [ 17/Nov/15 ]

It appears to be indefinite (though my test only went to ~ 5 minutes)

Comment by Adam Midvidy [ 17/Nov/15 ]

Does this logging continue indefinitely or does it stop after some amount of time?

Comment by Eric Milkie [ 17/Nov/15 ]

I wonder if this is related to the things that EricD was seeing on MMS.

Generated at Thu Feb 08 03:57:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.