- Type: Bug
- Resolution: Incomplete
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Replication
- None
- ALL
It begins:
2019-08-31T00:24:47.927-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:47.929-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:47.984-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:47.984-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:47.985-0500 I REPL [ReplicationExecutor] Member mongo-machine01b.internal-domain.xyz:27017 is now in state PRIMARY
2019-08-31T00:24:48.600-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:48.601-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:48.601-0500 I REPL [ReplicationExecutor] Member ev-mongo01a.internal-domain.xyz:27017 is now in state SECONDARY
2019-08-31T00:24:48.930-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:48.930-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:48.930-0500 I REPL [ReplicationExecutor] Member mongo-machineB01.internal-domain.xyz:27017 is now in state SECONDARY
2019-08-31T00:24:48.985-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:48.985-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:48.986-0500 I REPL [ReplicationExecutor] Member mongo-machine01b.internal-domain.xyz:27017 is now in state PRIMARY
2019-08-31T00:24:49.603-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:49.603-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:49.931-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:49.931-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:49.986-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
2019-08-31T00:24:49.986-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246
.... a bunch of clients reconnecting and queries in the queue executing
2019-08-31T00:24:50.636-0500 I REPL [replExecDBWorker-2] transition to SECONDARY
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] legacy transport layer closing all connections
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576628
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576627
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576624
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576623
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576622
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576621
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576620
2019-08-31T00:24:50.636-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576619
2019-08-31T00:24:50.639-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576400
2019-08-31T00:24:50.639-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576399
2019-08-31T00:24:50.641-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 576062
2019-08-31T00:24:50.642-0500 I NETWORK [replExecDBWorker-2] Skip closing connection for connection # 575947
2019-08-31T00:24:50.643-0500 I - [conn576612] AssertionException handling request, closing client connection: 172 Operation attempted on a closed transport Session.
2019-08-31T00:24:50.681-0500 I - [conn535386] AssertionException handling request, closing client connection: 172 Operation attempted on a closed transport Session.
2019-08-31T00:24:50.681-0500 I - [conn535392] AssertionException handling request, closing client connection: 172 Operation attempted on a closed transport Session.
(dozens of similar lines repeat)
.... a bunch of clients reconnecting
2019-08-31T00:24:51.680-0500 I REPL [rsBackgroundSync] chaining not allowed, choosing primary as sync source candidate: mongo-machine01b.internal-domain.xyz:27017
2019-08-31T00:24:51.683-0500 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to mongo-machine01b.internal-domain.xyz:27017
2019-08-31T00:24:51.693-0500 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to mongo-machine01b.internal-domain.xyz:27017, took 11ms (1 connections now open to mongo-machine01b.internal-domain.xyz:27017)
2019-08-31T00:24:51.771-0500 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to mongo-machine01b.internal-domain.xyz:27017
2019-08-31T00:24:51.775-0500 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to mongo-machine01b.internal-domain.xyz:27017, took 5ms (2 connections now open to mongo-machine01b.internal-domain.xyz:27017)
.... a bunch of clients reconnecting. And then it decides it wants to be PRIMARY again:
2019-08-31T00:24:57.648-0500 I REPL [ReplicationExecutor] Canceling priority takeover callback
2019-08-31T00:24:57.648-0500 I REPL [ReplicationExecutor] Starting an election for a priority takeover
2019-08-31T00:24:57.648-0500 I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected. current term: 246
2019-08-31T00:24:57.649-0500 I REPL [ReplicationExecutor] VoteRequester(term 246 dry run) received a yes vote from mongo-machineB01.internal-domain.xyz:27017; response message: { term: 246, voteGranted: true, reason: "", ok: 1.0 }
2019-08-31T00:24:57.649-0500 I REPL [ReplicationExecutor] dry election run succeeded, running for election in term 247
2019-08-31T00:24:57.649-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Ending connection to host mongo-machine01b.internal-domain.xyz:27017 due to bad connection status; 0 connections to that host remain open
2019-08-31T00:24:57.653-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to mongo-machine01b.internal-domain.xyz:27017
2019-08-31T00:24:57.661-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo-machine01b.internal-domain.xyz:27017, took 8ms (1 connections now open to mongo-machine01b.internal-domain.xyz:27017)
2019-08-31T00:24:57.666-0500 I REPL [ReplicationExecutor] VoteRequester(term 247) received a yes vote from mongo-machineB01.internal-domain.xyz:27017; response message: { term: 247, voteGranted: true, reason: "", ok: 1.0 }
2019-08-31T00:24:57.666-0500 I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 247
2019-08-31T00:24:57.666-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Ending connection to host mongo-machine01b.internal-domain.xyz:27017 due to bad connection status; 0 connections to that host remain open
2019-08-31T00:24:57.666-0500 I REPL [ReplicationExecutor] transition to PRIMARY
2019-08-31T00:24:57.666-0500 I REPL [ReplicationExecutor] Entering primary catch-up mode.
2019-08-31T00:24:57.666-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to mongo-machine01b.internal-domain.xyz:27017
2019-08-31T00:24:57.697-0500 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo-machine01b.internal-domain.xyz:27017, took 31ms (1 connections now open to mongo-machine01b.internal-domain.xyz:27017)
2019-08-31T00:24:57.697-0500 I REPL [ReplicationExecutor] Member mongo-machine01b.internal-domain.xyz:27017 is now in state SECONDARY
2019-08-31T00:24:57.697-0500 I REPL [ReplicationExecutor] Caught up to the latest optime known via heartbeats after becoming primary.
2019-08-31T00:24:57.698-0500 I REPL [ReplicationExecutor] Exited primary catch-up mode.
2019-08-31T00:24:57.698-0500 I ASIO [NetworkInterfaceASIO-RS-0] Ending connection to host mongo-machine01b.internal-domain.xyz:27017 due to bad connection status; 1 connections to that host remain open
2019-08-31T00:24:57.698-0500 I REPL [rsBackgroundSync] Replication producer stopped after oplog fetcher finished returning a batch from our sync source. Abandoning this batch of oplog entries and re-evaluating our sync source.
2019-08-31T00:24:58.216-0500 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to mongo-machine01b.internal-domain.xyz:27017: InvalidSyncSource: Sync source was cleared. Was mongo-machine01b.internal-domain.xyz:27017
2019-08-31T00:24:59.591-0500 I REPL [rsSync] transition to primary complete; database writes are now permitted
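The "Starting an election for a priority takeover" lines above mean this member's configured priority is higher than that of the member currently holding PRIMARY, so it retakes the role as soon as it is caught up. A minimal sketch of that check, using a made-up config document shaped like the output of the replSetGetConfig command (the hosts and priority values here are assumptions, not taken from this cluster):

```python
# Hypothetical replica-set config shaped like replSetGetConfig output;
# the priority values are illustrative assumptions.
sample_config = {
    "config": {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "ev-mongo01a.internal-domain.xyz:27017", "priority": 2},
            {"_id": 1, "host": "mongo-machine01b.internal-domain.xyz:27017", "priority": 1},
            {"_id": 2, "host": "mongo-machineB01.internal-domain.xyz:27017", "priority": 1},
        ],
    }
}

def priority_takeover_candidates(config):
    """Return hosts whose priority exceeds the lowest member priority.

    Any such member will attempt a priority takeover whenever a
    lower-priority member holds PRIMARY and this member has caught up --
    the behavior visible in the log excerpt above.
    """
    members = config["config"]["members"]
    lowest = min(m.get("priority", 1) for m in members)
    return [m["host"] for m in members if m.get("priority", 1) > lowest]

print(priority_takeover_candidates(sample_config))
# -> ['ev-mongo01a.internal-domain.xyz:27017']
```

In practice the config dict would come from running the replSetGetConfig admin command against the cluster; equal priorities across all members would suppress this takeover behavior entirely.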
One of our clusters (out of dozens) will randomly step itself down from primary, without any command being sent to the system telling it to.
We see this approximately 1-2 times per week, and always at either the valleys or the peaks of our usage.
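Since the episodes are rare and show up only in the logs, one way to quantify them is to scan the mongod log for the REPL election events shown above. A minimal sketch, assuming the 3.x-era plain-text log format from this excerpt (the structured JSON logs of newer versions would need different parsing):

```python
import re

# Patterns for the two REPL events visible in the excerpt above.
STEPDOWN_RE = re.compile(r"stepping down from primary, because a new term has begun: (\d+)")
ELECTED_RE = re.compile(r"election succeeded, assuming primary role in term (\d+)")

def summarize_repl_events(lines):
    """Return (stepdowns_per_term, elections_per_term) as two dicts."""
    stepdowns, elections = {}, {}
    for line in lines:
        if (m := STEPDOWN_RE.search(line)):
            term = int(m.group(1))
            stepdowns[term] = stepdowns.get(term, 0) + 1
        elif (m := ELECTED_RE.search(line)):
            term = int(m.group(1))
            elections[term] = elections.get(term, 0) + 1
    return stepdowns, elections

# Sample lines taken from the log excerpt in this report.
sample = [
    "2019-08-31T00:24:47.927-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246",
    "2019-08-31T00:24:47.929-0500 I REPL [ReplicationExecutor] stepping down from primary, because a new term has begun: 246",
    "2019-08-31T00:24:57.666-0500 I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 247",
]
stepdowns, elections = summarize_repl_events(sample)
print(stepdowns, elections)  # {246: 2} {247: 1}
```

Correlating the per-term counts with the times of day would confirm (or refute) the valley/peak pattern described above.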