- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- ALL
- RPL A (10/09/15)
ReplicationCoordinatorImpl should not be invoking ReplicationCoordinatorExternalState::startThreads() when the node is configured as an arbiter.
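A minimal sketch of the guard this implies, not the actual server code: ExternalStateSketch, MemberConfigSketch, and startThreadsUnlessArbiter are hypothetical stand-ins invented for illustration. Only startThreads() and the arbiter condition come from the ticket.

```cpp
// Hypothetical stand-in for ReplicationCoordinatorExternalState.
// It only counts calls so the behaviour of the guard can be observed.
struct ExternalStateSketch {
    int startThreadsCalls = 0;
    void startThreads() { ++startThreadsCalls; }
};

// Hypothetical, simplified view of this node's entry in the replica set config.
struct MemberConfigSketch {
    bool arbiterOnly = false;
};

// Starts the replication threads only for data-bearing members.
// Returns true if the threads were started, false if the node is an arbiter.
inline bool startThreadsUnlessArbiter(const MemberConfigSketch& self,
                                      ExternalStateSketch& externalState) {
    if (self.arbiterOnly) {
        // Arbiters hold no data, so they have nothing to sync or roll back.
        return false;
    }
    externalState.startThreads();
    return true;
}
```

With this shape, a node whose config entry has arbiterOnly set never reaches startThreads(), while a data-bearing node calls it exactly once.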
previous title:
rollback_*.js in replicasets_WT fail on Windows (rollback5.js, rollback_empty_ns.js, rollback_empty_o.js, rollback_fake_cmd.js, rollback_cmd_unrollbackable.js, rollback_empty_o2.js, rollback_collMod_fatal.js, rollback_different_h.js, rollback_dropdb.js)
A number of rollback_*.js tests in replicasets_WT are tripping an UnrecoverableRollbackError fassert on a couple of Windows variants (Windows Vista, Windows 2008R2). The failure appears to have been introduced recently.
7 failures observed on 3be90504 alone:
- https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_replicasets_WT_3be90504806da8f5d55d83e07f456f862fbc90e1_15_09_23_01_17_17 (4 failures)
- https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_windows_64_replicasets_WT_ese_3be90504806da8f5d55d83e07f456f862fbc90e1_15_09_23_01_17_17 (2 failures)
- https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_2k8_ssl_asio_replicasets_WT_3be90504806da8f5d55d83e07f456f862fbc90e1_15_09_23_01_17_17 (1 failure)
Excerpt:
[js_test:rollback5] 2015-09-23T02:47:53.359+0000 d20012| 2015-09-23T02:47:53.360+0000 I REPL [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:53.384+0000 d20012| 2015-09-23T02:47:53.385+0000 I REPL [ReplicationExecutor] Member WIN-MIAKGV0GBFF:20010 is now in state SECONDARY
[js_test:rollback5] 2015-09-23T02:47:53.480+0000 2015-09-23T02:47:53.480+0000 W NETWORK [thread1] Failed to connect to 127.0.0.1:20011, reason: errno:10061 No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:53.480+0000 2015-09-23T02:47:53.480+0000 I NETWORK [thread1] reconnect 127.0.0.1:20011 (127.0.0.1) failed failed
[js_test:rollback5] 2015-09-23T02:47:53.480+0000 ReplSetTest Could not call ismaster on node 1: Error: socket exception [CONNECT_ERROR] for couldn't connect to server 127.0.0.1:20011, connection attempt failed
[js_test:rollback5] 2015-09-23T02:47:53.681+0000 2015-09-23T02:47:53.682+0000 I NETWORK [thread1] trying reconnect to 127.0.0.1:20011 (127.0.0.1) failed
[js_test:rollback5] 2015-09-23T02:47:54.276+0000 d20010| 2015-09-23T02:47:54.276+0000 I REPL [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:54.276+0000 d20010| 2015-09-23T02:47:54.276+0000 I REPL [ReplicationExecutor] Standing for election
[js_test:rollback5] 2015-09-23T02:47:54.384+0000 d20012| 2015-09-23T02:47:54.385+0000 I REPL [ReplicationExecutor] syncing from: WIN-MIAKGV0GBFF:20010
[js_test:rollback5] 2015-09-23T02:47:54.384+0000 d20010| 2015-09-23T02:47:54.385+0000 I NETWORK [initandlisten] connection accepted from 10.187.48.125:64063 #4 (4 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20012| 2015-09-23T02:47:54.386+0000 I REPL [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20012| 2015-09-23T02:47:54.386+0000 I REPL [SyncSourceFeedback] setting syncSourceFeedback to WIN-MIAKGV0GBFF:20010
[js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20010| 2015-09-23T02:47:54.386+0000 I NETWORK [conn4] end connection 10.187.48.125:64063 (3 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.387+0000 d20010| 2015-09-23T02:47:54.386+0000 I NETWORK [initandlisten] connection accepted from 10.187.48.125:64065 #5 (4 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.387+0000 d20010| 2015-09-23T02:47:54.387+0000 I NETWORK [initandlisten] connection accepted from 10.187.48.125:64066 #6 (5 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.387+0000 I REPL [rsBackgroundSync] starting rollback: OplogStartMissing our last op time fetched: (term: -1, timestamp: Sep 23 02:47:15:1). source's GTE: (term: -1, timestamp: Sep 23 02:47:15:1)
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 F REPL [rsBackgroundSync] need to rollback, but in inconsistent state
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 I - [rsBackgroundSync] Fatal assertion 28723 UnrecoverableRollbackError need to rollback, but in inconsistent state. minvalid: (term: -1, timestamp: Sep 23 02:47:18:1) our last optime: (term: -1, timestamp: Sep 23 02:47:15:1) @ 18750
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 I - [rsBackgroundSync]
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| ***aborting after fassert() failure
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
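The fatal assertion at the end of the excerpt reports a minvalid (02:47:18:1) that is ahead of the node's last optime (02:47:15:1). The following is a rough sketch of that comparison, inferred only from the log message and not taken from the server source. OpTimeSketch, its fields, the term-first ordering, and rollbackIsUnrecoverable are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical simplified optime: an election term plus a (seconds, increment) timestamp.
struct OpTimeSketch {
    long long term;
    std::uint32_t secs;
    std::uint32_t inc;
};

// Assumed ordering: compare term first, then seconds, then increment.
inline bool operator<(const OpTimeSketch& a, const OpTimeSketch& b) {
    return std::tie(a.term, a.secs, a.inc) < std::tie(b.term, b.secs, b.inc);
}

// Returns true when the node's last optime is behind minvalid, which is the
// "need to rollback, but in inconsistent state" situation shown in the log.
inline bool rollbackIsUnrecoverable(const OpTimeSketch& minValid,
                                    const OpTimeSketch& lastOpTime) {
    return lastOpTime < minValid;
}
```

Plugging in the values from the excerpt (term -1 on both sides, seconds 18 versus 15, increment 1) makes the predicate true, which matches the fassert that was hit.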
Assigning to benety.goh per matt.dannenberg's recommendation.
Benety: are you the appropriate assignee for this ticket? If so, please work on this today, or suggest someone for me to reassign to.
- is related to: SERVER-19956 arbiter should use commit level for its optime in elections (Closed)