Core Server / SERVER-20579

arbiters should not start background sync and applier threads

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • 3.1.9
    • Affects Version/s: None
    • Component/s: Replication
    • None
    • Fully Compatible
    • ALL
    • RPL A (10/09/15)

      ReplicationCoordinatorImpl should not be invoking ReplicationCoordinatorExternalState::startThreads() when the node is configured as an arbiter.
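
      The requested behavior can be sketched as a guard around the thread start-up call. This is a minimal illustration, not the server's code: ExternalState, CountingExternalState, Coordinator, and onConfigLoaded() are hypothetical, simplified stand-ins for ReplicationCoordinatorExternalState and ReplicationCoordinatorImpl, and only startThreads() is a name taken from the ticket.

      ```cpp
      #include <cassert>

      // Hypothetical, simplified stand-in for ReplicationCoordinatorExternalState.
      class ExternalState {
      public:
          virtual ~ExternalState() = default;
          // Starts the background sync and applier threads.
          virtual void startThreads() = 0;
      };

      // Test double that records how often startThreads() was called.
      class CountingExternalState : public ExternalState {
      public:
          void startThreads() override { ++startCount; }
          int startCount = 0;
      };

      // Hypothetical, simplified stand-in for ReplicationCoordinatorImpl.
      class Coordinator {
      public:
          explicit Coordinator(ExternalState* externalState)
              : _externalState(externalState) {
              assert(_externalState != nullptr);
          }

          // Assumed hook: called once the node knows its own member configuration.
          void onConfigLoaded(bool isArbiter) {
              if (isArbiter) {
                  return;  // Arbiters hold no data, so there is nothing to sync or apply.
              }
              _externalState->startThreads();
          }

      private:
          ExternalState* _externalState;
      };
      ```

      With a CountingExternalState, calling onConfigLoaded(true) leaves startCount at 0, while onConfigLoaded(false) raises it to 1.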

      previous title:

      rollback_*.js in replicasets_WT fail on Windows (rollback5.js, rollback_empty_ns.js, rollback_empty_o.js, rollback_fake_cmd.js, rollback_cmd_unrollbackable.js, rollback_empty_o2.js, rollback_collMod_fatal.js, rollback_different_h.js, rollback_dropdb.js)

      A number of rollback_*.js tests in replicasets_WT are tripping an UnrecoverableRollbackError fassert on a couple of Windows variants (Windows Vista, Windows 2008R2). The introduction of this failure seems recent.

      7 failures observed on 3be90504 alone:

      Excerpt:

      [js_test:rollback5] 2015-09-23T02:47:53.359+0000 d20012| 2015-09-23T02:47:53.360+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
      [js_test:rollback5] 2015-09-23T02:47:53.384+0000 d20012| 2015-09-23T02:47:53.385+0000 I REPL     [ReplicationExecutor] Member WIN-MIAKGV0GBFF:20010 is now in state SECONDARY
      [js_test:rollback5] 2015-09-23T02:47:53.480+0000 2015-09-23T02:47:53.480+0000 W NETWORK  [thread1] Failed to connect to 127.0.0.1:20011, reason: errno:10061 No connection could be made because the target machine actively refused it.
      [js_test:rollback5] 2015-09-23T02:47:53.480+0000 2015-09-23T02:47:53.480+0000 I NETWORK  [thread1] reconnect 127.0.0.1:20011 (127.0.0.1) failed failed
      [js_test:rollback5] 2015-09-23T02:47:53.480+0000 ReplSetTest Could not call ismaster on node 1: Error: socket exception [CONNECT_ERROR] for couldn't connect to server 127.0.0.1:20011, connection attempt failed
      [js_test:rollback5] 2015-09-23T02:47:53.681+0000 2015-09-23T02:47:53.682+0000 I NETWORK  [thread1] trying reconnect to 127.0.0.1:20011 (127.0.0.1) failed
      [js_test:rollback5] 2015-09-23T02:47:54.276+0000 d20010| 2015-09-23T02:47:54.276+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
      [js_test:rollback5] 2015-09-23T02:47:54.276+0000 d20010| 2015-09-23T02:47:54.276+0000 I REPL     [ReplicationExecutor] Standing for election
      [js_test:rollback5] 2015-09-23T02:47:54.384+0000 d20012| 2015-09-23T02:47:54.385+0000 I REPL     [ReplicationExecutor] syncing from: WIN-MIAKGV0GBFF:20010
      [js_test:rollback5] 2015-09-23T02:47:54.384+0000 d20010| 2015-09-23T02:47:54.385+0000 I NETWORK  [initandlisten] connection accepted from 10.187.48.125:64063 #4 (4 connections now open)
      [js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20012| 2015-09-23T02:47:54.386+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
      [js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20012| 2015-09-23T02:47:54.386+0000 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to WIN-MIAKGV0GBFF:20010
      [js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20010| 2015-09-23T02:47:54.386+0000 I NETWORK  [conn4] end connection 10.187.48.125:64063 (3 connections now open)
      [js_test:rollback5] 2015-09-23T02:47:54.387+0000 d20010| 2015-09-23T02:47:54.386+0000 I NETWORK  [initandlisten] connection accepted from 10.187.48.125:64065 #5 (4 connections now open)
      [js_test:rollback5] 2015-09-23T02:47:54.387+0000 d20010| 2015-09-23T02:47:54.387+0000 I NETWORK  [initandlisten] connection accepted from 10.187.48.125:64066 #6 (5 connections now open)
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.387+0000 I REPL     [rsBackgroundSync] starting rollback: OplogStartMissing our last op time fetched: (term: -1, timestamp: Sep 23 02:47:15:1). source's GTE: (term: -1, timestamp: Sep 23 02:47:15:1)
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 F REPL     [rsBackgroundSync] need to rollback, but in inconsistent state
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 I -        [rsBackgroundSync] Fatal assertion 28723 UnrecoverableRollbackError need to rollback, but in inconsistent state. minvalid: (term: -1, timestamp: Sep 23 02:47:18:1) our last optime: (term: -1, timestamp: Sep 23 02:47:15:1) @ 18750
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 I -        [rsBackgroundSync]
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| ***aborting after fassert() failure
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
      [js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
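
      Going by the fatal assertion message in the excerpt, the node appears to refuse the rollback because its minvalid is ahead of its last optime. The sketch below encodes that reading only; it is not the server's implementation. OpTime here is a hypothetical, simplified struct, and the ordering (term first, then timestamp) is an assumption.

      ```cpp
      // Hypothetical, simplified optime; the real type carries more state.
      struct OpTime {
          long long term;
          unsigned seconds;    // timestamp seconds (simplified)
          unsigned increment;  // ordinal within that second
      };

      // Assumed ordering: compare term first, then seconds, then increment.
      constexpr bool isAfter(const OpTime& a, const OpTime& b) {
          return a.term != b.term       ? a.term > b.term
               : a.seconds != b.seconds ? a.seconds > b.seconds
                                        : a.increment > b.increment;
      }

      // True when minvalid is ahead of the last optime, i.e. the node is in
      // the "inconsistent state" named by the log message.
      constexpr bool rollbackIsUnrecoverable(const OpTime& minValid,
                                             const OpTime& lastOpTime) {
          return isAfter(minValid, lastOpTime);
      }
      ```

      Plugging in the values from the log (minvalid at 02:47:18:1, last optime at 02:47:15:1, both with term -1) makes the check true, which matches the fassert being tripped.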
      

      Assigning to benety.goh per matt.dannenberg's recommendation.

      Benety: are you the appropriate assignee for this ticket? If so, please work on this today, or suggest someone for me to reassign to.

            Assignee: Benety Goh (benety.goh@mongodb.com)
            Reporter: J Rassi (rassi)
            Votes: 0
            Watchers: 5
