[SERVER-20579] arbiters should not start background sync and applier threads
Created: 23/Sep/15  Updated: 25/Jan/17  Resolved: 25/Sep/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.1.9

Type: Bug
Priority: Major - P3
Reporter: J Rassi
Assignee: Benety Goh
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-19956 arbiter should use commit level for i... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: RPL A (10/09/15)

 Description   

ReplicationCoordinatorImpl should not be invoking ReplicationCoordinatorExternalState::startThreads() when the node is configured as an arbiter.
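The intended behavior amounts to a guard around the thread start-up call: data-bearing members start the threads that fetch and apply oplog entries, while arbiters, which hold no data, skip them. The following is a minimal C++ sketch of that guard. The types `FakeExternalState` and `MemberConfig` and the function `maybeStartReplicationThreads` are hypothetical stand-ins, not the server's actual classes or signatures.

```cpp
#include <cassert>

// Hypothetical stand-in for ReplicationCoordinatorExternalState: it only
// counts calls to startThreads() instead of spawning the real background
// sync and applier threads.
class FakeExternalState {
public:
    void startThreads() { ++startThreadsCalls; }
    int startThreadsCalls = 0;
};

// Hypothetical, heavily simplified view of this node's member config.
struct MemberConfig {
    bool arbiterOnly = false;
};

// Sketch of the guard: only non-arbiter members start the threads that
// fetch and apply oplog entries. Arbiters vote but store no data.
void maybeStartReplicationThreads(const MemberConfig& self,
                                  FakeExternalState& externalState) {
    if (self.arbiterOnly) {
        return;  // arbiter: no background sync, no applier
    }
    externalState.startThreads();
}
```

With this guard, an arbiter never reaches the background sync code path, so it can never attempt (and fassert on) a rollback.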

Previous title:

rollback_*.js in replicasets_WT fail on Windows (rollback5.js, rollback_empty_ns.js, rollback_empty_o.js, rollback_fake_cmd.js, rollback_cmd_unrollbackable.js, rollback_empty_o2.js, rollback_collMod_fatal.js, rollback_different_h.js, rollback_dropdb.js)

A number of rollback_*.js tests in replicasets_WT are tripping an UnrecoverableRollbackError fassert on two Windows variants (Windows Vista and Windows 2008R2). The failure appears to have been introduced recently.

Seven failures were observed on 3be90504 alone.

Excerpt:

[js_test:rollback5] 2015-09-23T02:47:53.359+0000 d20012| 2015-09-23T02:47:53.360+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:53.384+0000 d20012| 2015-09-23T02:47:53.385+0000 I REPL     [ReplicationExecutor] Member WIN-MIAKGV0GBFF:20010 is now in state SECONDARY
[js_test:rollback5] 2015-09-23T02:47:53.480+0000 2015-09-23T02:47:53.480+0000 W NETWORK  [thread1] Failed to connect to 127.0.0.1:20011, reason: errno:10061 No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:53.480+0000 2015-09-23T02:47:53.480+0000 I NETWORK  [thread1] reconnect 127.0.0.1:20011 (127.0.0.1) failed failed
[js_test:rollback5] 2015-09-23T02:47:53.480+0000 ReplSetTest Could not call ismaster on node 1: Error: socket exception [CONNECT_ERROR] for couldn't connect to server 127.0.0.1:20011, connection attempt failed
[js_test:rollback5] 2015-09-23T02:47:53.681+0000 2015-09-23T02:47:53.682+0000 I NETWORK  [thread1] trying reconnect to 127.0.0.1:20011 (127.0.0.1) failed
[js_test:rollback5] 2015-09-23T02:47:54.276+0000 d20010| 2015-09-23T02:47:54.276+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:54.276+0000 d20010| 2015-09-23T02:47:54.276+0000 I REPL     [ReplicationExecutor] Standing for election
[js_test:rollback5] 2015-09-23T02:47:54.384+0000 d20012| 2015-09-23T02:47:54.385+0000 I REPL     [ReplicationExecutor] syncing from: WIN-MIAKGV0GBFF:20010
[js_test:rollback5] 2015-09-23T02:47:54.384+0000 d20010| 2015-09-23T02:47:54.385+0000 I NETWORK  [initandlisten] connection accepted from 10.187.48.125:64063 #4 (4 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20012| 2015-09-23T02:47:54.386+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to WIN-MIAKGV0GBFF:20011; HostUnreachable No connection could be made because the target machine actively refused it.
[js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20012| 2015-09-23T02:47:54.386+0000 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to WIN-MIAKGV0GBFF:20010
[js_test:rollback5] 2015-09-23T02:47:54.385+0000 d20010| 2015-09-23T02:47:54.386+0000 I NETWORK  [conn4] end connection 10.187.48.125:64063 (3 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.387+0000 d20010| 2015-09-23T02:47:54.386+0000 I NETWORK  [initandlisten] connection accepted from 10.187.48.125:64065 #5 (4 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.387+0000 d20010| 2015-09-23T02:47:54.387+0000 I NETWORK  [initandlisten] connection accepted from 10.187.48.125:64066 #6 (5 connections now open)
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.387+0000 I REPL     [rsBackgroundSync] starting rollback: OplogStartMissing our last op time fetched: (term: -1, timestamp: Sep 23 02:47:15:1). source's GTE: (term: -1, timestamp: Sep 23 02:47:15:1)
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 F REPL     [rsBackgroundSync] need to rollback, but in inconsistent state
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 I -        [rsBackgroundSync] Fatal assertion 28723 UnrecoverableRollbackError need to rollback, but in inconsistent state. minvalid: (term: -1, timestamp: Sep 23 02:47:18:1) our last optime: (term: -1, timestamp: Sep 23 02:47:15:1) @ 18750
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| 2015-09-23T02:47:54.388+0000 I -        [rsBackgroundSync]
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012| ***aborting after fassert() failure
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
[js_test:rollback5] 2015-09-23T02:47:54.388+0000 d20012|
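The fatal assertion follows from the two optimes printed in the log: minvalid (Sep 23 02:47:18:1) is ahead of the node's last optime (Sep 23 02:47:15:1), so the node has not finished applying a batch and its data may be inconsistent, which makes rollback unsafe. The sketch below is a simplified illustration inferred from that log message, not the server's rollback code. The `Timestamp` type and the `canStartRollback` function are hypothetical, timestamps are reduced to seconds-since-midnight plus an increment, and the term (-1 on both sides here) is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical simplified timestamp: seconds since midnight plus an
// increment, mirroring the "02:47:15:1" form printed in the log.
struct Timestamp {
    std::uint32_t secs;
    std::uint32_t inc;
};

bool operator<(const Timestamp& a, const Timestamp& b) {
    return std::tie(a.secs, a.inc) < std::tie(b.secs, b.inc);
}

// Sketch of the consistency check: rolling back is only allowed once the
// node has applied at least up to minvalid. Otherwise it is mid-batch and
// the log reports "need to rollback, but in inconsistent state".
bool canStartRollback(const Timestamp& minValid, const Timestamp& lastOpTime) {
    return !(lastOpTime < minValid);
}
```

Plugging in the values from the excerpt (02:47:18 is 10038 seconds after midnight, 02:47:15 is 10035) gives `canStartRollback({10038, 1}, {10035, 1}) == false`, matching the fassert.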

Assigning to benety.goh per matt.dannenberg's recommendation.

Benety: are you the appropriate assignee for this ticket? If so, please work on this today, or suggest someone for me to reassign to.



 Comments   
Comment by Githook User [ 25/Sep/15 ]

Author:

Benety Goh (username: benety, email: benety@mongodb.com)

Message: SERVER-20579 Re-enable nine rollback tests in replicasets/

This reverts commit e831e7a6abb32f04361c6aa5232ef0c806d35349 and 063d9b2dcc46a7b42ceea7d9596b96e5e7080225.
Branch: master
https://github.com/mongodb/mongo/commit/6e2366ff2a507a3bb53b6eda56061dd4b36033bb

Comment by Githook User [ 25/Sep/15 ]

Author:

Benety Goh (username: benety, email: benety@mongodb.com)

Message: SERVER-20579 do not start threads in replication coordinator external state if arbiter
Branch: master
https://github.com/mongodb/mongo/commit/6ffc7d6b46bb5a54a08d017e6b9235bb2dcecebb

Comment by Githook User [ 25/Sep/15 ]

Author:

Benety Goh (username: benety, email: benety@mongodb.com)

Message: SERVER-20579 test function waitForStartUpComplete() should block until _finishLocalLocalConfig() is complete
Branch: master
https://github.com/mongodb/mongo/commit/922266d2d86f7dc379f1adad619f2d223b78e938

Comment by Githook User [ 23/Sep/15 ]

Author:

Jason Rassi (username: jrassi, email: rassi@10gen.com)

Message: SERVER-20579 Temporarily disable three rollback tests in replicasets/
Branch: master
https://github.com/mongodb/mongo/commit/e831e7a6abb32f04361c6aa5232ef0c806d35349

Comment by Githook User [ 23/Sep/15 ]

Author:

Jason Rassi (username: jrassi, email: rassi@10gen.com)

Message: SERVER-20579 Temporarily disable six rollback tests in replicasets/
Branch: master
https://github.com/mongodb/mongo/commit/063d9b2dcc46a7b42ceea7d9596b96e5e7080225

Generated at Thu Feb 08 03:54:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.