[SERVER-20814] 'majority' write stuck waiting for replication after both secondaries fell off the end of the oplog Created: 07/Oct/15  Updated: 08/Oct/15  Resolved: 08/Oct/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File CallStacks.txt    
Issue Links:
Related
is related to SERVER-20817 Timeout during conf_server_write_conc... Closed
Operating System: ALL
Participants:

 Description   

This happened during execution of jstests/sharding/conf_server_write_concern.js. A write using writeConcern: 'majority' got stuck in the awaitReplication call while both secondaries entered the RECOVERING state.
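For context, this is the general shape of the write that blocks (a hedged sketch in mongo shell syntax; the collection name and document are hypothetical, not taken from the test). With w: 'majority' and no wtimeout, the command waits in awaitReplication until a majority of the set has replicated the write:

db.coll.insert(
    { x: 1 },
    { writeConcern: { w: "majority" } }   // no wtimeout: waits indefinitely for majority acknowledgement
);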

The relevant lines from the logs are below, and the call stacks are attached.

This is the Evergreen task (it is from a patch build containing only test changes).

...
[js_test:conf_server_write_concern] 2015-10-07T19:12:57.365+0000 d20018| 2015-10-07T19:12:57.364+0000 I REPL     [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T19:12:57.365+0000 d20018| 2015-10-07T19:12:57.365+0000 I REPL     [ReplicationExecutor] Member ip-10-168-64-199:20017 is now in state SECONDARY
 
...
[js_test:conf_server_write_concern] 2015-10-07T19:13:05.470+0000 d20016| 2015-10-07T19:13:05.469+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
...
[js_test:conf_server_write_concern] 2015-10-07T19:13:06.373+0000 d20017| 2015-10-07T19:13:06.372+0000 I REPL     [ReplicationExecutor] Member ip-10-168-64-199:20016 is now in state PRIMARY
[js_test:conf_server_write_concern] 2015-10-07T19:13:06.444+0000 d20016| 2015-10-07T19:13:06.444+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
...
[js_test:conf_server_write_concern] 2015-10-07T19:13:07.405+0000 d20018| 2015-10-07T19:13:07.404+0000 I REPL     [ReplicationExecutor] Member ip-10-168-64-199:20016 is now in state PRIMARY
...
[js_test:conf_server_write_concern] 2015-10-07T19:13:08.408+0000 d20018| 2015-10-07T19:13:08.408+0000 I REPL     [ReplicationExecutor] syncing from: ip-10-168-64-199:20016
...
[js_test:conf_server_write_concern] 2015-10-07T19:13:08.450+0000 d20018| 2015-10-07T19:13:08.449+0000 I REPL     [ReplicationExecutor] could not find member to sync from
...
[js_test:conf_server_write_concern] 2015-10-07T19:13:09.481+0000 d20016| 2015-10-07T19:13:09.480+0000 I REPL     [ReplicationExecutor] Member ip-10-168-64-199:20017 is now in state RECOVERING
...
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.686+0000 d20018| 2015-10-07T19:14:08.686+0000 I REPL     [ReplicationExecutor] syncing from: ip-10-168-64-199:20016
...
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.688+0000 d20018| 2015-10-07T19:14:08.687+0000 W REPL     [rsBackgroundSync] we are too stale to use ip-10-168-64-199:20016 as a sync source
...
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.689+0000 d20018| 2015-10-07T19:14:08.688+0000 I REPL     [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.689+0000 d20018| 2015-10-07T19:14:08.688+0000 E REPL     [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.689+0000 d20018| 2015-10-07T19:14:08.688+0000 I REPL     [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct  7 19:12:55:1)
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.690+0000 d20018| 2015-10-07T19:14:08.688+0000 I REPL     [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct  7 19:13:08:11)
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.690+0000 d20018| 2015-10-07T19:14:08.688+0000 I REPL     [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
[js_test:conf_server_write_concern] 2015-10-07T19:14:08.690+0000 d20018| 2015-10-07T19:14:08.688+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
...
	
[js_test:conf_server_write_concern] 2015-10-07T19:14:09.625+0000 d20017| 2015-10-07T19:14:09.623+0000 I REPL     [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T19:14:09.625+0000 d20017| 2015-10-07T19:14:09.624+0000 E REPL     [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T19:14:09.625+0000 d20017| 2015-10-07T19:14:09.624+0000 I REPL     [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct  7 19:12:55:1)
[js_test:conf_server_write_concern] 2015-10-07T19:14:09.625+0000 d20017| 2015-10-07T19:14:09.624+0000 I REPL     [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct  7 19:13:08:11)
[js_test:conf_server_write_concern] 2015-10-07T19:14:09.626+0000 d20017| 2015-10-07T19:14:09.624+0000 I REPL     [ReplicationExecutor] Member ip-10-168-64-199:20018 is now in state RECOVERING
[js_test:conf_server_write_concern] 2015-10-07T19:14:09.626+0000 d20017| 2015-10-07T19:14:09.624+0000 I REPL     [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
...
[js_test:conf_server_write_concern] 2015-10-07T19:14:09.618+0000 d20016| 2015-10-07T19:14:09.618+0000 I REPL     [ReplicationExecutor] Member ip-10-168-64-199:20018 is now in state RECOVERING
...



 Comments   
Comment by Kaloian Manassiev [ 08/Oct/15 ]

After discussion with milkie, this is working as designed. When a node is in the RECOVERING state it still sends heartbeats, so from the point of view of the primary it is up.

Doing an automatic resync is not a good option, because if multiple secondaries start one at the same time, they might overload the primary. Also, initial sync starts by wiping out all of the node's data, which shouldn't be undertaken without administrator intervention.

Failing the waitForWriteConcern is not appropriate either, because there is always a possibility that the secondaries will leave the RECOVERING state and satisfy the write concern.

The way to avoid waiting indefinitely is for applications to specify a timeout (wtimeout) whenever they use a write concern.
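For example, a minimal sketch in mongo shell syntax (the collection name and timeout value are illustrative assumptions): adding wtimeout bounds the wait, so the operation returns with a write concern error instead of blocking forever:

db.coll.insert(
    { x: 1 },
    { writeConcern: { w: "majority", wtimeout: 5000 } }   // give up after ~5 seconds and report a writeConcernError
);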

Comment by Eric Milkie [ 08/Oct/15 ]

We can't automatically start a resync for a few reasons, the most important being that some installations might not be able to do a resync because it would take too long or would overload the primary.

Comment by Kaloian Manassiev [ 07/Oct/15 ]

If both secondaries of this 3-node replica set have entered the RECOVERING state because they fell off the end of the oplog, shouldn't a full initial sync be attempted automatically, or, if not, shouldn't the current PRIMARY step down?

Comment by Spencer Brody (Inactive) [ 07/Oct/15 ]

I think my commit https://github.com/mongodb/mongo/commit/60b2e7ffce5f91093d39c6d80701aa3f7c36b5c3 is the culprit here.
The fix is probably to just remove the part of that test that sets the oplog size down to 10 MB and let it use the 40 MB default.
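For illustration, a hedged sketch of the kind of change being suggested (this is not the actual conf_server_write_concern.js code; the ReplSetTest usage and option placement here are assumptions about how the test configures its replica set):

// Before: an explicitly tiny oplog makes it easy for secondaries to fall off its end.
var replTest = new ReplSetTest({ nodes: 3, oplogSize: 10 });   // 10 MB oplog
// After: drop the override and rely on the default (~40 MB) oplog size.
var replTest = new ReplSetTest({ nodes: 3 });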
