[SERVER-19934] Ill-timed crash at end of chunk migration can lead to lost writes when using replica sets as config servers
Created: 13/Aug/15  Updated: 03/Apr/19  Resolved: 07/Oct/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.2.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Andy Schwerin
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.diff     File test.js    
Issue Links:
  Documented
    is documented by DOCS-6435 Document how to disable sharding stat... (Closed)
  Related
    related to SERVER-20889 Introduce means to disable sharding m... (Closed)
    related to SERVER-21033 Sharding minOpTime info writes should... (Closed)
    related to SERVER-20824 Test for sharding state recovery (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 8 08/28/15, Sharding 9 (09/18/15), Sharding A (10/09/15)
Participants:

 Description   

This bug only affects config server replica set configurations, and cannot occur in 3.0, 2.6 or 2.4 series clusters.

In a sharded cluster with config server replica set (CSRS) config servers that is moving some chunk, C, from a donor shard to a recipient shard,

If the donor shard replica set primary node (or standalone node) crashes during the chunk migration critical section after writing the chunk metadata changes to the config server,

And some mongos that is not aware of the change to the chunk metadata tries to route a write for the donated chunk to the donor shard,

And the new donor replica set primary node (or restarted standalone node) contacts a lagged CSRS secondary that has stale chunk information,

Then the new donor node will accept the write even though it does not own the chunk, leading to a lost write.

The problem is that the donor replica set does not remember that it is finishing a chunk migration across failovers and restarts, and also does not durably remember the minimum config server optime corresponding to its most recently completed metadata operation.
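
The failure sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: ConfigNode, DonorShard, simulate, and the integer optimes are all hypothetical names and simplifications introduced here. It shows how a durably remembered minimum config optime lets the new donor primary refuse metadata from a lagged CSRS secondary instead of accepting the write.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConfigNode:
    """One config server replica set member (hypothetical model)."""
    optime: int = 0  # last applied config optime, modeled as an integer
    chunk_owner: Dict[str, str] = field(default_factory=dict)


@dataclass
class DonorShard:
    """A donor shard node (hypothetical model)."""
    name: str
    # Minimum config optime that survived the failover, or None if the
    # node remembers nothing (the behavior described in this ticket).
    min_config_optime: Optional[int] = None

    def owns_chunk(self, chunk: str, config: ConfigNode) -> bool:
        # Refuse metadata from a config node that lags the recorded optime.
        if (self.min_config_optime is not None
                and config.optime < self.min_config_optime):
            raise RuntimeError("config node is stale; retry elsewhere")
        return config.chunk_owner.get(chunk) == self.name


def simulate(remember_optime: bool) -> str:
    """Replay the scenario and report what happens to the stale write."""
    primary = ConfigNode(optime=1, chunk_owner={"C": "donor"})
    lagged = ConfigNode(optime=1, chunk_owner={"C": "donor"})

    # The migration commit reaches the config primary only.
    primary.optime = 2
    primary.chunk_owner["C"] = "recipient"

    # The donor primary crashes; a new node takes over.
    new_donor = DonorShard(
        "donor", min_config_optime=2 if remember_optime else None)

    # A stale mongos routes a write for chunk C to the donor, which
    # happens to consult the lagged config secondary.
    try:
        accepted = new_donor.owns_chunk("C", lagged)
    except RuntimeError:
        return "rejected"
    return "accepted" if accepted else "rejected"
```

In this model, simulate(False) accepts the write on a shard that no longer owns the chunk, which corresponds to the lost write, while simulate(True) rejects it because the lagged secondary is behind the remembered optime.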



 Comments   
Comment by Andy Schwerin [ 20/Nov/15 ]

It might suffice to cover the altered backup and restore procedures as per DOCS-6597.

Comment by Spencer Brody (Inactive) [ 22/Oct/15 ]

Needs documentation explaining the sharding state recovery mechanism and how and why one might disable it.

A user summary would allow the docs to link to this ticket to help explain the motivation for this recovery mechanism.

Comment by Githook User [ 07/Oct/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-19934 Sharding config minOpTime recovery

Adds a framework to record incomplete sharding metadata change operations,
which can be recovered at startup or transition to primary.

This version of the framework is blocking in that it cannot be interrupted
until completed.
Branch: master
https://github.com/mongodb/mongo/commit/cc5788013eebbbd71e87581f3bb16532ab463ef0
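
The idea of recording incomplete metadata change operations so they can be recovered at startup or on transition to primary might be sketched as follows. This is an illustrative assumption, not the code in the commit above: RecoveryStore, the field names min_optime and in_progress, and all three functions are hypothetical.

```python
from typing import Optional


class RecoveryStore:
    """Stands in for a durable, replicated document (hypothetical)."""

    def __init__(self) -> None:
        self.doc: Optional[dict] = None


def begin_metadata_change(store: RecoveryStore, config_optime: int) -> None:
    """Record, before touching the config servers, that a change started."""
    prev = store.doc or {"min_optime": 0, "in_progress": 0}
    store.doc = {
        "min_optime": max(prev["min_optime"], config_optime),
        "in_progress": prev["in_progress"] + 1,
    }


def end_metadata_change(store: RecoveryStore, config_optime: int) -> None:
    """Record the optime of the completed change and clear the marker."""
    if store.doc is None or store.doc["in_progress"] == 0:
        raise RuntimeError("no metadata change in progress")
    store.doc = {
        "min_optime": max(store.doc["min_optime"], config_optime),
        "in_progress": store.doc["in_progress"] - 1,
    }


def recover_on_startup(store: RecoveryStore) -> Optional[int]:
    """Return the config optime the node must observe before serving
    requests if a change was left incomplete, otherwise None."""
    if store.doc is None or store.doc["in_progress"] == 0:
        return None
    return store.doc["min_optime"]
```

A node that crashes between begin_metadata_change and end_metadata_change would find a nonzero in_progress count at startup and would know to refresh its metadata from a config server that has reached at least the recorded optime.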

Comment by Githook User [ 07/Oct/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-19934 Move system namespaces to NamespaceString
Branch: master
https://github.com/mongodb/mongo/commit/6e2fa8f483088bbbbf8dfe4575907a82d7488e08

Comment by Githook User [ 07/Oct/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-19934 waitForWriteConcern should take write concern parameter

That way, sub-operations that need to wait on a specific write concern,
which might differ from that of the entire operation, don't need
to change the OperationContext.
Branch: master
https://github.com/mongodb/mongo/commit/9ba3877df2a0734fbf2148c7c16ca18bdf7d4bfb

Comment by Githook User [ 22/Sep/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-19934 Fix usage of ShardRegistry before sharding is initialized
Branch: master
https://github.com/mongodb/mongo/commit/e95a0838de334600dca1322914601b4b8404667b

Comment by Githook User [ 22/Sep/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-19934 Begin the implementation of chunk move as a state machine

This change introduces the ChunkMoveOperationState which represents a
state machine that drives the execution of a single chunk move operation.
At this stage, it only implements the critical section phase.

There are no functional changes.
Branch: master
https://github.com/mongodb/mongo/commit/ae5617b20202aea3bd6cde64a1758f2d17e5b93a

Comment by Githook User [ 18/Sep/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-19934 Move moveChunk command to separate source files

No functional changes.
Branch: master
https://github.com/mongodb/mongo/commit/53ad6b9d513542fa344e401a5dd9e6f2a27232ac

Comment by Randolph Tan [ 10/Sep/15 ]

Attached test.js, which demonstrates this bug. Also attached test.diff to show where to inject the new failpoint and work around SERVER-19855 and SERVER-20298.

Comment by Spencer Brody (Inactive) [ 01/Sep/15 ]

Assigning to Randolph to write a jstest repro.
