[SERVER-22016] Fatal assertion 28723 trying to rollback applyOps on a CSRS config server Created: 27/Dec/15 Updated: 06/Apr/23 Resolved: 04/Jan/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.3, 3.3.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ali Hallaji | Assignee: | Andy Schwerin |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-and-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Participants: |
| Description |
|
We deployed a replica set across multiple locations (data centers), and we use MongoDB 3.2. configsvr logs:
|
| Comments |
| Comment by Githook User [ 29/Jan/16 ] |
|
Author: Dan Pasette (monkey101) <dan@mongodb.com> Message: |
| Comment by Githook User [ 29/Jan/16 ] |
|
Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com> Message: Specifically, support rolling back insert, update and delete applyOps operations (cherry picked from commit fc81fdee1da1d949f80075c8a88998fa3b0c5e78) |
| Comment by Ali Hallaji [ 19/Jan/16 ] |
|
Thank you all friends for your comments. |
| Comment by Daniel Pasette (Inactive) [ 19/Jan/16 ] |
|
Yes, you can and should upgrade to 3.2.1. The problem fixed in this ticket is very specific and does have a workaround/fix if you were to hit it again. The upgrade from 3.2.1 to 3.2.2 is an upgrade to the executable only and should not be a problem. |
| Comment by Ali Hallaji [ 19/Jan/16 ] |
|
Can I install MongoDB 3.2.1 in production and wait for the new version (3.2.2)? |
| Comment by Daniel Pasette (Inactive) [ 17/Jan/16 ] |
|
3.2.2 won't be ready until the middle of next month |
| Comment by Ali Hallaji [ 17/Jan/16 ] |
|
Hi Andy, |
| Comment by Githook User [ 04/Jan/16 ] |
|
Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com> Message: Specifically, support rolling back insert, update and delete applyOps operations |
| Comment by Andy Schwerin [ 30/Dec/15 ] |
|
This failure occurred because an election took place in the config server replica set and switched which node was primary while a chunk split was being performed. The final config server writes performed by chunk split, merge and move operations are bundled into a single operation from the perspective of the replication subsystem. That operation is called "applyOps", and is only used by sharding and certain backup/restore tools. When the election occurred, the applyOps operation in question had replicated to at most half of the voting nodes in the config server replica set, and so was not fully committed. The new primary had not yet received the operation, and so the nodes that had received it were forced to roll it back. Unfortunately, the replication system in 3.2.0 and 3.2.1 does not know how to roll back these operations. We'll fix this for 3.2.2. In the meantime, you could probably reduce your exposure to this type of error by having fewer voting members in your config server replica set. The number of voters you need should be a function of the number of single-node failures you want to tolerate while still allowing chunk migrations, collection creation and other metadata operations to proceed. We typically recommend 3 voting nodes, which allows the config servers to keep accepting writes when up to 1 node fails. Since you can keep accepting metadata reads as long as at least 1 config server is up (voting or non-voting), and metadata reads are all you need to accept document reads and writes, this is probably sufficient for most applications. Also, when this error happens, you can always remove the data files for the config server that entered this fatal state, and let it resynchronize with the other nodes. Since the config databases are typically small, this should not be a slow process. |
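The voter arithmetic in the comment above can be sketched as follows. This is a minimal illustration of the usual strict-majority rule, not code from the server; the function names (`majority`, `tolerated_failures`, `is_majority_committed`) are hypothetical and chosen only for this example.

```python
def majority(voters: int) -> int:
    """Smallest number of voting nodes that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Voting nodes that can fail while a majority is still reachable."""
    return voters - majority(voters)


def is_majority_committed(acked: int, voters: int) -> bool:
    """True if an operation has reached a strict majority of voters.

    An operation held by at most half of the voters is not committed
    and may be rolled back after an election.
    """
    return acked >= majority(voters)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(n, "voters: majority", majority(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

With 3 voters a majority is 2, so one node can fail and metadata writes still proceed. An operation that has reached exactly half of the voters (for example 2 of 4) does not count as committed under this rule, which matches the situation described in the comment.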
| Comment by Ali Hallaji [ 28/Dec/15 ] |
|
Hi Dan, |
| Comment by Daniel Pasette (Inactive) [ 28/Dec/15 ] |
|
Hi Ali, what is the history of your cluster? Are you upgrading from v3.0 or creating a new sharded cluster from scratch here? |