- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 7.3.0, 8.0.0-rc2
- Component/s: Sharding
- Cluster Scalability
- Fully Compatible
- ALL
- v8.0, v7.3, v7.0, v6.0, v5.0
- (copied to CRM)
ISSUE DESCRIPTION AND IMPACT
Retryable writes occurring during resharding may be applied more than once if a specific chunk migration follows resharding’s commit. This may manifest as inconsistencies from the perspective of a client application.
This issue can only manifest if ALL of the following are true:
- A retryable write is performed while a reshardCollection operation is in progress.
- A chunk containing documents affected by the retryable write is migrated to a shard which was not the owner of those documents under the original shard key.
  - Note that this means a cluster must contain 3 or more shards to be affected.
- The retryable write must actually be retried (for example, due to a network error) after resharding and the subsequent chunk migration commit.
If the issue occurs and a retryable write is performed again, the application-facing impact could take one of the following forms:
Sequence of Operations | Expected Outcome | Actual Outcome
---|---|---
Retryable insert of {_id: 5}; retryable delete of {_id: 5}; a duplicate packet causes a retry of the insert | No document with _id: 5 is present | Document {_id: 5} is present
Document {_id: 5, counter: 0} is present; retryable update with query {_id: 5} and update {$inc: {counter: 1}}; a duplicate packet causes a retry of the update | {_id: 5, counter: 1} | {_id: 5, counter: 2}
Retryable delete of {_id: 5}; retryable insert of {_id: 5}; a duplicate packet causes a retry of the delete | Document {_id: 5} is present | No document with _id: 5 is present
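As a concrete illustration of the second row above, the following mongosh sketch uses a hypothetical app.orders collection; retryable writes are on by default in mongosh and modern drivers, so no special configuration is assumed.

```javascript
// Hypothetical names: database "app", collection "orders" was resharded.
const coll = db.getSiblingDB("app").orders;

coll.insertOne({ _id: 5, counter: 0 });

// This single logical update is sent once, but a transient network error causes
// the driver to retry it. On an affected cluster, if resharding committed and the
// chunk owning _id: 5 then migrated to a shard that never owned it under the old
// shard key, the retry is re-executed instead of being recognized as a duplicate.
coll.updateOne({ _id: 5 }, { $inc: { counter: 1 } });

coll.findOne({ _id: 5 });  // expected {_id: 5, counter: 1}; an affected cluster may show counter: 2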
DIAGNOSIS AND REMEDIATION
A script is available at https://github.com/mongodb/support-tools/tree/master/ca-118 that can be used to help rule out impact from this issue. The script inspects config.changelog to confirm that no chunk migrations occurred on a resharded namespace during the 30-minute period following the resharding operation. Note the following limitations of the script:
- The config.changelog is a capped collection and therefore only has a limited amount of history. The script is unable to provide any insight beyond the limit of the changelog’s history.
- The script can definitively determine that a cluster was not affected (within the history present in config.changelog; see the previous point).
- The script cannot definitively determine that a cluster was affected.
If the script reports that a cluster may have been affected or if the script is inconclusive, then you will need to inspect your data to ensure that it is consistent from the perspective of your application. The issue, if it occurs, will only affect retryable writes that operated on the resharded collection. With this in mind, pay specific attention to operations on the collection in question, particularly during the period immediately following resharding’s completion.
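For a rough manual spot-check along the same lines as the script (not a substitute for it), the changelog can be queried for migration commits on the resharded namespace; app.orders is an illustrative name, and the returned timestamps still have to be compared by hand against when the reshardCollection operation completed.

```javascript
// List chunk migration commits recorded for the resharded namespace so they can
// be compared against the time resharding finished. config.changelog is capped,
// so this only covers whatever history it still retains.
const changelog = db.getSiblingDB("config").changelog;

changelog
  .find({ what: "moveChunk.commit", ns: "app.orders" })
  .sort({ time: 1 })
  .forEach(e => printjson({ time: e.time, ns: e.ns, details: e.details }));
```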
If your application meets the criteria above, we recommend that you upgrade to one of the following versions:
Affected Versions | Recommended Upgrade Versions |
---|---|
5.0.0 - 5.0.30 | 5.0.31+ |
6.0.0 - 6.0.16 | 6.0.20+ |
7.0.0 - 7.0.12 | 7.0.16+ |
8.0.0 - 8.0.3 | 8.0.5+ |
WORKAROUNDS
On affected versions, disabling the balancer for a period of time following resharding will also prevent the issue from occurring. The balancer should remain disabled for a length of time sufficient to ensure retryable writes from during the resharding operation will no longer be retried. The recommended minimum duration to disable the balancer following a resharding operation is 30 minutes, matching the default timeout for logical sessions.
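A minimal mongosh sketch of this workaround, assuming an illustrative app.orders namespace and new shard key, might look like the following; the balancer is disabled before resharding so that no migration can start immediately after the commit.

```javascript
// Disable the balancer (waits for any in-progress balancing round to finish).
sh.stopBalancer();

// Illustrative resharding operation; namespace and key are placeholders.
db.adminCommand({ reshardCollection: "app.orders", key: { region: 1, _id: 1 } });

// Keep the balancer off for at least 30 minutes (the default logical session
// timeout) so in-flight retryable writes can no longer be retried.
sleep(30 * 60 * 1000);

sh.startBalancer();
sh.getBalancerState();   // should report true again
```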
Original description
Resharding preserves the full retryability history for any retryable writes which occur during the resharding operation. If a chunk migration follows the resharding, session migration should transfer the relevant write history over to the recipient of the chunk. Chunk migration decides whether an oplog entry is relevant by filtering on the namespace being migrated.
The problem is that when a resharding recipient updates its config.transactions table (based on the retryable writes/transactions performed on the donor shard), it creates a noop oplog entry with the namespace set to empty. If the resharding recipient then becomes the donor in a following chunk migration, then because of the empty namespace it will incorrectly conclude that this oplog entry isn't relevant to the chunk actively being migrated. As a result, the noop oplog entry for the already executed retryable write never gets migrated, and the retryable write could be executed again after the chunk migration commits.
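The faulty relevance check lives in the server's C++ chunk-migration code; the JavaScript fragment below is only an illustration of why a per-namespace filter drops the recipient-written noop entry, using a made-up shape for the session history.

```javascript
// Illustrative only: why a filter keyed on the migrating namespace misses the
// empty-namespace noop entry written by the resharding recipient.
const migratingNs = "app.orders";

const sessionHistory = [
  { ns: "app.orders", op: "u", note: "original retryable update" },
  { ns: "",           op: "n", note: "noop written by resharding recipient" }  // the problematic entry
];

// The empty-ns entry fails the check, so the record of the already executed
// retryable write never reaches the new chunk owner.
const transferred = sessionHistory.filter(e => e.ns === migratingNs);
printjson(transferred);   // the noop entry is missing
```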
Adding Max's repro for this issue (a sketch of steps 2 and 5 follows below):
1. Start a resharding operation.
2. Run a retryable $inc update during the resharding operation.
3. Let the resharding operation complete.
4. Run a chunk migration.
5. Retry the retryable write from step 2 and verify no new oplog entry was generated.
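Pausing resharding long enough for step 2 normally relies on test-only failpoints, so the sketch below covers only steps 2 and 5, in the style of the server's jstests: issue the update through an explicit session with a fixed txnNumber, then re-send the identical command to stand in for a driver retry. The session options and the app.orders namespace are illustrative.

```javascript
// Explicit session with driver-level retries off, so the retry is issued manually.
const session = db.getMongo().startSession({ retryWrites: false });
const appDb = session.getDatabase("app");

const updateCmd = {
  update: "orders",
  updates: [{ q: { _id: 5 }, u: { $inc: { counter: 1 } } }],
  txnNumber: NumberLong(1)
};

appDb.runCommand(updateCmd);   // step 2: runs while resharding is in progress

// ...resharding commits, a chunk migration moves the chunk owning _id: 5...

appDb.runCommand(updateCmd);   // step 5: should be recognized as a duplicate and
                               // generate no new oplog entry; on an affected
                               // cluster the $inc is applied a second time
```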
Issue links:
- is caused by: SERVER-49904 Update config.transactions entry for retryable writes during resharding's oplog application (Closed)
- is related to: SERVER-89452 Avoid adding empty namespaces to txnParticipant's affectedNamespaces (Closed)
- is related to: SERVER-55384 Move session application for resharding's oplog application into its own class (Closed)