ISSUE DESCRIPTION AND IMPACT
This issue affects MongoDB versions 4.4.10 through 4.4.13 and 5.0.4 through 5.0.7. It may cause replication to stall on secondary replica set members in a sharded cluster that runs cross-shard transactions.
The bug is triggered when WiredTiger erroneously reports a write conflict while determining whether an update to a record is allowed. If MongoDB chooses to retry the operation that caused the conflict, it enters an indefinite retry loop, and oplog application stalls on secondary nodes.
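To make the failure mode concrete, here is a toy model in Python. It is not MongoDB's retry code, and `WriteConflict`, `apply_with_retry`, and `make_op` are hypothetical names. It shows why a retry-on-conflict loop is harmless when a conflict is transient but never finishes when the storage layer keeps reporting a conflict that will not clear. The `max_attempts` cap exists only so the demonstration terminates; the loop described above is indefinite.

```python
class WriteConflict(Exception):
    """Hypothetical stand-in for a storage-engine write conflict."""


def apply_with_retry(apply_once, max_attempts=None):
    """Retry apply_once() until it stops raising WriteConflict.

    With max_attempts=None this is an unbounded retry-on-conflict loop.
    Returns the number of attempts made, or None if the cap was reached.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            apply_once()
            return attempts
        except WriteConflict:
            continue  # retrying only helps if the conflict eventually clears
    return None


def make_op(conflicts_before_success):
    """Build an operation that raises WriteConflict N times, then succeeds.

    conflicts_before_success=None models a conflict that is reported
    erroneously on every attempt and never clears.
    """
    state = {"calls": 0}

    def op():
        state["calls"] += 1
        if conflicts_before_success is None or state["calls"] <= conflicts_before_success:
            raise WriteConflict()

    return op


print(apply_with_retry(make_op(2), max_attempts=1000))     # → 3 (transient conflict)
print(apply_with_retry(make_op(None), max_attempts=1000))  # → None (never succeeds)
```

Without the cap, the second call would spin forever, which corresponds to the stalled oplog application on secondaries.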
A MongoDB cluster may be affected by this bug if:
- the cluster is sharded
- the application uses cross-shard transactions
- the cluster's secondary nodes run version 4.4.10 through 4.4.13 or 5.0.4 through 5.0.7
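The checklist above can be sketched as a small Python helper. This is a minimal sketch that assumes plain `X.Y.Z` version strings (suffixes such as release-candidate tags are not handled). The function names are hypothetical, and treating the cluster as exposed when *any* secondary runs an affected version is an assumption of this sketch.

```python
def parse_version(version):
    """Parse 'X.Y.Z' into a tuple of ints, e.g. '4.4.12' -> (4, 4, 12)."""
    return tuple(int(part) for part in version.split("."))


# Affected ranges as stated in this advisory (inclusive bounds).
AFFECTED_RANGES = [
    ((4, 4, 10), (4, 4, 13)),
    ((5, 0, 4), (5, 0, 7)),
]


def version_is_affected(version):
    """True if the version falls inside one of the affected ranges."""
    v = parse_version(version)
    return any(low <= v <= high for low, high in AFFECTED_RANGES)


def cluster_may_be_affected(is_sharded, uses_cross_shard_txns, secondary_versions):
    """All three listed conditions must hold for the cluster to be exposed."""
    return (
        is_sharded
        and uses_cross_shard_txns
        and any(version_is_affected(v) for v in secondary_versions)
    )


print(version_is_affected("4.4.13"))  # → True
print(version_is_affected("4.4.14"))  # → False
print(cluster_may_be_affected(True, True, ["5.0.6", "5.0.8"]))  # → True
```

Comparing tuples of integers rather than raw strings avoids lexical mistakes such as `"4.4.9" > "4.4.10"`.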
If the bug is triggered, replication lag on the cluster's secondary nodes will grow indefinitely.
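One way to watch for this symptom is to sample replication lag over time. The sketch below assumes a dict shaped like the output of the `replSetGetStatus` command: a `members` list whose entries carry `name`, `stateStr`, and `optimeDate`, with `optimeDate` as a `datetime` (as PyMongo returns it). In a sharded cluster, the command would be run against a member of each shard's replica set. The helper names and the "strictly increasing across at least three samples" heuristic are assumptions, not an official detection rule.

```python
from datetime import datetime, timedelta


def secondary_lag_seconds(status):
    """Seconds each SECONDARY is behind the PRIMARY, keyed by member name."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        return {}  # no primary visible, so lag cannot be computed this way
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def lag_keeps_growing(samples, min_samples=3):
    """Heuristic: lag strictly increased between every pair of consecutive samples."""
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Hypothetical status for a two-member replica set.
t0 = datetime(2022, 5, 1, 12, 0, 0)
status = {
    "members": [
        {"name": "a:27017", "stateStr": "PRIMARY", "optimeDate": t0},
        {"name": "b:27017", "stateStr": "SECONDARY", "optimeDate": t0 - timedelta(seconds=90)},
    ]
}
print(secondary_lag_seconds(status))          # → {'b:27017': 90.0}
print(lag_keeps_growing([5.0, 60.0, 300.0]))  # → True
```

A stalled secondary's optime stops advancing while the primary's keeps moving, so successive samples for that member rise steadily; ordinary lag spikes tend to fall back between samples.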
REMEDIATION AND WORKAROUNDS
Restarting a secondary node on which replication has stalled allows replication to resume.
This issue is fixed in MongoDB 4.4.14 and 5.0.8.