Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Reproducer for SERVER-54019
This adds a sharding jstest that demonstrates the inflated n/nModified reported for a retryable update-by-_id batch after session migration via moveChunk. The shape mirrors the reproducer attached to the ticket on 2021-01-25, modernised to ES modules and given diagnostic failure messages.
Files
- jstests/sharding/retryable_write_session_migration_inflated_n.js (128 lines)
- src/mongo/tla_plus/Sharding/TxnsMoveRange/RETRYABLE_WRITE_EXTENSION.md (74 lines)
Reproducer flow
- ShardingTest({shards: 2, mongos: 1}) with a 4-chunk layout; seed {_id: 0, x: 5, counter: 0} on shard0.
- Open a session with retryWrites: false; drive (lsid, txnNumber=0) manually.
- Run update with ordered: false and two update-ones, [{q: {_id: 0}, u: {$inc: {counter: 1}}}, {q: {_id: 10000}, u: {$inc: {counter: 1}}}]; assert {n: 1, nModified: 1}.
- moveChunk the {x: 0..10} chunk to shard1 with _waitForDelete: true; flush router cache.
- Retry the same (lsid, txnNumber); assert {n: 1, nModified: 1} AND counter === 1.
- On mismatch, fail with a message naming the ticket and the inflated tuple observed vs expected.
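The update payload and the failure message from the steps above could be sketched roughly as follows. This is plain JavaScript rather than the jstest itself: the helper names buildRetryableUpdate and describeMismatch are hypothetical, the collection name is whatever the caller passes, and in the mongo shell the txnNumber would normally be wrapped in NumberLong before being sent.

```javascript
// Hypothetical helpers for illustration; names are not taken from the jstest.
const TICKET = "SERVER-54019";

// Build the unordered two-statement update-by-_id batch for a manually
// driven (lsid, txnNumber). Neither query includes the shard key.
function buildRetryableUpdate(collName, lsid, txnNumber) {
    return {
        update: collName,
        updates: [
            {q: {_id: 0}, u: {$inc: {counter: 1}}},
            {q: {_id: 10000}, u: {$inc: {counter: 1}}},
        ],
        ordered: false,
        lsid: lsid,
        // Assumption: in the mongo shell this would be NumberLong(txnNumber).
        txnNumber: txnNumber,
    };
}

// Return null when the response matches, otherwise a message that names the
// ticket and shows the observed tuple next to the expected one.
function describeMismatch(phase, observed, expected) {
    const same = observed.n === expected.n &&
        observed.nModified === expected.nModified;
    if (same) {
        return null;
    }
    return `${TICKET}: ${phase} returned (n, nModified) = ` +
        `(${observed.n}, ${observed.nModified}), ` +
        `expected (${expected.n}, ${expected.nModified})`;
}
```

In the jstest, the same command object would be sent once before the moveChunk and once after it, with describeMismatch (or an equivalent assert message) applied to each response.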
TLA+ extension sketch
The extension is designed as a sibling spec (TxnsMoveRangeWithSessionMigration.tla), not a rewrite of the existing spec. It adds four state variables (shardSessionTable, shardStmtResult, routerStmtSum, sessionMigrationInFlight), three new actions (ShardApplyRetryableStmt, MoveRangeWithSessionMigration, RouterAggregateRetry), and two invariants (NoInflatedRetry, SessionTableCopyMonotone). It predicts a TLC counterexample of at most 6 steps that matches the jstest trace; the sibling cfg lives in the same directory.
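To make the intended counterexample concrete, here is a toy JavaScript model of one statement on two shards. It is not the TLA+ spec and not server code. It borrows only the names shardSessionTable and NoInflatedRetry from the sketch above, and it rests on three modelling assumptions: the router broadcasts an update-by-_id and sums the per-shard n, a shard records a statement only when it actually modified a document, and session migration copies the donor's record to the recipient while the donor keeps its own.

```javascript
// Toy model under the assumptions stated above; all function names are hypothetical.
function makeCluster() {
    return {
        // Per-shard map from statement key to the n recorded for it.
        shardSessionTable: {shard0: new Map(), shard1: new Map()},
        // Per-shard documents keyed by _id; the seed document starts on shard0.
        docs: {shard0: new Map([[0, {counter: 0}]]), shard1: new Map()},
    };
}

// Execute the statement once, or replay the recorded result on a retry.
function shardApply(cluster, shard, key, id) {
    const table = cluster.shardSessionTable[shard];
    if (table.has(key)) {
        return table.get(key);
    }
    const doc = cluster.docs[shard].get(id);
    if (!doc) {
        return 0; // no match: nothing written, so nothing recorded
    }
    doc.counter += 1;
    table.set(key, 1);
    return 1;
}

// Move the document and copy the donor's session records to the recipient.
function moveWithSessionMigration(cluster, from, to, id) {
    cluster.docs[to].set(id, cluster.docs[from].get(id));
    cluster.docs[from].delete(id);
    for (const [key, n] of cluster.shardSessionTable[from]) {
        cluster.shardSessionTable[to].set(key, n);
    }
}

// Broadcast to every shard and sum the reported n.
function routerAggregate(cluster, key, id) {
    return Object.keys(cluster.docs)
        .reduce((sum, shard) => sum + shardApply(cluster, shard, key, id), 0);
}

function runTrace() {
    const cluster = makeCluster();
    const key = "lsid0:0:0"; // (lsid, txnNumber, stmtId) flattened to a string
    const first = routerAggregate(cluster, key, 0);
    moveWithSessionMigration(cluster, "shard0", "shard1", 0);
    const retry = routerAggregate(cluster, key, 0);
    const counter = cluster.docs.shard1.get(0).counter;
    return {first, retry, counter, noInflatedRetry: retry <= first};
}
```

Under these assumptions the document is modified only once, yet the retry reports a larger n than the first attempt, which is the kind of state an invariant like NoInflatedRetry is meant to rule out.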
Run
buildscripts/resmoke.py run --suites=sharding jstests/sharding/retryable_write_session_migration_inflated_n.js
Related
- SERVER-54019 — retryable updates inflated n/nModified after session migration
- SPM-3190 — closed a related regime; this regime (ordered:false, batch size > 1, multiple update-ones by _id without the shard key) was not closed
- is related to: SERVER-54019 Session migration from moveChunk can lead to higher 'n' and 'nModified' for retryable updates by _id
- Backlog