Add jstest reproducer for SERVER-54019 retryable-write inflated n/nModified after session migration

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      Reproducer for SERVER-54019

      This adds a sharding jstest that demonstrates the inflated n / nModified returned for a retryable update-by-_id batch after session migration via moveChunk. Its shape mirrors the reproducer attached to the ticket on 2021-01-25, modernised to ES modules and given diagnostic failure messages.
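      For orientation, the retryable batch can be pictured as a raw `update` command carrying an explicit lsid and txnNumber. The sketch below is illustrative only: the helper name `buildRetryableUpdateBatch` and the collection name "coll" are hypothetical, and in the mongo shell the txnNumber would be wrapped as `NumberLong(0)` rather than passed as a plain number.

```javascript
// Hypothetical helper: builds the raw `update` command the reproducer sends
// with a manually driven (lsid, txnNumber) pair.
// ASSUMPTIONS: collection name "coll"; `lsid` is the session's {id: <UUID>}
// document; in the mongo shell txnNumber would be NumberLong(txnNumber).
function buildRetryableUpdateBatch(lsid, txnNumber) {
  return {
    update: "coll",
    updates: [
      // Matches the seeded document; the filter omits the shard key.
      { q: { _id: 0 }, u: { $inc: { counter: 1 } } },
      // Matches no document, so it adds nothing to n or nModified.
      { q: { _id: 10000 }, u: { $inc: { counter: 1 } } },
    ],
    ordered: false,
    lsid: lsid,
    txnNumber: txnNumber,
  };
}

// The first attempt and the retry reuse the same (lsid, txnNumber), so the
// two commands are identical.
const lsid = { id: "session-uuid-placeholder" };
const firstAttempt = buildRetryableUpdateBatch(lsid, 0);
const retry = buildRetryableUpdateBatch(lsid, 0);
console.log(JSON.stringify(firstAttempt) === JSON.stringify(retry)); // prints true
```

      Resending a byte-identical command is what makes the second call a retry rather than a new write; only the chunk's location changes between the two attempts.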

      Files

      • jstests/sharding/retryable_write_session_migration_inflated_n.js (128 lines)
      • src/mongo/tla_plus/Sharding/TxnsMoveRange/RETRYABLE_WRITE_EXTENSION.md (74 lines)

      Reproducer flow

      1. {{ShardingTest({shards: 2, mongos: 1})}} with a 4-chunk layout; seed {_id: 0, x: 5, counter: 0} on shard0.
      2. Open a session with retryWrites: false and drive (lsid, txnNumber=0) manually.
      3. Run an update with ordered: false and two update-ones, {{[{q: {_id: 0}, u: {$inc: {counter: 1}}}, {q: {_id: 10000}, u: {$inc: {counter: 1}}}]}}; assert {{ {n: 1, nModified: 1} }}.
      4. moveChunk the {x: 0..10} chunk to shard1 with _waitForDelete: true; flush the router cache.
      5. Retry with the same (lsid, txnNumber); assert {{ {n: 1, nModified: 1} }} AND counter === 1.
      6. On mismatch, fail with a message that names the ticket and shows the observed (inflated) tuple against the expected one.
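      Steps 5 and 6 can be sketched as a small pure check that returns either no error or a diagnostic string. This is a hypothetical helper, not the jstest's actual code: the name `checkRetry`, the `EXPECTED` constant and the exact message wording are assumptions; in the jstest the returned string would serve as the assertion's failure message.

```javascript
// Hypothetical helper mirroring steps 5 and 6: compare the retry reply and
// the document's counter with the expected values and, on mismatch, build a
// message naming the ticket and the observed vs expected tuple.
// ASSUMPTION: `reply` carries numeric n / nModified fields, as an update
// command reply does.
const EXPECTED = { n: 1, nModified: 1, counter: 1 };

function checkRetry(reply, counter) {
  const observed = { n: reply.n, nModified: reply.nModified, counter: counter };
  const ok =
    observed.n === EXPECTED.n &&
    observed.nModified === EXPECTED.nModified &&
    observed.counter === EXPECTED.counter;
  if (ok) {
    return null; // retry reported exactly one match and one modification
  }
  return (
    "SERVER-54019: retryable update after session migration returned " +
    `{n: ${observed.n}, nModified: ${observed.nModified}, counter: ${observed.counter}}, ` +
    `expected {n: ${EXPECTED.n}, nModified: ${EXPECTED.nModified}, counter: ${EXPECTED.counter}}`
  );
}

console.log(checkRetry({ n: 1, nModified: 1 }, 1)); // prints null
console.log(checkRetry({ n: 2, nModified: 2 }, 1)); // prints the SERVER-54019 diagnostic
```

      Checking counter alongside n / nModified separates two failure modes: an inflated reply over a correct document (counter still 1) versus a retry that actually re-applied the $inc (counter 2).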

      TLA+ extension sketch

      Sibling-spec design: a new TxnsMoveRangeWithSessionMigration.tla alongside the existing spec rather than a rewrite of it. It adds four state variables (shardSessionTable, shardStmtResult, routerStmtSum, sessionMigrationInFlight), three new actions (ShardApplyRetryableStmt, MoveRangeWithSessionMigration, RouterAggregateRetry) and two invariants (NoInflatedRetry, SessionTableCopyMonotone). The sketch predicts that TLC will find a counterexample of at most 6 steps matching the jstest trace; the sibling cfg lives in the same directory.
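      To give a feel for what NoInflatedRetry guards against, here is a toy explicit-state model in plain JavaScript that borrows the sketch's action and variable names. It is not a translation of the TLA+ spec. The modelling choices are assumptions: two shards, a single statement (stmtId 0), both shards recording that statement because the update-by-_id is broadcast, the donor keeping its session entry after copying it, a router that sums the stored per-shard results on retry, and an added ground-truth counter `docsModified` that is not one of the sketch's variables.

```javascript
// Toy model loosely following the TLA+ sketch's names. ASSUMPTIONS: two
// shards, one statement (stmtId 0), the broadcast update reaches both shards,
// the donor keeps its entry after migration, and the router sums per-shard
// stored results. `docsModified` is an added ground-truth counter.
function initialState() {
  return {
    shardSessionTable: [new Set(), new Set()], // stmtIds recorded per shard
    shardStmtResult: [new Map(), new Map()],   // stmtId -> stored n per shard
    docsModified: 0,                           // documents actually modified
    routerStmtSum: null,                       // set by RouterAggregateRetry
  };
}

// ShardApplyRetryableStmt: shard s executes stmt 0 and stores its result n.
function shardApplyRetryableStmt(st, s, n) {
  st.shardSessionTable[s].add(0);
  st.shardStmtResult[s].set(0, n);
  st.docsModified += n;
}

// MoveRangeWithSessionMigration: copy the donor's session entries to the
// recipient, overwriting any stored result there.
function moveRangeWithSessionMigration(st, donor, recipient) {
  for (const stmtId of st.shardSessionTable[donor]) {
    st.shardSessionTable[recipient].add(stmtId);
    st.shardStmtResult[recipient].set(stmtId, st.shardStmtResult[donor].get(stmtId));
  }
}

// RouterAggregateRetry: each shard that has stmt 0 replays its stored result;
// the router sums the replies.
function routerAggregateRetry(st) {
  let sum = 0;
  for (let s = 0; s < st.shardSessionTable.length; s++) {
    if (st.shardSessionTable[s].has(0)) sum += st.shardStmtResult[s].get(0);
  }
  st.routerStmtSum = sum;
}

// NoInflatedRetry: a retried sum never exceeds what was actually modified.
function noInflatedRetry(st) {
  return st.routerStmtSum === null || st.routerStmtSum <= st.docsModified;
}

// Trace resembling the jstest: apply, apply, migrate, retry.
const trace = initialState();
shardApplyRetryableStmt(trace, 0, 1); // shard0 matches _id: 0
shardApplyRetryableStmt(trace, 1, 0); // shard1 gets the broadcast, matches nothing
moveRangeWithSessionMigration(trace, 0, 1);
routerAggregateRetry(trace);
console.log(trace.routerStmtSum, trace.docsModified, noInflatedRetry(trace)); // prints 2 1 false
```

      Under these assumptions, the same trace without the migration step yields a sum of 1 and the invariant holds. The violation comes entirely from the copied entry being replayed by a second shard.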

      Run

      buildscripts/resmoke.py run --suites=sharding \
        jstests/sharding/retryable_write_session_migration_inflated_n.js
      

      Related

      • SERVER-54019: retryable updates return inflated n/nModified after session migration
      • SPM-3190: addressed a related regime; the regime exercised here (ordered: false, batch size > 1, multiple update-ones by _id that omit the shard key) was not covered

            Assignee: Unassigned
            Reporter: Mehar Grewal
            Votes: 0
            Watchers: 1