Serialize ValidateCollections hooks across all jobs

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Correctness 2026-03-24
    • 200

        1. Summary

      Fixes WiredTiger timestamp conflicts in Antithesis testing when multiple ValidateCollections hooks run concurrently on the same MongoDB cluster.

        1. Problem

      In FSM workload tests with `maxTestQueueSize > 1`, concurrent test jobs share the same MongoDB fixture. When tests complete at about the same time, their ValidateCollections hooks can run simultaneously, which produces the following conflict:

      • Job 1's hook inserts into `test.validate.hook` at timestamp T for internode validation
      • Job 2's hook runs validate commands concurrently, reading at timestamp T
      • *Conflict*: Job 1 cannot commit at T because Job 2 is already reading at T
      • *Error*: `commit timestamp must be after all active read timestamps`
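
      The sequence above can be illustrated with a toy model. This is only a sketch: `ToyTimestampStore` and its methods are hypothetical names invented here, not WiredTiger's API, and the rule it enforces is a simplified reading of the error message quoted above.

```python
class ToyTimestampStore:
    """Toy stand-in for a timestamped storage engine (hypothetical, not WiredTiger)."""

    def __init__(self):
        # Timestamps of reads that are currently in progress.
        self.active_reads = []

    def begin_read(self, ts):
        self.active_reads.append(ts)

    def end_read(self, ts):
        self.active_reads.remove(ts)

    def commit(self, ts):
        # Simplified rule: a commit must be strictly after every active read.
        if self.active_reads and ts <= max(self.active_reads):
            raise RuntimeError(
                "commit timestamp must be after all active read timestamps"
            )
        return ts


store = ToyTimestampStore()
T = 10

store.begin_read(T)          # Job 2's validate command reads at T
try:
    store.commit(T)          # Job 1's insert tries to commit at T
    conflict = False
except RuntimeError:
    conflict = True          # the concurrent read blocks the commit

store.end_read(T)            # Job 2 finishes its read
committed = store.commit(T)  # with no active reader, the commit goes through
```

      In this model the commit only fails while the read is still active, which is why running the hooks one at a time avoids the error.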

        1. Solution

      Add a global `threading.Lock` to serialize ValidateCollections hooks across all jobs:

      • *replicaset.py*: Define `_GLOBAL_VALIDATION_LOCK` at module level
      • *shardedcluster.py*: Reference the same lock via `_validation_lock` attribute
      • *validate.py*: Acquire fixture's `_validation_lock` before running validation

      With the lock in place, only one ValidateCollections hook can execute at a time, which prevents the timestamp conflict. Fixtures that lack the lock remain supported, so the change is backward compatible.
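
      A minimal sketch of this locking pattern follows. It is not the actual patch: the fixture classes, `run_validation`, and `run_jobs` are hypothetical stand-ins, and it assumes that the replica set fixture also exposes the lock as `_validation_lock` and that a fixture without the attribute is validated without locking. Only `_GLOBAL_VALIDATION_LOCK` and `_validation_lock` are names taken from this ticket.

```python
import threading
import time
from contextlib import nullcontext

# Module-level lock shared by every fixture instance (name from the ticket).
_GLOBAL_VALIDATION_LOCK = threading.Lock()


class ReplicaSetFixture:
    """Hypothetical stand-in for the replica set fixture."""

    def __init__(self):
        self._validation_lock = _GLOBAL_VALIDATION_LOCK


class ShardedClusterFixture:
    """Hypothetical stand-in that references the same global lock."""

    def __init__(self):
        self._validation_lock = _GLOBAL_VALIDATION_LOCK


class LegacyFixture:
    """Hypothetical fixture that predates the lock (no _validation_lock)."""


def run_validation(fixture, job_id, log):
    """Stand-in for the hook body: hold the fixture's lock while validating."""
    # Assumption: fall back to a no-op context when the fixture has no lock.
    lock = getattr(fixture, "_validation_lock", None)
    with lock if lock is not None else nullcontext():
        log.append(("start", job_id))
        time.sleep(0.01)  # placeholder for the insert and validate commands
        log.append(("end", job_id))


def run_jobs(fixtures):
    """Run one validation per fixture on its own thread and return the event log."""
    log = []
    threads = [
        threading.Thread(target=run_validation, args=(fixture, job_id, log))
        for job_id, fixture in enumerate(fixtures)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


log = run_jobs([ReplicaSetFixture(), ShardedClusterFixture(), ReplicaSetFixture()])
```

      Because every fixture points at the same lock object, each `start` entry in `log` is immediately followed by the matching `end` entry, regardless of which fixture type a job uses.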

        1. Why SERVER-115225 Wasn't Enough

      SERVER-115225 improved timestamp accuracy for a single hook but didn't address concurrent hooks running on the same cluster.

            Assignee:
            Steve McClure
            Reporter:
            Steve McClure
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: