  1. WiredTiger
  2. WT-11889

Identify ways to encourage races in WiredTiger testing

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Test Format
    • 5
    • StorEng - Refinement Pipeline

      Investigation ticket. I've recently been thinking about how we can better detect races in WiredTiger, and during skunkworks I put up a PoC branch https://github.com/wiredtiger/wiredtiger/tree/wt-11889-skunkworks-thread-pause that, when running test/format, randomly targets threads and forces them to sleep for a second. After discussion with marc.butler@mongodb.com and y.ershov@mongodb.com, we concluded there are better ways to do this, namely by targeting the code locations where races are likely to arise.

      This ticket is to investigate adding delays around the call sites of our atomics and locking code. One possible solution is to add busy loops before and after taking and releasing locks, or updating atomic variables. We'll need some randomness here, both to decide when to run a delay and to decide how long it runs. When adding delays around atomics we should prefer a busy loop over a __wt_yield() so we stay in userspace.
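
      As a rough sketch of the idea above (not code from the PoC branch), a randomized userspace busy loop could be wrapped around an atomic update and a spinlock roughly as follows. Every name here (race_delay, race_maybe_delay, delayed_fetch_add, delayed_spin_lock, delayed_spin_unlock) is hypothetical and does not exist in WiredTiger; the example uses plain C11 atomics rather than WiredTiger's own atomic and locking macros, and assumes one race_delay state per thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-thread delay state; not a WiredTiger structure. */
typedef struct {
    uint64_t rng;       /* xorshift64 state, must be non-zero */
    uint32_t one_in;    /* delay roughly one call in this many; 0 disables */
    uint32_t max_spins; /* upper bound on busy-loop iterations */
    uint64_t delays;    /* number of delays injected so far */
} race_delay;

/* xorshift64: cheap and deterministic for a given starting state. */
static uint64_t
race_rand(race_delay *d)
{
    d->rng ^= d->rng << 13;
    d->rng ^= d->rng >> 7;
    d->rng ^= d->rng << 17;
    return (d->rng);
}

/* Randomly decide whether to spin, and for how long. Returns the spin count. */
static uint32_t
race_maybe_delay(race_delay *d)
{
    volatile uint32_t sink = 0; /* volatile so the loop is not optimized away */
    uint32_t i, spins;

    assert(d->rng != 0);
    if (d->one_in == 0 || d->max_spins == 0 || race_rand(d) % d->one_in != 0)
        return (0);
    spins = (uint32_t)(race_rand(d) % d->max_spins) + 1;
    for (i = 0; i < spins; i++)
        sink += i; /* busy loop: no syscall, the thread stays in userspace */
    d->delays++;
    return (spins);
}

/* Atomic add with a possible delay before and after the update. */
static uint64_t
delayed_fetch_add(race_delay *d, _Atomic(uint64_t) *v, uint64_t n)
{
    uint64_t old;

    race_maybe_delay(d);
    old = atomic_fetch_add(v, n);
    race_maybe_delay(d);
    return (old);
}

/* Minimal spinlock acquire with a possible delay before and after. */
static void
delayed_spin_lock(race_delay *d, atomic_flag *lock)
{
    race_maybe_delay(d);
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
    race_maybe_delay(d);
}

/* Spinlock release with a possible delay before and after. */
static void
delayed_spin_unlock(race_delay *d, atomic_flag *lock)
{
    race_maybe_delay(d);
    atomic_flag_clear_explicit(lock, memory_order_release);
    race_maybe_delay(d);
}
```

      Setting one_in to 0 turns the delays off entirely, so a wrapper like this could in principle be left in place and only enabled by the test harness. How the real call sites would be instrumented is exactly what this ticket needs to investigate.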

      We also need to think about traceability and making it clear to developers what is taking place. This will help with debugging the issue if we catch a race, and avoid having to rerun the test repeatedly until we happen to hit the same delay that triggered the race.
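
      One possible shape for the traceability piece, again purely as an illustration: keep the PRNG seed and a small ring buffer of recent delays, tagged with the call site, so they can be dumped when a failure is caught. The names delay_trace, delay_at, DELAY_HERE and delay_trace_dump are hypothetical, and the sketch assumes one trace per thread. Replaying a seed reproduces the same delay decisions for that thread, but it does not by itself guarantee the same thread interleaving.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TRACE_SLOTS 64 /* hypothetical ring size */

typedef struct {
    const char *site; /* "file:line" of the instrumented call site */
    uint32_t spins;   /* how long the injected busy loop ran */
} delay_event;

/* Hypothetical per-thread trace; not a WiredTiger structure. */
typedef struct {
    uint64_t seed;  /* initial seed, reported so a run can be replayed */
    uint64_t rng;   /* current xorshift64 state */
    uint64_t count; /* total delays recorded */
    delay_event ring[TRACE_SLOTS];
} delay_trace;

static void
delay_trace_init(delay_trace *t, uint64_t seed)
{
    memset(t, 0, sizeof(*t));
    t->seed = t->rng = (seed == 0 ? 1 : seed); /* xorshift needs non-zero */
}

/* Spin for a pseudo-random number of iterations and record the event. */
static uint32_t
delay_at(delay_trace *t, const char *site, uint32_t max_spins)
{
    volatile uint32_t sink = 0;
    uint32_t i, spins;

    assert(max_spins > 0);
    t->rng ^= t->rng << 13;
    t->rng ^= t->rng >> 7;
    t->rng ^= t->rng << 17;
    spins = (uint32_t)(t->rng % max_spins); /* 0 means no delay this time */
    for (i = 0; i < spins; i++)
        sink += i;
    if (spins != 0) {
        t->ring[t->count % TRACE_SLOTS].site = site;
        t->ring[t->count % TRACE_SLOTS].spins = spins;
        t->count++;
    }
    return (spins);
}

/* Tag the delay with the source location of the caller. */
#define DELAY_STR2(x) #x
#define DELAY_STR(x) DELAY_STR2(x)
#define DELAY_HERE(t, max) delay_at((t), __FILE__ ":" DELAY_STR(__LINE__), (max))

/* Print the seed and the most recent delays, oldest first. */
static void
delay_trace_dump(const delay_trace *t, FILE *fp)
{
    uint64_t first, i;

    first = t->count > TRACE_SLOTS ? t->count - TRACE_SLOTS : 0;
    fprintf(fp, "delay seed=%" PRIu64 " events=%" PRIu64 "\n", t->seed, t->count);
    for (i = first; i < t->count; i++)
        fprintf(fp, "  #%" PRIu64 " %s spins=%" PRIu32 "\n", i,
          t->ring[i % TRACE_SLOTS].site, t->ring[i % TRACE_SLOTS].spins);
}
```

      A failing run could call delay_trace_dump(&trace, stderr) from its failure path, which gives the developer both the seed to replay and the last few places a delay was injected.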

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            andrew.morton@mongodb.com Andrew Morton
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: