  1. Core Server
  2. SERVER-92703

Race in KeysCollectionManagerShardingTest GetKeyReadConcernMajorityNotAvailableYetDeadline

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Workload Scheduling
    • Fully Compatible
    • ALL
    • Workload Scheduling 2024-07-22, Workload Scheduling 2024-08-05
    • 200

      The test case GetKeyReadConcernMajorityNotAvailableYetDeadline expects the failpoint "keyRefreshFailWithReadConcernMajorityNotAvailableYet" to be hit twice: once when the initial refresh on the monitoring thread, kicked off by startMonitoring, fails, and a second time when the additional thread's call to getKeysForValidation wakes the monitoring thread and triggers another refresh.

       

      However, it's possible that the additional thread's getKeysForValidation calls refreshNow and sets _refreshRequest before the initial refresh hits the failpoint. In this case, the _refreshRequest enqueued by the additional thread is consumed as part of the initial monitoring refresh, so nothing wakes the monitoring thread to hit the failpoint a second time. The main thread waits to advance the clock until the failpoint has been hit twice, because it expects the initial refresh and the getKeysForValidation-prompted refresh to fail separately; it does not account for the interleaving where the getKeysForValidation-prompted refresh is folded into the initial refresh.

       

      In other words, the problematic interleaving is:

      | Monitoring thread | Unittest-spawned thread | Unittest main thread |
      |---|---|---|
      | | Call getKeysForValidation | |
      | | Call refreshNow | |
      | | Set _refreshRequest | |
      | Call _doPeriodicRefresh | | |
      | Consume _refreshRequest | | |
      | Hit failpoint, retry | | |
      | Sleep in waitForConditionOrInterrupt waiting for additional request or timeout | | |
      | | | Wait for failpoint to be hit twice (hangs here, as it has only been hit once) |
      | | | Advance clock |
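
      The interleaving above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: MonitorModel, countFailpointHits, and mainThreadWouldHang are hypothetical names invented for illustration, not part of KeysCollectionManager, and the model assumes every refresh fails at the failpoint and that the monitor only wakes when a request is pending (the clock is never advanced).

```cpp
#include <cassert>

// Hypothetical stand-in for the monitoring thread's state; not the real class.
struct MonitorModel {
    bool refreshRequest = false;  // models _refreshRequest
    int failpointHits = 0;        // how many times the failpoint was hit

    // Models refreshNow(): enqueue a request for the monitoring thread.
    void refreshNow() { refreshRequest = true; }

    // Models one refresh attempt: consume any pending request, then fail
    // at the failpoint (assumed to be always enabled in this model).
    void doPeriodicRefresh() {
        refreshRequest = false;
        ++failpointHits;
    }
};

// Replays one of the two orderings and returns the number of failpoint hits
// observed before the monitor goes to sleep with nothing left to wake it.
int countFailpointHits(bool requestBeforeInitialRefresh) {
    MonitorModel monitor;
    if (requestBeforeInitialRefresh) {
        monitor.refreshNow();  // spawned thread wins the race
    }
    monitor.doPeriodicRefresh();  // initial refresh from startMonitoring
    if (!requestBeforeInitialRefresh) {
        monitor.refreshNow();  // ordering the test expects
    }
    // The monitor only runs again while a request is pending.
    while (monitor.refreshRequest) {
        monitor.doPeriodicRefresh();
    }
    // Post-condition: no request is left pending once the monitor sleeps.
    assert(!monitor.refreshRequest);
    return monitor.failpointHits;
}

// The main thread waits for two hits before advancing the clock, so fewer
// than two hits means it would wait forever in this model.
bool mainThreadWouldHang(bool requestBeforeInitialRefresh) {
    return countFailpointHits(requestBeforeInitialRefresh) < 2;
}
```

      Under this model, countFailpointHits(false) yields two hits, matching what the test expects, while countFailpointHits(true) yields only one, which corresponds to the hang described above.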

      The issue can be reproduced by adding a 500-millisecond sleep to the monitoring thread before it calls _doPeriodicRefresh in KeysCollectionManager::PeriodicRunner::start.

            Assignee:
            george.wangensteen@mongodb.com George Wangensteen
            Reporter:
            george.wangensteen@mongodb.com George Wangensteen