Core Server / SERVER-61309

Fix time-series bucket lock reacquisition logic

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.2.0, 5.0.5, 5.1.1
    • Affects Version/s: None
    • Component/s: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.1, v5.0
    • Sprint: Execution Team 2021-11-15
    • 159

      Currently we maintain a set of bucket pointers, and a WriteBatch refers to its bucket by pointer. This is not robust when the memory allocator reuses a pointer: some situations can result in a batch being used after its bucket has been released, and by then a different bucket may occupy the same address. We want to use the bucket OID as the primary reference rather than the pointer.
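      The idea can be illustrated with a minimal C++ sketch. This is not the server's implementation: `Catalog`, `Bucket`, `WriteBatch`, `commit`, and `demoStaleBatch` are simplified, hypothetical names, and `BucketId` is an integer standing in for the bucket OID. It only shows why a lookup by stable identifier detects a released bucket where a raw pointer comparison could be fooled by address reuse.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the bucket OID (the real identifier is an ObjectId).
using BucketId = std::uint64_t;

struct Bucket {
    explicit Bucket(BucketId id) : id(id) {}
    BucketId id;
    int numMeasurements = 0;
};

// Hypothetical, simplified catalog that owns buckets and indexes them by ID.
class Catalog {
public:
    // Opens a new bucket and returns its ID. IDs are never reused.
    BucketId open() {
        BucketId id = _nextId++;
        _buckets.emplace(id, std::make_unique<Bucket>(id));
        return id;
    }

    // Releases the bucket; its memory may later be reused by the allocator.
    void remove(BucketId id) { _buckets.erase(id); }

    // Returns nullptr if the bucket has been released.
    Bucket* find(BucketId id) {
        auto it = _buckets.find(id);
        return it == _buckets.end() ? nullptr : it->second.get();
    }

private:
    BucketId _nextId = 1;
    std::unordered_map<BucketId, std::unique_ptr<Bucket>> _buckets;
};

// A batch remembers its target bucket by ID, never by raw pointer.
struct WriteBatch {
    BucketId bucketId;
};

// Reacquires the bucket by ID before touching it.
// Returns false if the bucket no longer exists.
bool commit(Catalog& catalog, const WriteBatch& batch) {
    Bucket* bucket = catalog.find(batch.bucketId);
    if (bucket == nullptr) {
        return false;
    }
    ++bucket->numMeasurements;
    return true;
}

// Scenario: a batch outlives its bucket while a new bucket is opened.
// Returns true if the stale batch is rejected and the new bucket is untouched.
bool demoStaleBatch() {
    Catalog catalog;
    WriteBatch batch{catalog.open()};
    assert(commit(catalog, batch));  // bucket still open, commit succeeds

    catalog.remove(batch.bucketId);
    BucketId replacement = catalog.open();  // may land at the old address

    bool rejected = !commit(catalog, batch);
    return rejected && catalog.find(replacement)->numMeasurements == 0;
}
```

      In this sketch, calling `demoStaleBatch()` from a `main` function exercises the stale-batch case. Because IDs are never reused, the stale batch fails the lookup regardless of where the allocator places the replacement bucket.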

      Because bucket reacquisition happens in a number of places, and because the failure is non-deterministic, it is hard to identify an exact set of circumstances under which we would run into this issue. The linked BFs and HELP tickets should shed some additional light on the possible symptoms, which include deadlock, inconsistent state (with similarities to use-after-free bugs), and crashes. The stack traces will likely show one or more of the following methods:

      mongo::BucketCatalog::_expireIdleBuckets
      mongo::BucketCatalog::BucketAccess::_findOpenBucketThenLock
      mongo::BucketCatalog::BucketAccess::rollover
      mongo::BucketCatalog::_removeBucket
      mongo::BucketCatalog::finish
      mongo::BucketCatalog::_waitToCommitBatch

            Assignee: Dan Larkin-York (dan.larkin-york@mongodb.com)
            Reporter: Dan Larkin-York (dan.larkin-york@mongodb.com)
            Votes: 0
            Watchers: 6

              Created:
              Updated:
              Resolved: