[SERVER-61309] Fix time-series bucket lock reacquisition logic Created: 08/Nov/21  Updated: 29/Oct/23  Resolved: 09/Nov/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.5, 5.1.1

Type: Bug Priority: Major - P3
Reporter: Dan Larkin-York Assignee: Dan Larkin-York
Resolution: Fixed Votes: 0
Labels: time-series
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
is duplicated by SERVER-61171 Can remove a full bucket from the buc... Closed
Problem/Incident
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Sprint: Execution Team 2021-11-15
Participants:
Linked BF Score: 159

 Description   

Currently we maintain a set of bucket pointers, and a WriteBatch refers to its bucket by pointer. This is not robust against pointer re-use by the memory allocator: the address of a freed bucket can be handed out again for a new, unrelated bucket. Since some situations can result in batches being used after their buckets have been released, we want to use the bucket OID, rather than the pointer, as the primary reference.
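
To illustrate the idea only (this is not the code from the actual patch), here is a minimal C++ sketch of referring to a bucket by id instead of by pointer. All names in it (BucketId, BucketRegistry, commit, and the fields of Bucket and WriteBatch) are hypothetical stand-ins, and a 64-bit counter is assumed in place of the real bucket OID. The batch stores only the id and looks the bucket up again each time it needs it, so a released bucket is reported as missing rather than silently aliased by a newer bucket at the same address.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the bucket OID (assumption: a simple counter).
using BucketId = std::uint64_t;

struct Bucket {
    BucketId id;
    std::string ns;           // namespace the bucket belongs to
    int numMeasurements = 0;  // measurements committed so far
};

// Hypothetical catalog: owns the buckets and hands out ids, never raw pointers.
class BucketRegistry {
public:
    // Creates a bucket and returns its id. Ids are never recycled,
    // unlike heap addresses, which the allocator may reuse.
    BucketId open(const std::string& ns) {
        BucketId id = _nextId++;
        bool inserted =
            _buckets.emplace(id, std::make_unique<Bucket>(Bucket{id, ns})).second;
        assert(inserted);
        (void)inserted;  // silence unused warning when NDEBUG is defined
        return id;
    }

    // Releases the bucket; its memory may later be reused by another bucket.
    void remove(BucketId id) { _buckets.erase(id); }

    // Returns nullptr if the bucket has already been released.
    Bucket* find(BucketId id) {
        auto it = _buckets.find(id);
        return it == _buckets.end() ? nullptr : it->second.get();
    }

private:
    BucketId _nextId = 1;
    std::unordered_map<BucketId, std::unique_ptr<Bucket>> _buckets;
};

// The batch remembers the id, not the address.
struct WriteBatch {
    BucketId bucketId;
};

// Reacquires the bucket by id; fails cleanly if it is gone.
bool commit(BucketRegistry& registry, const WriteBatch& batch) {
    Bucket* bucket = registry.find(batch.bucketId);
    if (bucket == nullptr) {
        return false;  // bucket was released after the batch was created
    }
    ++bucket->numMeasurements;
    return true;
}
```

With this shape, a batch created against a bucket that is later removed fails its lookup even if a new bucket has since been opened for the same namespace. A real implementation would also need to hold the appropriate locks around the lookup and the update, which this sketch omits.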

Because bucket reacquisition happens in a number of places and the failure is non-deterministic, it is hard to identify the exact set of circumstances under which we would run into this issue. The linked BFs and HELP tickets should shed some additional light on the possible symptoms, which include deadlock, inconsistent state (with similarities to use-after-free bugs), and crashes. The stack traces will likely show one or more of the following methods:

mongo::BucketCatalog::_expireIdleBuckets
mongo::BucketCatalog::BucketAccess::_findOpenBucketThenLock
mongo::BucketCatalog::BucketAccess::rollover
mongo::BucketCatalog::_removeBucket
mongo::BucketCatalog::finish
mongo::BucketCatalog::_waitToCommitBatch



 Comments   
Comment by Githook User [ 10/Nov/21 ]

Author:

{'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'}

Message: SERVER-61309 Fix time-series bucket lock reacquisition logic
Branch: v5.0
https://github.com/mongodb/mongo/commit/fddc7333aa7a3be423566463ee92b4444f0c3ca1

Comment by Githook User [ 09/Nov/21 ]

Author:

{'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'}

Message: SERVER-61309 Fix time-series bucket lock reacquisition logic
Branch: v5.1
https://github.com/mongodb/mongo/commit/b6cbe729c481a9745b5898ced38ea440c7ea7741

Comment by Githook User [ 09/Nov/21 ]

Author:

{'name': 'Dan Larkin-York', 'email': 'dan.larkin-york@mongodb.com', 'username': 'dhly-etc'}

Message: SERVER-61309 Fix time-series bucket lock reacquisition logic
Branch: master
https://github.com/mongodb/mongo/commit/e35972a5a426c05f7af28a91b9ec3625794de60a
