[SERVER-31482] Ephemeral for test, capped collection, unindexed sort hang. Created: 09/Oct/17  Updated: 30/Oct/23  Resolved: 10/Nov/17

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 3.6.0-rc4

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Xiangyu Yao (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

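// Create a small (50 KiB) capped collection with a secondary index on `a`.
db.createCollection("capped", {capped: true, size: 50 * 1024});
db.capped.ensureIndex({a: -1});
// Fill the collection well past its size limit so it is already rolling over.
for (var idx = 0; idx < 5000; ++idx) {
    db.capped.insert({_id: 1000 - idx, a: idx, b: Random.srand()});
}
 
// Open a cursor that sorts on the unindexed field `b` and pull a few results,
// leaving the cursor open.
var cursor = db.capped.find().sort({b: 1}).batchSize(1);
for (var idx = 0; idx < 10; ++idx) {
    cursor.hasNext();
    cursor.next();
}
 
// Insert again while the sorted cursor is still open; these inserts trigger
// capped deletes.
for (var idx = 0; idx < 5000; ++idx) {
    db.capped.insert({_idx: idx, a: -idx, b: Random.srand()});
}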

Sprint: Storage 2017-11-13
Participants:
Linked BF Score: 0

 Description   

Ephemeral for test record stores use a mutex to guard against concurrent access to their data. Strictly speaking, because ephemeral for test is not a document-locking record store, the mutex should only be needed for the _mdb_catalog, not for data collections. Unfortunately, knowing whether a record store backs the _mdb_catalog is a layering violation.

The problem arises because an insert into a capped collection can trigger a delete. Deleting a record goes through a callback, which is ultimately managed by the CursorManager and forwarded to every individual plan being run. This BF found that the in-memory, non-indexed sort stage queries the record store for the deleted record (presumably to ensure the query does not return the capped-deleted document in its result set).

Re-entering the same ephemeral for test record store from that callback causes its internal mutex to be locked a second time while it is still held, which hangs. For informational purposes (i.e., not a mandated solution), another callback in this class unlocks the mutex before calling the arbitrary code and relocks it on return: https://github.com/mongodb/mongo/blob/9e8cce334f74d4e70661bcb3921e069c9a0b248b/src/mongo/db/storage/ephemeral_for_test/ephemeral_for_test_record_store.cpp#L488-L493
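
For illustration only, the re-entrance and the unlock-then-relock pattern mentioned above can be sketched in standalone C++. Everything here is hypothetical and is not MongoDB code: ToyRecordStore, CheckedMutex, onDelete and reentranceDetected are invented names, and CheckedMutex throws on a same-thread relock so that the sketch reports the problem instead of hanging the way a plain mutex would.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <stdexcept>
#include <thread>

// Hypothetical non-recursive mutex that throws where a plain mutex would hang.
class CheckedMutex {
public:
    void lock() {
        if (_owner.load() == std::this_thread::get_id())
            throw std::logic_error("re-entrant lock on a non-recursive mutex");
        _m.lock();
        _owner.store(std::this_thread::get_id());
    }
    void unlock() {
        assert(_owner.load() == std::this_thread::get_id());
        _owner.store(std::thread::id());  // mark as unowned
        _m.unlock();
    }

private:
    std::mutex _m;
    std::atomic<std::thread::id> _owner{};
};

// Hypothetical toy store; onDelete stands in for the delete notification
// that is forwarded to running plans.
class ToyRecordStore {
public:
    std::function<void(int)> onDelete;

    void insert(int id) {
        std::lock_guard<CheckedMutex> lk(_mutex);
        _ids.insert(id);
    }
    bool contains(int id) {
        std::lock_guard<CheckedMutex> lk(_mutex);
        return _ids.count(id) > 0;
    }
    // Problematic shape: the callback runs while the mutex is still held.
    void removeHoldingLock(int id) {
        std::unique_lock<CheckedMutex> lk(_mutex);
        if (onDelete) onDelete(id);
        _ids.erase(id);
    }
    // Alternative shape: release the mutex around the callback, then relock.
    void removeReleasingLock(int id) {
        std::unique_lock<CheckedMutex> lk(_mutex);
        lk.unlock();
        if (onDelete) onDelete(id);
        lk.lock();
        _ids.erase(id);
    }

private:
    CheckedMutex _mutex;
    std::set<int> _ids;
};

// Runs one delete whose callback reads the store back, as the sort stage is
// described as doing. Returns true if the re-entrance check fired.
bool reentranceDetected(bool releaseLockAroundCallback) {
    ToyRecordStore store;
    store.insert(1);
    store.onDelete = [&store](int id) { store.contains(id); };
    try {
        if (releaseLockAroundCallback)
            store.removeReleasingLock(1);
        else
            store.removeHoldingLock(1);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this sketch, reentranceDetected(false) reports the re-entrance and reentranceDetected(true) does not. Releasing the mutex around the callback means other code may observe or change the store in between, so any state read before the unlock would need to be revalidated after relocking.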



 Comments   
Comment by Ian Whalen (Inactive) [ 14/Nov/17 ]

xiangyu.yao please remember to set the fixVersion to the upcoming, unreleased version when you've pushed code and are resolving an issue as Fixed.

Comment by Githook User [ 10/Nov/17 ]

Author:

{'name': 'Xiangyu Yao', 'username': 'xy24', 'email': 'xiangyu.yao@mongodb.com'}

Message: SERVER-31482 Change mutex to recursive mutex for ephemeral_for_test_record_store
Branch: master
https://github.com/mongodb/mongo/commit/ef26311ece7417ef5af2fbd4e22db81fbba43027

Comment by Eric Milkie [ 09/Oct/17 ]

I wonder if using a recursive mutex here would be a viable option.
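
As a rough illustration of the recursive mutex idea (a sketch only, not the committed change), the standalone C++ below uses std::recursive_mutex as a stand-in; the mutex type used in the server is not shown in this ticket. ToyRecursiveStore, onDelete, Outcome and removeWithReentrantCallback are hypothetical names.

```cpp
#include <functional>
#include <mutex>
#include <set>

// Hypothetical toy store guarded by a recursive mutex, so the thread that
// already holds the lock may take it again from inside a callback.
class ToyRecursiveStore {
public:
    std::function<void(int)> onDelete;  // stands in for the delete notification

    void insert(int id) {
        std::lock_guard<std::recursive_mutex> lk(_mutex);
        _ids.insert(id);
    }
    bool contains(int id) {
        std::lock_guard<std::recursive_mutex> lk(_mutex);
        return _ids.count(id) > 0;
    }
    void remove(int id) {
        std::lock_guard<std::recursive_mutex> lk(_mutex);
        if (onDelete) onDelete(id);  // may call contains() on this same store
        _ids.erase(id);
    }

private:
    std::recursive_mutex _mutex;
    std::set<int> _ids;
};

struct Outcome {
    bool seenDuringCallback;  // what the re-entrant read observed
    bool presentAfterwards;   // whether the record survived the delete
};

// Deletes one record while a callback reads the store back on the same thread.
Outcome removeWithReentrantCallback() {
    ToyRecursiveStore store;
    store.insert(7);
    bool seen = false;
    store.onDelete = [&](int id) { seen = store.contains(id); };
    store.remove(7);
    return {seen, store.contains(7)};
}
```

Here the callback's read completes instead of blocking: it still sees the record because the erase has not happened yet, and the record is gone once remove() returns. The trade-off of a recursive mutex is that re-entrant calls can observe the store partway through an operation.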
