[SERVER-31462] convertToCapped + renameCollection may cause a segfault Created: 09/Oct/17  Updated: 30/Oct/23  Resolved: 01/Nov/17

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.5.13
Fix Version/s: 3.2.19, 3.4.12, 3.6.0-rc3

Type: Bug Priority: Minor - P4
Reporter: Robert Guo (Inactive) Assignee: Xiangyu Yao (Inactive)
Resolution: Fixed Votes: 0
Labels: rbfz
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-31022 renameCollection with dropTarget can ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4, v3.2
Sprint: Storage 2017-11-13
Participants:

 Description   

A renameCollection command renaming a collection to itself following a convertToCapped can cause a segfault. (SERVER-31022 recorded a similar issue where renameCollection by itself could cause a segfault, but that has been fixed and will now return an error to the user)



 Comments   
Comment by Githook User [ 30/Jan/18 ]

Author:

{'email': 'xiangyu.yao@mongodb.com', 'name': 'Xiangyu Yao', 'username': 'xy24'}

Message: SERVER-31462 Register Database::AddCollectionChange in the correct order

The wrong order leads to the rollback of CollectionImpl happening after
the rollback of KVCollectionCatalogEntry. This means the Collection destructor
would call setCappedCallback() on an already destroyed RecordStore.

(cherry picked from commit f9e38f099a892964a09d4a80aafd8edfef21594d)
Branch: v3.4
https://github.com/mongodb/mongo/commit/97fbe83558a3d018b89eaff2d1ff74df1e67782d

Comment by Githook User [ 30/Jan/18 ]

Author:

{'email': 'xiangyu.yao@mongodb.com', 'name': 'Xiangyu Yao', 'username': 'xy24'}

Message: SERVER-31462 Register Database::AddCollectionChange in the correct order

The wrong order leads to the rollback of CollectionImpl happening after
the rollback of KVCollectionCatalogEntry. This means the Collection destructor
would call setCappedCallback() on an already destroyed RecordStore.

(cherry picked from commit f9e38f099a892964a09d4a80aafd8edfef21594d)
Branch: v3.2
https://github.com/mongodb/mongo/commit/bdb080651afe340a10f1fb55dcc89fd767b28003

Comment by William Schultz (Inactive) [ 15/Nov/17 ]

robert.guo Are there things that need to be re-enabled in the rollback fuzzer since this ticket has been resolved?

Comment by Githook User [ 01/Nov/17 ]

Author:

{'email': 'xiangyu.yao@mongodb.com', 'name': 'Xiangyu Yao', 'username': 'xy24'}

Message: SERVER-31462 Register Database::AddCollectionChange in the correct order

The wrong order leads to the rollback of CollectionImpl happening after
the rollback of KVCollectionCatalogEntry. This means the Collection destructor
would call setCappedCallback() on an already destroyed RecordStore.
Branch: master
https://github.com/mongodb/mongo/commit/f9e38f099a892964a09d4a80aafd8edfef21594d

Comment by Xiangyu Yao (Inactive) [ 01/Nov/17 ]

Here is why it destructs the KVCCE and the CollectionImpl in the wrong order:

We register in RecoveryUnit DatabaseImpl::AddCollectionChange before KVCCE::AddCollectionChange, so when the abortion happens (in this case, because we first drop the target which is "test.test" itself so the renaming would abort), KVCCE::AddCollectionChange::rollback() will be triggered ahead of DatabaseImpl::AddCollectionChange::rollback(), thus making the destruction of KVCCE precede that of CollectionImpl.

That only happens to renameCollection() rather than createCollection() because the ordering of registrations in createCollection() is fixed by commit for SERVER-26685

Comment by Eric Milkie [ 27/Oct/17 ]

It appears that the catalog code is assuming that the RecordStore's lifetime will outlast its corresponding Collection's lifetime. Which translates into the KVCCE outlasting the Collection. It sounds like whatever code is doing the dropping of that collection, it is destructing the KVCCE and the Collection in the wrong order; I would look at that code to see what's going on there. In a normal drop collection codepath, this isn't a problem, so it must be related to either the convertToCapped or renameCollection code paths when they drop their source collections, or when renameCollection drops its target collection.

Comment by Xiangyu Yao (Inactive) [ 27/Oct/17 ]

Conclusion:
The problem is that there are two pointers, a smart pointer and one normal pointer, owned by different objects but pointing to
the same resource. When we drop a collection and delete relevant objects, the resource gets freed because of the smart pointer mechanism.
But then the normal pointer tries to access the resource which is gone, thus causing this SegFault.

Detail:
KVCollectionCatalogEntry owns a smart pointer to RecordStore:
In kv_collection_catalog_entry.h:

std::unique_ptr<RecordStore> _recordStore;  // owned

And there is a "stupid" pointer in CollectionImpl:
In collection_impl.h:

    RecordStore* const _recordStore;

They refer to the same thing!

Here is what happened when I reproduced the bug: The system dropped a collection called "test.system.drop.xxxxxxxx.test". And this made KVCollectionCatalogEntry get destructed, so the smart pointer freed the resource.
And later CollectionImpl got destructed:

CollectionImpl::~CollectionImpl() {
    verify(ok());
    if (isCapped()) {
        _recordStore->setCappedCallback(nullptr);  ----> Line A
        _cappedNotifier->kill();
    }

Line A: This would try to dereference the stupid pointer _recordStore which points to garbage.

Solutions I can think of:

  1. Make _recordStore a shared pointer instead
  2. Delete Line A
Generated at Thu Feb 08 04:27:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.