[SERVER-23925] Potential use-after-free when WSM is refetched after snapshot id changes in UpdateStage and DeleteStage Created: 26/Apr/16  Updated: 02/Dec/16  Resolved: 02/May/16

Status: Closed
Project: Core Server
Component/s: Querying, Write Ops
Affects Version/s: 3.3.4
Fix Version/s: 3.3.6

Type: Bug Priority: Major - P3
Reporter: Igor Canadi Assignee: Max Hirschhorn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-24001 Add a RocksDB+ASan build variant to E... Closed
is related to SERVER-22178 Should retry a sorted findAndModify i... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 14 (05/13/16)
Participants:
Linked BF Score: 0

 Description   

MongoRocks tests have been failing for a while with weird errors. I just ran an ASAN test and I think I discovered the root cause. I don't think it's specific to RocksDB storage engine, so I'm surprised that it's not failing in the WiredTiger case.

Here's the stack trace of use-after-free: https://gist.github.com/igorcanadi/76235761eb90f1e1a23b2f35b1627b90

Bear with me:

  1. Let's start in update.cpp (https://github.com/mongodb/mongo/blob/r3.3.5/src/mongo/db/exec/update.cpp#L867-L868). A call to ensureStillMatches() causes member to point to the invalid buffer.
  2. ...Because ensureStillMatches() calls into WorkingSetCommon::fetch() with unique_ptr<Cursor> that gets destroyed at the end of the function (https://github.com/mongodb/mongo/blob/r3.3.5/src/mongo/db/exec/write_stage_common.cpp#L53)
  3. WorkingSetCommon::fetch() sets member->obj to point to the cursor's buffer with function releaseToBson(), which doesn't copy data out of an unowned buffer (https://github.com/mongodb/mongo/blob/r3.3.5/src/mongo/db/exec/working_set_common.cpp#L107)

I believe that the offending commit may be https://github.com/mongodb/mongo/commit/178e241b81882f85a58deda960d80607a77e1c3a (based on 'git blame', I haven't actually bisected it)

Does this make sense?



 Comments   
Comment by Githook User [ 02/May/16 ]

Author:

{u'username': u'visemet', u'name': u'Max Hirschhorn', u'email': u'max.hirschhorn@mongodb.com'}

Message: SERVER-23925 Make BSONObj underlying WSM outlive cursor on refetch.
Branch: master
https://github.com/mongodb/mongo/commit/040f355ad944f0ed19f024413ac7a313c1e95bde

Comment by Max Hirschhorn [ 26/Apr/16 ]

Thanks for tracking this one down igor!

Using a patch similar to the one I used while doing SERVER-16444 to aggressively free the memory underlying a cursor, I can reproduce this issue when running the "findAndModify" FSM workloads.

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
index 636f2d8..61c0025 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
@@ -440,6 +440,8 @@ public:
     }
 
     boost::optional<Record> next() final {
+        _prev.reset();
+
         if (_eof)
             return {};
 
@@ -509,12 +511,17 @@ public:
 
         WT_ITEM value;
         invariantWTOK(c->get_value(c, &value));
+        _prev = unique_ptr<char[]>(new char[value.size]);
+        std::memcpy(_prev.get(), value.data, value.size);
+        auto data = RecordData(_prev.get(), value.size);
 
         _lastReturnedId = id;
-        return {{id, {static_cast<const char*>(value.data), static_cast<int>(value.size)}}};
+        return {{id, std::move(data)}};
     }
 
     boost::optional<Record> seekExact(const RecordId& id) final {
+        _prev.reset();
+
         _skipNextAdvance = false;
         WT_CURSOR* c = _cursor->get();
         c->set_key(c, _makeKey(id));
@@ -528,13 +535,18 @@ public:
 
         WT_ITEM value;
         invariantWTOK(c->get_value(c, &value));
+        _prev = unique_ptr<char[]>(new char[value.size]);
+        std::memcpy(_prev.get(), value.data, value.size);
+        auto data = RecordData(_prev.get(), value.size);
 
         _lastReturnedId = id;
         _eof = false;
-        return {{id, {static_cast<const char*>(value.data), static_cast<int>(value.size)}}};
+        return {{id, std::move(data)}};
     }
 
     void save() final {
+        _prev.reset();
+
         try {
             if (_cursor)
                 _cursor->reset();
@@ -550,6 +562,8 @@ public:
     }
 
     bool restore() final {
+        _prev.reset();
+
         if (!_cursor)
             _cursor.emplace(_rs.getURI(), _rs.tableId(), true, _txn);
 
@@ -598,11 +612,15 @@ public:
     }
 
     void detachFromOperationContext() final {
+        _prev.reset();
+
         _txn = nullptr;
         _cursor = boost::none;
     }
 
     void reattachToOperationContext(OperationContext* txn) final {
+        _prev.reset();
+
         _txn = txn;
         // _cursor recreated in restore() to avoid risk of WT_ROLLBACK issues.
     }
@@ -634,6 +652,7 @@ private:
     bool _eof = false;
     RecordId _lastReturnedId;  // If null, need to seek to first/last record.
     const RecordId _readUntilForOplog;
+    std::unique_ptr<char[]> _prev;
 };
 
 StatusWith<std::string> WiredTigerRecordStore::parseOptionsField(const BSONObj options) {

The changes from 178e241 inadvertently made it so that WorkingSetMember::makeObjOwnedIfNeeded() is called after cursor (i.e. the SeekableRecordCursor) was destructed. This leads to the heap-use-after-free that Igor described.

Note: A similar issue affects the DeleteStage in addition to the UpdateStage.


After applying the following patch, I can no longer reproduce the issue.

diff --git a/src/mongo/db/exec/write_stage_common.cpp b/src/mongo/db/exec/write_stage_common.cpp
index 662373f..52bf172 100644
--- a/src/mongo/db/exec/write_stage_common.cpp
+++ b/src/mongo/db/exec/write_stage_common.cpp
@@ -60,6 +60,10 @@ bool ensureStillMatches(const Collection* collection,
             // No longer matches.
             return false;
         }
+
+        // Ensure that the BSONObj underlying the WorkingSetMember is owned because the cursor's
+        // destructor is allowed to free the memory.
+        member->makeObjOwnedIfNeeded();
     }
     return true;
 }

Comment by Andrew Morrow (Inactive) [ 26/Apr/16 ]

Thanks, yes that is what I was after.

Comment by Igor Canadi [ 26/Apr/16 ]

That's awesome, thanks!

I do have an error report, I have a link in the original comment: https://gist.github.com/igorcanadi/76235761eb90f1e1a23b2f35b1627b90. Is this what you're asking for?

Comment by Andrew Morrow (Inactive) [ 26/Apr/16 ]

igor - Per our email discussion, I had actually just launched an ASAN/RocksDB test run when your email came in. I'll let you know if it shows results consistent with what you are seeing. Do you have an ASAN error report you can paste in?

Comment by Igor Canadi [ 26/Apr/16 ]

MongoRocks tests started failing between these two commits (according to evergreen):

The commit that I blame in the original ticket (https://github.com/mongodb/mongo/commit/178e241b81882f85a58deda960d80607a77e1c3a) is actually between the two commits!

Generated at Thu Feb 08 04:04:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.