[SERVER-43328] Server can return oversized (invalid) BSON in rare circumstances Created: 13/Sep/19  Updated: 08/Jan/24  Resolved: 21/Apr/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug Priority: Major - P3
Reporter: Ian Boros Assignee: Spencer Brody (Inactive)
Resolution: Fixed Votes: 0
Labels: servicearch-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Sprint: Service arch 2020-04-20, Service arch 2020-05-04
Participants:
Linked BF Score: 26

 Description   

The OpMsgBuilder keeps a buffer, and lets callers modify the returned BSONObj by returning an unowned BSONObjBuilder through beginBody() and resumeBody().

It looks like no size checks are ever done on the BSON before the buffer is released to a Message, which ultimately gets returned to the client. Because the BSONObjBuilders returned by beginBody() and resumeBody() are not owned, they don't perform size checks on destruction.

Here's a unit test demonstrating what I mean:
(base revision 6a764b107f77714d501b8d92f720a7853989a525)

diff --git a/src/mongo/rpc/reply_builder_test.cpp b/src/mongo/rpc/reply_builder_test.cpp
index fcdb036c65..5f89f3903e 100644
--- a/src/mongo/rpc/reply_builder_test.cpp
+++ b/src/mongo/rpc/reply_builder_test.cpp
@@ -59,6 +59,39 @@ TEST(OpMsgReplyBuilder, RoundTrip) {
     testRoundTrip<rpc::OpMsgReply>(r, true);
 }
 
+TEST(OpMsgReplyBuilder, BigRoundTrip) {
+    rpc::OpMsgReplyBuilder r;
+
+    std::string bigStr;
+    for (auto i = 0; i < 1024 * 1024 * 16; i++) {
+        bigStr += "_";
+    }
+
+    {
+        // returns an unowned BSONObjBuilder, which will not do size checking when it goes out of
+        // scope.
+        auto builder = r.getBodyBuilder();
+        for (auto i = 0; i < 2; i++) {
+            builder.append("field" + std::to_string(i), bigStr);
+        }
+    }
+
+    std::cout << "appended a lot of stuff to the builder" << std::endl;
+
+    Message replyMsg = r.done();
+    invariant(replyMsg.operation() == dbMsg);
+
+    std::cout << "serialized response to Message" << std::endl;
+
+    std::cout << "attempting to reparse Message as OpMsg" << std::endl;
+
+    // Attempt to re-parse it.
+    rpc::OpMsgReply roundTrippedResponse(&replyMsg);
+
+    // We wont' get this far.
+    MONGO_UNREACHABLE;
+}
+
 template <typename T>
 void testErrors(rpc::ReplyBuilderInterface& replyBuilder);
 

ninja +rpc_test

The test creates an OpMsgReplyBuilder, adds some absurdly large strings to the response BSON, and then attempts to round-trip it. When re-parsing the BSON a BSONObjectTooLarge error gets triggered.

It's also possible to produce this issue "in the wild." For simplicity's sake, in the below patch I have mongod return a hard-coded giant error object, but in general, error messages may contain user data.

mongos will also sometimes take all of the error objects from each shard, and cram them into one object here. This would be a more "realistic" way to encounter the issue in the wild (this is how I ran into the problem in the first place).

diff --git a/geo-repro-standalone.js b/geo-repro-standalone.js
new file mode 100644
index 0000000000..a5d9273771
--- /dev/null
+++ b/geo-repro-standalone.js
@@ -0,0 +1,9 @@
+(function() {
+const coll = db.getSiblingDB("test").coll;
+
+assert.commandWorked(coll.insert({a: {f: 1}}));
+assert.commandWorked(coll.insert({a: {f: 1}}));
+
+const resp = coll.ensureIndex({a: '2d', b: 1});
+print("<ib>: response is " + tojson(resp));
+})();
diff --git a/src/mongo/db/commands/create_indexes.cpp b/src/mongo/db/commands/create_indexes.cpp
index e7505ea41f..3a83ce801d 100644
--- a/src/mongo/db/commands/create_indexes.cpp
+++ b/src/mongo/db/commands/create_indexes.cpp
@@ -813,6 +813,24 @@ public:
         // If we encounter an IndexBuildAlreadyInProgress error for any of the requested index
         // specs, then we will wait for the build(s) to finish before trying again.
         const NamespaceString nss(CommandHelpers::parseNsCollectionRequired(dbname, cmdObj));
+
+
+        if (nss.db() == "test" && nss.coll() == "coll") {
+            result.append("code", ErrorCodes::InternalError);
+
+            // Return a gigantic error message.
+            std::string bigStr;
+            for (auto i = 0; i < 1024 * 1024 * 16; i++) {
+                bigStr += "_";
+            }
+
+            for (auto i = 0; i < 2; i++) {
+                result.append("field" + std::to_string(i), bigStr);
+            }
+
+            return false;
+        }
+
         bool shouldLogMessageOnAlreadyBuildingError = true;
         while (true) {
             try {
diff --git a/src/mongo/rpc/op_msg.cpp b/src/mongo/rpc/op_msg.cpp
index e0a8d60881..325cdf9155 100644
--- a/src/mongo/rpc/op_msg.cpp
+++ b/src/mongo/rpc/op_msg.cpp
@@ -160,6 +160,7 @@ OpMsg OpMsg::parse(const Message& message) try {
             case Section::kBody: {
                 uassert(40430, "Multiple body sections in message", !haveBody);
                 haveBody = true;
+                log() << "<ib>: Attempting to parse response BSON";
                 msg.body = sectionsBuf.read<Validated<BSONObj>>();
                 break;
             }

resmoke --suites=core geo-repro-standalone.js

The shell will hit this massert() while processing the createIndexes command response.



 Comments   
Comment by Githook User [ 20/Apr/20 ]

Author:

{'name': 'Spencer T Brody', 'email': 'spencer@mongodb.com', 'username': 'stbrody'}

Message: SERVER-43328 Enforce BSON size limit in OpMsgBuilder
Branch: master
https://github.com/mongodb/mongo/commit/41bec612516c3258984deaa022453d6721bcd542

Comment by Geert Bosch [ 17/Apr/20 ]

Thanks for clarifying. Indeed we already do this check in the BSONObj constructor, even though ISTR that we'd allow larger responses.

Comment by Spencer Brody (Inactive) [ 16/Apr/20 ]

From geert.bosch on the CR: "I doubt the premise of this ticket, that we want to enforce a limit here. It seems quite fragile, and it is better IMO to send a larger message back than adding all these new places that can throw an exception and end up killing the node. In general our philosophy has been to check on ingest and not verify outgoing BSON."

Since this has been asked a couple of times now by multiple people I want to address it here.

IMO, the main problem at the moment is the mismatch in behavior between BSONObj/BSONObjBuilder and Message/OpMsgBuilder. BSONObj enforces a 16Mb (+16kb) limit, while Message does not. Since these two types represent almost identical concepts, I don't think there's any reason for one of them to allow oversize objects while the other does not. A behavior difference here also breaks the ability to safely round-trip objects between BSONObj and Message.

So I think that either both BSONObj and Message should enforce size constraints, or neither should. While there's certainly a legitimate argument to be made that neither should, I am of the opinion that for now at least both should. For one thing, BSONObj has always worked this way and it's never caused us issues, so why change behavior and risk destabilizing things that may have been depending on this behavior. The other reason I think we should continue enforcing BSON size limits in the server is that otherwise we push those errors up into the drivers. Drivers also aren't expecting to see BSON over 16Mb, so if we start allowing that to come out of the server now, we'll start generating new "oversize bson" errors in the drivers, far removed from where the object actually became oversize. Catching oversize BSON issues early means we're much more likely to be able to determine where exactly the BSON object grew past the limit, making it easier to fix the issue going forward.

Comment by Spencer Brody (Inactive) [ 14/Apr/20 ]

I have a patch put together that changes OpMsgBuilder::finish to return a StatusWith<Message> and error with BSONObjectTooLarge if the Message being built has exceeded BSONObjMaxInternalSize.
The patch is failing, however, in op_msg_integration_test.cpp. It looks like there are tests there to specifically ensure that Message can be larger than BSONObjMaxInternalSize. Anyone know why that check is there? ben.caimano mira.carey@mongodb.com

Generated at Thu Feb 08 05:02:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.