[SERVER-51720] BSONElement::_binDataVector incorrect bindata-type-2 bounds Created: 17/Oct/20  Updated: 29/Oct/23  Resolved: 20/Oct/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service arch 2020-11-02
Participants:

 Description   

I could be wrong, but I think BSONElement::_binDataVector uses incorrect bounds for the std::vector<uint8_t> result constructor when retrieving the data for a bindata-type-2 (the ByteArrayDeprecated bindata subtype).

https://github.com/mongodb/mongo/blob/8751e79bc03b5c4c679040440ac76481fd3db356/src/mongo/bson/bsonelement.h#L565-L567

    std::vector<uint8_t> _binDataVector() const {
        if (binDataType() != ByteArrayDeprecated) {
            return std::vector<uint8_t>(reinterpret_cast<const uint8_t*>(value()) + 5,
                                        reinterpret_cast<const uint8_t*>(value()) + 5 +
                                            valuestrsize());
        } else {
            // Skip the extra int32 size
            return std::vector<uint8_t>(reinterpret_cast<const uint8_t*>(value()) + 4,
                                        reinterpret_cast<const uint8_t*>(value()) + 4 +
                                            valuestrsize() - 4);

I believe the +4 offsets in the else clause are incorrect, and it needs to be:

            return std::vector<uint8_t>(reinterpret_cast<const uint8_t*>(value()) + 9,
                                        reinterpret_cast<const uint8_t*>(value()) + 9 +
                                            valuestrsize() - 4);

The difference between ByteArrayDeprecated (binDataType==2) and other binData payloads is that it has a wasted redundant 4-byte length field. So we need to skip over that when returning the binData payload. I think the intend of the +4 was to skip over this length, but it's incorrectly specified here. It should be that we skip 4 beyond what we'd do for other binData types, and they already skip 5.

I'll note also that valuestrsize() might be smaller than 4 if the BSON is malformed, so we should skip 4 bytes, but no more than valuestrsize(), to ensure that the vector arguments are a valid range.

In other words,

    std::vector<uint8_t> _binDataVector() const {
        auto first = reinterpret_cast<const uint8_t*>(value()) + 5;
        auto last = first + valuestrsize();
        if (binDataType() == ByteArrayDeprecated)
            first += std::min<size_t>(4, last - first);  // skip extra int32 size.
        return {first, last};
}



 Comments   
Comment by Githook User [ 20/Oct/20 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-51720 Fix BSONElement::_binDataVector bounds for ByteArrayDeprecated
Branch: master
https://github.com/mongodb/mongo/commit/8e9c7022aa0ad2dc188477f39ddb09487387a812

Comment by Billy Donahue [ 19/Oct/20 ]

Reopening https://mongodbcr.appspot.com/693030002/ since we know more about its potential effects and there's a unit test for the change.

Comment by Billy Donahue [ 19/Oct/20 ]

I am looking at the unittest_gen.cpp and I see that the _binDataVector calls are always inside an if(...checkAndAssertBinDataType(...)) so that's great. There isn't an IDL loophole. The bindata subtype is always checked against the IDL-type expected subtype before the _binDataVector is accessed, so there's no risk of a ByteArrayDeprecated coming in under the radar to fill a differently-subtyped binData field, as I was concerned about. Thanks mark.benvenuto for looking into it.

Comment by Billy Donahue [ 19/Oct/20 ]

The fix here could be to fix IDL or it could be to fix BSONElement. Those would be different team backlogs. I'm putting it in execution team as a BSONElement fix because I think that possibly the first stop-the-bleeding fix we could make (fix included above). Then we need to fix IDL to stop accepting mismatched binData subtypes in its deserializer (pending verification that this is actually happening).

Comment by Billy Donahue [ 19/Oct/20 ]

Discussed offline w/geert.bosch.

I have reviewed the function's callsites. This function has only ever been used by C++ deserializers generated by IDL, and the branch I'm changing is only executed by the IDL "binary" bindata subtype (subtype 2). The IDL compiler's unit tests assert that it rejects the "binary" subtype in IDL files.

I'm looking at idl/idl/cpp_types.py:677 and it looks like IDL may have a weakness that makes the bug important again. The IDL generator will only use bindata subtype when serializing. When deserializing, it doesn't seem to care what subtype the IDL definition actually wanted: any valid bindata input will be accepted and assigned. So conceivably, for example, an input of subtype ByteArrayDeprecated can be accepted into a BinDataGeneral IDL-generated data member. In that Machiavellian perfect storm, we'd have the _binDataVector function producing a misaligned and truncated crazy bindata blob that would be copied into the parsed data structure successfully. This is happening today. (update: no it isn't. see later comments).

We discussed an invariant for the ByteArrayDeprecated branch. An invariant in _binDataVector could expose the server to crashes induced by user data. We can't have that.

We could leave it alone. The ByteArrayDeprecated crazy result is already corrupting data if I'm right about all this. The subtype (known to be 2 in this scenario), is exposed as part of the vector, and the vector contains a truncated payload. This is also unsatisfactory. If anyone is relying on the IDL loophole they would be injecting some nonsense data. I don't think we need to continue supporting that.

We could fix IDL to be strict about binData subtypes when parsing binData. I think that's the best way forward.

Comment by Billy Donahue [ 19/Oct/20 ]

CR https://mongodbcr.appspot.com/693030002/

Generated at Thu Feb 08 05:26:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.