[SERVER-51720] BSONElement::_binDataVector incorrect bindata-type-2 bounds Created: 17/Oct/20 Updated: 29/Oct/23 Resolved: 20/Oct/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Billy Donahue | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Service arch 2020-11-02 |
| Participants: |
| Description |
|
I could be wrong, but I think BSONElement::_binDataVector uses incorrect bounds for the std::vector<uint8_t> result constructor when retrieving the data for a bindata-type-2 (the ByteArrayDeprecated bindata subtype).
I believe the +4 offsets in the else clause are incorrect, and it needs to be:
The difference between ByteArrayDeprecated (binDataType==2) and other binData payloads is that it has a wasted redundant 4-byte length field. So we need to skip over that when returning the binData payload. I think the intend of the +4 was to skip over this length, but it's incorrectly specified here. It should be that we skip 4 beyond what we'd do for other binData types, and they already skip 5. I'll note also that valuestrsize() might be smaller than 4 if the BSON is malformed, so we should skip 4 bytes, but no more than valuestrsize(), to ensure that the vector arguments are a valid range. In other words,
|
| Comments |
| Comment by Githook User [ 20/Oct/20 ] |
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: |
| Comment by Billy Donahue [ 19/Oct/20 ] |
|
Reopening https://mongodbcr.appspot.com/693030002/ since we know more about its potential effects and there's a unit test for the change. |
| Comment by Billy Donahue [ 19/Oct/20 ] |
|
I am looking at the unittest_gen.cpp and I see that the _binDataVector calls are always inside an if(...checkAndAssertBinDataType(...)) so that's great. There isn't an IDL loophole. The bindata subtype is always checked against the IDL-type expected subtype before the _binDataVector is accessed, so there's no risk of a ByteArrayDeprecated coming in under the radar to fill a differently-subtyped binData field, as I was concerned about. Thanks mark.benvenuto for looking into it. |
| Comment by Billy Donahue [ 19/Oct/20 ] |
|
The fix here could be to fix IDL or it could be to fix BSONElement. Those would be different team backlogs. I'm putting it in execution team as a BSONElement fix because I think that possibly the first stop-the-bleeding fix we could make (fix included above). Then we need to fix IDL to stop accepting mismatched binData subtypes in its deserializer (pending verification that this is actually happening). |
| Comment by Billy Donahue [ 19/Oct/20 ] |
|
Discussed offline w/geert.bosch. I have reviewed the function's callsites. This function has only ever been used by C++ deserializers generated by IDL, and the branch I'm changing is only executed by the IDL "binary" bindata subtype (subtype 2). The IDL compiler's unit tests assert that it rejects the "binary" subtype in IDL files. I'm looking at idl/idl/cpp_types.py:677 and it looks like IDL may have a weakness that makes the bug important again. The IDL generator will only use bindata subtype when serializing. When deserializing, it doesn't seem to care what subtype the IDL definition actually wanted: any valid bindata input will be accepted and assigned. So conceivably, for example, an input of subtype ByteArrayDeprecated can be accepted into a BinDataGeneral IDL-generated data member. In that Machiavellian perfect storm, we'd have the _binDataVector function producing a misaligned and truncated crazy bindata blob that would be copied into the parsed data structure successfully. This is happening today. (update: no it isn't. see later comments). We discussed an invariant for the ByteArrayDeprecated branch. An invariant in _binDataVector could expose the server to crashes induced by user data. We can't have that. We could leave it alone. The ByteArrayDeprecated crazy result is already corrupting data if I'm right about all this. The subtype (known to be 2 in this scenario), is exposed as part of the vector, and the vector contains a truncated payload. This is also unsatisfactory. If anyone is relying on the IDL loophole they would be injecting some nonsense data. I don't think we need to continue supporting that. We could fix IDL to be strict about binData subtypes when parsing binData. I think that's the best way forward. |
| Comment by Billy Donahue [ 19/Oct/20 ] |