[SERVER-50386] [SBE] Investigate handling of deprecated BSON types Created: 19/Aug/20 Updated: 29/Oct/23 Resolved: 27/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Eric Cox (Inactive) | Assignee: | Justin Seyster |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | qexec-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Query 2020-12-14, Query 2020-12-28, Query 2021-01-11, Query 2021-01-25, Query 2021-02-08 |
| Participants: |
| Description |
|
When implementing |
| Comments |
| Comment by Githook User [ 27/Jan/21 ] |
|
Author: {'name': 'Justin Seyster', 'email': 'justin.seyster@mongodb.com', 'username': 'jseyster'}Message: |
| Comment by Justin Seyster [ 04/Dec/20 ] |
|
To summarize the issue, BinData with the "ByteArrayDeprecated" has four redundant bytes (to store the length of the data) where the binary data would normally start, and the sbe::value::getBSONBinData() function returns a pointer to those bytes, which may cause problems for clients that are expecting to see the binary data itself. I say these four bytes are "redundant" because the actual length of the BinData is stored in the BSON itself, and there is no BSON parsing code that actually uses the length value stored in a "BytesArrayDeprecated" BinData. I audited the current uses of getBSONBinData() and getBSONBinDataSize(), and I believe they are all correct as written. There may be some callers who will want to ignore the four redundant length bytes, but in most cases, we want to keep them. 1. When used in convertToBsonObj, the redundant bytes are included in the data that gets sent to the BSONObj, which is the desired behavior to ensure that they also get saved in the new BSONObj. 2. In serializeTagVal() (discussed in the description, above), the redundant bits get saved to disk, which is the desired behavior, because we want to preserve their value even if it gets ignored in most cases. 3. writeValueToStream(), which can write a human-readable debug value of a BinData, includes its own special case code to ignore the redundant bytes and only print out the data itself. That's what we want, but perhaps we could move that special case handling into a value.h helper. 4. sbe::value::hashValue() includes the redundant length bytes in its hash. That should be fine, because two deprecated BinData objects that store different values for their redundant length fields should compare as not equal to each other. This behavior may not be optimal, because those bytes eat up half of the maximum eight bytes that get used to compute the hash, but I don't think it's important enough to do anything about it. 5. sbe::value::compareValue() includes the redundant bytes in its comparison, which is weird behavior, but it matches the behavior in woCompare(). We definitely want to match woCompare()'s behavior so that we can stay compatible with the legacy execution engine. 6. builtinBitTestPosition() will treat the four redundant bytes as part of the bitmask, but surprisingly, that's the same behavior as in the classic engine. That's arguably a bug, but I think our best bet is to just keep it as is in both engines: In the BSONElement class, there are actually two accessors for BinData, one that is oblivious to the redundant bytes (i.e., it treats them as if they are part of the actual data) and a "clean" accessor that returns the bytes after the redundant length field (i.e., the actual data) when the subtype is "BytesArrayDeprecated." We could mirror this approach in value.h by adding "clean" versions of sbe::value::getBSONBinData() and sbe::value::getBSONBinDataSize(). Except for writeValueToStream (#4 in the above list), all of the current users of getBSONBinData() will continue to use the existing version. The writeValueToStream function can be adapted to use the new versions so that it has less special-case logic. I'll whip something up and upload a code review for it. Sorry for the disseration! I wanted to get all my thoughts on this down so that I don't forget them. |