[SERVER-43909] util/hex.h is non-portable, inefficient, arbitrary, untested, nonorthogonal, and redundant Created: 09/Oct/19  Updated: 29/Oct/23  Resolved: 15/Sep/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.8.0

Type: Bug Priority: Minor - P4
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
causes SERVER-51719 BSONElement::toString must account fo... Closed
causes SERVER-61943 OID::init slow due to hex decode refa... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service arch 2020-09-21
Participants:

 Description   

Numerous issues here.

Need a decomposed library, probably just a set of fundamental templates and some sugar to bind them together into a reasonable user-level offering. Currently we can only parse to std::string and StatusWith<...>, which are too high-level and can't be fast.

False advertising: header is missing definition for unconstrained template integerToHex, which is defined in the .cpp file for only a few types. `template <typename T> std::string integerToHex<T>(T);` Calling with an unsupported integer type (like unsigned char or unsigned short) is a link-time error, which is user-unfriendly and undocumented. A header should declare what it provides and only what it provides. After doing some refactoring, I don't see a reason to have a .cpp file here at all.

integerToHex formats ALL the bits of the input value and then truncates the leading zeros. This is wasteful.

integerToHex performs right-shifts on potentially negative integers. This is implementation-defined behavior and not portable.

Arbitrary: A few of the functions here are doing the same basic thing but with slightly different behavior. For example, hexdump and toHexLower both take a ptr+len and hexdump the memory, but hexdump uses uppercase and spaces between the hexpairs, while toHexLower uses lowercase and no spaces. hexdump is verify()-limited to 1M bytes, but toHexLower is unlimited.

isValidHex insists on even digit counts. This is user specific. An unpadded hexdump of an integer value can easily have an odd number of digits (e.g. "0"). Function is unnecessary and called only twice.

Inefficient: uses various mechanisms internally, inconsistently, of various levels of inefficiency. These are local char[] + snprintf, StringBuilder, etc.

There are no unit tests for some of the functions here.
I introduced some errors and didn't see any failures in util_test



 Comments   
Comment by Billy Donahue [ 17/Oct/20 ]

Discussed with Geert.
There are two calls to hexblob::encode that are passed a raw bsondata's data and length.

  • [bson/bsonelement.cpp:876] BSONElement::toString()
  • [db/cst/c_node.cpp:123] printValue<T>, a helper of CNode::toString.

The bug here is that hexblob::encode takes size_t len and the old toHex took int len.

The length of a BSONElement's binData type 2 payload can actually be computed to be negative if the BSON is malformed.

The length of a binData has 4 subtracted if that binData is type 2, because there's assumed to be a 4 byte redundant length in the type2 payload. But the presence of this redundant length is not actually checked, so we can return a negative length.

Diagnostics like toString should not be causing crashes. We can clamp the len to a lowerbound of 0 before calling hexblob::encode

Comment by Githook User [ 15/Sep/20 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-43909 clarify and repair util/hex.h API

Comment by Billy Donahue [ 14/Sep/20 ]

CR
https://mongodbcr.appspot.com/635470002/

Comment by Billy Donahue [ 09/Oct/19 ]

The fromHex are also unsafe and slow.

StatusWith<char> fromHex(const char* c);
StatusWith<char> fromHex(StringData c);

They don't actually check that their input has string length of at least 2, so that's unsafe. They call fromHex(char) to make a StatusWith<char> for each of the 2 chars, and check that result for .isOK(). But then they discard that StatusWith<char>, and call fromHex(char) again, this time calling .getValue().

Really should be addressed.

Generated at Thu Feb 08 05:04:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.