[SERVER-43909] util/hex.h is non-portable, inefficient, arbitrary, untested, nonorthogonal, and redundant Created: 09/Oct/19 Updated: 29/Oct/23 Resolved: 15/Sep/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.8.0 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Billy Donahue | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Sprint: | Service arch 2020-09-21 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
Numerous issues here. Need a decomposed library, probably just a set of fundamental templates and some sugar to bind them together into a reasonable user-level offering. Currently we can only parse to std::string and StatusWith<...>, which are too high-level and can't be fast. False advertising: header is missing definition for unconstrained template integerToHex, which is defined in the .cpp file for only a few types. `template <typename T> std::string integerToHex<T>(T);` Calling with an unsupported integer type (like unsigned char or unsigned short) is a link-time error, which is user-unfriendly and undocumented. A header should declare what it provides and only what it provides. After doing some refactoring, I don't see a reason to have a .cpp file here at all. integerToHex formats ALL the bits of the input value and then truncates the leading zeros. This is wasteful. integerToHex performs right-shifts on potentially negative integers. This is implementation-defined behavior and not portable. Arbitrary: A few of the functions here are doing the same basic thing but with slightly different behavior. For example, hexdump and toHexLower both take a ptr+len and hexdump the memory, but hexdump uses uppercase and spaces between the hexpairs, while toHexLower uses lowercase and no spaces. hexdump is verify()-limited to 1M bytes, but toHexLower is unlimited. isValidHex insists on even digit counts. This is user specific. An unpadded hexdump of an integer value can easily have an odd number of digits (e.g. "0"). Function is unnecessary and called only twice. Inefficient: uses various mechanisms internally, inconsistently, of various levels of inefficiency. These are local char[] + snprintf, StringBuilder, etc. There are no unit tests for some of the functions here. |
| Comments |
| Comment by Billy Donahue [ 17/Oct/20 ] |
|
Discussed with Geert.
The bug here is that hexblob::encode takes size_t len and the old toHex took int len. The length of a BSONElement's binData type 2 payload can actually be computed to be negative if the BSON is malformed. The length of a binData has 4 subtracted if that binData is type 2, because there's assumed to be a 4 byte redundant length in the type2 payload. But the presence of this redundant length is not actually checked, so we can return a negative length. Diagnostics like toString should not be causing crashes. We can clamp the len to a lowerbound of 0 before calling hexblob::encode |
| Comment by Githook User [ 15/Sep/20 ] |
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message:
|
| Comment by Billy Donahue [ 14/Sep/20 ] |
| Comment by Billy Donahue [ 09/Oct/19 ] |
|
The fromHex are also unsafe and slow. StatusWith<char> fromHex(const char* c); They don't actually check that their input has string length of at least 2, so that's unsafe. They call fromHex(char) to make a StatusWith<char> for each of the 2 chars, and check that result for .isOK(). But then they discard that StatusWith<char>, and call fromHex(char) again, this time calling .getValue(). Really should be addressed. |