Summary
The specification for OP_MSG does not define how a "cstring" is byte-encoded in the serialized message; it says only "NULL terminated string". BSON defines it this way:
Zero or more modified UTF-8 encoded characters followed by the null byte. The (byte*) MUST NOT contain unsigned_byte(0), hence it is not full UTF-8.
"NULL" is also ambiguous: does it mean the "null byte" 0x00, or the "null character" U+0000? The two coincide in standard UTF-8, which encodes U+0000 as the single byte 0x00, but not in Java's modified UTF-8, which encodes U+0000 as the two bytes 0xC0 0x80.
Motivation
Who is the affected end user?
In Atlas Search, we have two separate implementations of Wire Protocol "servers"; we use this specification and the documentation (see DOCSP-44199) to ensure compliance.
How does this affect the end user?
I was confused about which byte encoding a compliant implementation should use for cstring values.
How likely is it that this problem or use case will occur?
An issue is somewhat likely. An encoding should generally be specified wherever the domain types "string" and "byte" are interwoven. (For example, see the migration from Python 2 to Python 3.) However, the null problem is more academic and is raised mainly for completeness.
If the problem does occur, what are the consequences and how severe are they?
There are a number of possible consequences, including crashes and invalid/incorrect data due to ambiguous serialization / deserialization.
Is this issue urgent?
No
Is this ticket required by a downstream team?
Yes - Atlas Search
Is this ticket only for tests?
No - it's for specs.
Acceptance Criteria
The OP_MSG spec is updated to fully define how a cstring is byte-encoded, including the character encoding and what the terminating "NULL" means.
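
Illustration
To make the ambiguity described in the Summary concrete, here is a minimal Python sketch. The helper names encode_modified_utf8 and read_cstring are hypothetical and written only for this ticket; they are not taken from any driver or from the spec. The sketch assumes that "modified UTF-8" means the Java variant, in which U+0000 is written as 0xC0 0x80 and supplementary characters are written as surrogate pairs. Whether OP_MSG intends that variant is exactly the open question.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode text as Java-style modified UTF-8 (illustrative sketch only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # U+0000 becomes the overlong two-byte form, so no 0x00 byte appears.
            out += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Supplementary characters are written as a surrogate pair,
            # each surrogate encoded as its own three-byte sequence.
            cp -= 0x10000
            for surrogate in (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def read_cstring(buf: bytes, offset: int = 0) -> tuple:
    """Return (bytes before the first 0x00 at or after offset, next offset)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end], end + 1


# A string that contains the null character U+0000 in the middle.
text = "a\x00b"

# Two plausible readings of "NULL terminated string".
standard = text.encode("utf-8") + b"\x00"
modified = encode_modified_utf8(text) + b"\x00"

print(standard)                # b'a\x00b\x00'
print(modified)                # b'a\xc0\x80b\x00'
print(read_cstring(standard))  # (b'a', 2)
print(read_cstring(modified))  # (b'a\xc0\x80b', 5)
```

Under the standard UTF-8 reading, a reader that stops at the first 0x00 truncates the value to "a" and would treat the remaining bytes as the start of the next field. Under the modified UTF-8 reading the value survives intact, but only if the reader also decodes 0xC0 0x80 as U+0000, which a strict UTF-8 decoder rejects as an overlong sequence.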