[SERVER-41994] Shell does not add length to BSON binary subtype 2 Created: 27/Jun/19  Updated: 08/Jan/24  Resolved: 19/Jun/23

Status: Closed
Project: Core Server
Component/s: Shell
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kevin Albertson Assignee: [DO NOT ASSIGN] Backlog - Server Development Platform Team (SDP) (Inactive)
Resolution: Won't Do Votes: 0
Labels: move-stm, sdp-backlog-purge
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
causes SERVER-43946 Converting BinData of type ByteArrayD... Closed
Related
related to SERVER-42504 Allow explicit encryption of BinData ... Backlog
is related to SERVER-41919 Disallow specific types for explicit ... Closed
is related to SERVER-41920 Disallow BinData subtype 2 for implic... Closed
Assigned Teams:
Server Development Platform
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

In the shell insert a BinData with subtype 2:

> db.coll.insert({x: BinData(2, '1234')})

And in the driver you try to read that, you get a parsing exception:

> python -i
>>> from pymongo import MongoClient
>>> c = MongoClient()
>>> c.test.test.find_one()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinalbertson/code/mongo-python-driver/pymongo/collection.py", line 1267, in find_one
    for result in cursor.limit(-1):
  File "/Users/kevinalbertson/code/mongo-python-driver/pymongo/cursor.py", line 1200, in next
    if len(self.__data) or self._refresh():
  File "/Users/kevinalbertson/code/mongo-python-driver/pymongo/cursor.py", line 1114, in _refresh
    self.__send_message(q)
  File "/Users/kevinalbertson/code/mongo-python-driver/pymongo/cursor.py", line 988, in __send_message
    codec_options=self.__codec_options)
  File "/Users/kevinalbertson/code/mongo-python-driver/pymongo/cursor.py", line 1077, in _unpack_response
    return response.unpack_response(cursor_id, codec_options)
  File "/Users/kevinalbertson/code/mongo-python-driver/pymongo/message.py", line 1474, in unpack_response
    return bson.decode_all(self.payload_document, codec_options)
bson.errors.InvalidBSON: invalid length or type code

Sprint: Dev Tools 2019-09-09, Dev Tools 2019-09-09, Dev Tools 2019-09-23, Dev Tools 2020-01-13, Dev Tools 2020-01-27, Dev Tools 2020-02-24
Participants:
Linked BF Score: 50

 Description   

Per SPEC-1338, the shell does not serialize BSON binary subtype 2 the same way drivers do, which drivers do not recognize as valid. bsonspec.org states:

\x02 Binary (Old) - This used to be the default subtype, but was deprecated in favor of \x00. Drivers and tools should be sure to handle \x02 appropriately. The structure of the binary data (the byte* array in the binary non-terminal) must be an int32 followed by a (byte*). The int32 is the number of bytes in the repetition.

But the shell does not appear to add the four byte integer length.



 Comments   
Comment by Alex Neben [ 19/Jun/23 ]

This has been identified as work that the SDP team won't do in the near term. Please reopen with a comment if you feel this work should be reprioritized and explain why.

Comment by Steven Vannelli [ 10/May/22 ]

Moving this ticket to the Backlog and removing the "Backlog" fixVersion as per our latest policy for using fixVersions.

Comment by Githook User [ 15/Oct/19 ]

Author:

{'name': 'Gabriel Russell', 'username': 'gabrielrussell', 'email': 'gabriel.russell@mongodb.com'}

Message: Revert "SERVER-41994 correctly create and show type 2 binary elements"

This reverts commit 3f6ba750fc8374968c3c06e31c67ac2839e9a736.
Branch: master
https://github.com/mongodb/mongo/commit/a6e3d841fcb4bc219ad56e3defc898ae18de9dbd

Comment by Githook User [ 10/Sep/19 ]

Author:

{'name': 'Gabriel Russell', 'username': 'gabrielrussell', 'email': 'gabriel.russell@mongodb.com'}

Message: SERVER-41994 correctly create and show type 2 binary elements
Branch: master
https://github.com/mongodb/mongo/commit/3f6ba750fc8374968c3c06e31c67ac2839e9a736

Comment by Bernie Hackett [ 09/Jul/19 ]

The shell would have to encode subtype 2 according to the spec and support decoding data according to the spec, but also support decoding this broken format. Note that no driver ever written can decode what the shell currently generates.

Comment by Mark Benvenuto [ 09/Jul/19 ]

The function I highlighted in my first example is not called. While we can fix this, the shell has been doing the broken behavior for many years at this point. It would be a breaking change for the shell to fix this since the shell will no longer be able to decode data with bindata type 2 that it inserted in previous versions.

CCing gabriel.russell and mira.carey@mongodb.com for their opinion.

Old javascript engines had the same broken behavior:

https://github.com/mongodb/mongo/commit/eb7ba8adc1ea407ebd88a8b829a6adbaf109e681 https://github.com/mongodb/mongo/commit/3184194f9e0eb63fc3f6d73579681c4007022e9b

Example Fix:

diff --git a/src/mongo/scripting/mozjs/valuereader.cpp b/src/mongo/scripting/mozjs/valuereader.cpp
index 0a5a547ccc..264bc3b924 100644
--- a/src/mongo/scripting/mozjs/valuereader.cpp
+++ b/src/mongo/scripting/mozjs/valuereader.cpp
@@ -135,7 +135,7 @@ void ValueReader::fromBSONElement(const BSONElement& elem, const BSONObj& parent
         }
         case mongo::BinData: {
             int len;
-            const char* data = elem.binData(len);
+            const char* data = elem.binDataClean(len);
             std::stringstream ss;
             base64::encode(ss, data, len);
 
diff --git a/src/mongo/scripting/mozjs/valuewriter.cpp b/src/mongo/scripting/mozjs/valuewriter.cpp
index ef3b0b4d42..0fa06c5fe1 100644
--- a/src/mongo/scripting/mozjs/valuewriter.cpp
+++ b/src/mongo/scripting/mozjs/valuewriter.cpp
@@ -360,11 +360,15 @@ void ValueWriter::_writeObject(BSONObjBuilder* b,
 
                 auto binData = base64::decode(*str);
 
-                b->appendBinData(sd,
-                                 binData.size(),
-                                 static_cast<mongo::BinDataType>(
-                                     static_cast<int>(o.getNumber(InternedString::type))),
-                                 binData.c_str());
+                auto binDataType = static_cast<mongo::BinDataType>(
+                    static_cast<int>(o.getNumber(InternedString::type)));
+
+                if (binDataType == ByteArrayDeprecated) {
+                    b->appendBinDataArrayDeprecated(
+                        sd.toString().c_str(), binData.c_str(), binData.size());
+                } else {
+                    b->appendBinData(sd, binData.size(), binDataType, binData.c_str());
+                }
 
                 return;
             }

Comment by Shane Harvey [ 02/Jul/19 ]

Indeed it looks like the shell does not add the length portion of subtype 2. In the shell:

db.coll.insert({_id: NumberInt(1), x: BinData(2, '1234')})

In python:

>>>> import base64
>>>> from bson import raw_bson
>>>> from bson import BSON
>>>> raw = client.test.get_collection("coll", raw_bson.DEFAULT_RAW_BSON_OPTIONS).find_one()
>>>> raw['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/shane/git/mongo-python-driver/bson/raw_bson.py", line 103, in __getitem__
    return self.__inflated[item]
  File "/Users/shane/git/mongo-python-driver/bson/raw_bson.py", line 99, in __inflated
    self.__raw, 4, len(self.__raw)-1, self.__codec_options, SON())
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 427, in _elements_to_dict
    key, value, position = _element_to_dict(data, position, obj_end, opts)
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 407, in _element_to_dict
    element_name)
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 266, in _get_binary
    raise InvalidBSON("invalid binary (st 2) - lengths don't match!")
bson.errors.InvalidBSON: invalid binary (st 2) - lengths don't match!
>>>> raw.raw
b'\x19\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05x\x00\x03\x00\x00\x00\x02\xd7m\xf8\x00'
>>>> BSON.encode({'_id': 1, 'x': Binary(base64.b64decode('1234'), subtype=2)})
b'\x1d\x00\x00\x00\x10_id\x00\x01\x00\x00\x00\x05x\x00\x07\x00\x00\x00\x02\x03\x00\x00\x00\xd7m\xf8\x00'

Comment by Mark Benvenuto [ 02/Jul/19 ]

SubType 2 was deprecated here -
https://github.com/mongodb/mongo/commit/23882959f5f7a66b39524af8710277ee0dfc29bb

I think we are appending the extra length in the (byte*). See https://github.com/mongodb/mongo/commit/23882959f5f7a66b39524af8710277ee0dfc29bb

Generated at Thu Feb 08 04:59:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.