[SERVER-27317] Inserting a document with a Decimal128 value when featureCompatibilityVersion is 3.2 results in a closed socket Created: 07/Dec/16  Updated: 08/Mar/18  Resolved: 31/Aug/17

Status: Closed
Project: Core Server
Component/s: Write Ops
Affects Version/s: 3.4.0
Fix Version/s: 3.4.9

Type: Bug Priority: Major - P3
Reporter: Robert Stam Assignee: Henrik Edin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to CSHARP-1860 Decimal128 throws System.IO.EndOfStre... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

In C#:

var document = new BsonDocument { { "_id", 1 }, { "x", new BsonDecimal128(1.1M) } };
collection.InsertOne(document);

Sprint: Platforms 2017-09-11
Participants:
Case:

 Description   

When using the C# driver (and presumably other drivers) inserting a document that contains a Decimal128 value when featureCompatibilityVersion is 3.2 results in the socket being closed.

The following appears in the server logs:

2016-12-07T11:52:20.797-0500 I -        [conn2] AssertionException handling request, closing client connection: 22 Client Error: bad object in message: Cannot use decimal BSON type when the featureCompatibilityVersion is 3.2. See http://dochub.mongodb.org/core/3.4-feature-compatibility.
2016-12-07T11:52:20.797-0500 I -        [conn2] end connection 127.0.0.1:53744 (3 connections now open)

I am unable to reproduce this is in the shell, so perhaps this is a result of different code paths in the server (the shell uses OP_COMMAND and drivers do not).

In the shell I just get a write error but the socket is not closed.

> db.test.insert({ _id : 1, x : NumberDecimal("1.1") })
WriteResult({
        "writeError" : {
                "code" : 22,
                "errmsg" : "Cannot use decimal BSON type when the featureCompatibilityVersion is 3.2. See http://dochub.mongodb.org/core/3.4-feature-compatibility."
        }
})
>



 Comments   
Comment by Githook User [ 31/Aug/17 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com'}

Message: SERVER-27317 Fix closed socket when inserting Decimal128.

Move try statement so it catches exceptions thrown by QueryMessage constructor.
Branch: v3.4
https://github.com/mongodb/mongo/commit/f8eda1e61767a0adb362fde8d2447bc80cb9e58e

Comment by Robert Stam [ 05/Jan/17 ]

Closing the connection is just going to cause repeated support problems.

I know that when debugging this report from a .NET driver user it took me a long time to find the problem because at first it didn't occur to me to check the server logs, and then I had to get access to them.

It would be much better to return an error message to the user so they know immediately what the problem is without having to dig through server logs.

I don't see any reason to close the connection.

Comment by David Storch [ 05/Jan/17 ]

Attempting to use 3.4 features without turning on 3.4 compatibility mode will be a common user error, so closing the connection is pretty unfriendly. Given my knowledge of the implementation of BSON validation, and your argument that OP_COMMAND and OP_QUERY behavior can be left inconsistent, I think that not fixing this is an okay outcome. I wouldn't muddy up the validation code too much just to make error behavior friendlier when the operator has not properly completed the 3.2=>3.4 upgrade process, but if we can fix it easily, then we should.

On a more philosophical note, I think there is a big difference between a 3.2 node and a 3.4 node in FCV("3.2") mode. For one, a 3.4 secondary node in FCV("3.2") has to be able to sync NumberDecimal and other data that is illegal on 3.2 from an upstream node. Second, although we provide a 3.2-like interface, we made an effort across the board to generate user-friendly error messages when 3.4-specific features are invoked. In contrast, a 3.2 node knows nothing or little about 3.4 features and may behave in a more opaque manner.

Comment by Jonathan Reams [ 05/Jan/17 ]

I don't know about that. The more I think about this the more it seems like this works as designed. Inserting a NumberDecimal into a pre-3.4 server would result in your connection getting closed because that would be legitimately malformed BSON - does it make sense for a server that's configured to behave like 3.2 to act differently than 3.2 would?

If the purpose of the featureCompatibiltiyVersion is to make a mongod behave like an older mongod, then I think this works as designed. Intentional or not, OP_COMMAND and OP_QUERY have behaved this way since for several major releases and changing it now to make error messages prettier on 3.4 masquerading as 3.2 seems like a breaking change that's hard to justify. Especially considering that in both cases the server logs very clearly why the connection was closed and what you need to do to fix it.

Comment by David Storch [ 05/Jan/17 ]

jbreams, two thoughts:

  1. It seems reasonable to close the connection on bad BSON, but whatever we decide should be consistent across the OP_QUERY and OP_COMMAND protocols. Right now, this behavior applies only to OP_QUERY, which I don't think is by design.
  2. There are two kinds of bad BSON: malformed BSON which suggests that the client is misbehaving or malicious, or BSON that could be considered valid but is not currently valid due to the configuration of the server. The latter flavor of invalidity is new in 3.4, due to NumberDecimal being rejecting when the featureCompatibilityVersion is "3.2". In such cases, I don't think we should ever close the connection, instead we should raise a useful error to the client and keep the connection open.
Comment by Jonathan Reams [ 04/Jan/17 ]

david.storch, do we want to keep the behavior that bad BSON closes the connection? This seems really simple to fix - we can just move the initialization of dbMessage/queryMessage (https://github.com/mongodb/mongo/blob/master/src/mongo/db/assemble_response.cpp#L155) to inside the try block, and then any errors parsing the queryMessage will get caught and generate an error response. That seems like a reasonable change, but also a pretty big one.

Comment by David Storch [ 07/Dec/16 ]

It looks like the behavior difference between the OP_QUERY and OP_COMMAND protocols is simply due to the fact that OP_COMMAND's call to validateBSON() is below our generic UserException handler, whereas OP_QUERY's is above. For OP_COMMAND, BSON validation happens during construction of the rpc::CommandRequest here:

https://github.com/mongodb/mongo/blob/r3.4.0/src/mongo/db/instance.cpp#L260

Exceptions thrown from user assertion statements are caught just below:

https://github.com/mongodb/mongo/blob/r3.4.0/src/mongo/db/instance.cpp#L275

In contrast, BSON validation for OP_QUERY commands happens in DbMessage::nextJsObj(), above any logic to handle uasserts:

https://github.com/mongodb/mongo/blob/r3.4.0/src/mongo/db/dbmessage.cpp#L129-L133

Comment by David Storch [ 07/Dec/16 ]

As Robert guessed above, this issue appears to affect the OP_QUERY commands protocol, but not the OP_COMMAND protocol. I can reproduce the connection being closed as follows. First, start the shell with a flag specifying to use the OP_QUERY commands protocol:

./mongo --rpcProtocols opQueryOnly

Then run the following:

db.adminCommand({setFeatureCompatibilityVersion: "3.2"});
db.c.drop();
db.c.insert({a: NumberDecimal("2.1")});

I see the shell output indicating that the connection was closed:

2016-12-07T17:09:18.622-0500 I NETWORK  [main] DBClientCursor::init call() failed
2016-12-07T17:09:18.622-0500 E QUERY    [main] Error: Error: error doing query: failed :
DBCollection.prototype.insert@src/mongo/shell/collection.js:363:23
@(shell):1:1
2016-12-07T17:09:18.624-0500 I NETWORK  [main] trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2016-12-07T17:09:18.625-0500 I NETWORK  [main] reconnect 127.0.0.1:27017 (127.0.0.1) ok

Generated at Thu Feb 08 04:14:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.