[SERVER-27317] Inserting a document with a Decimal128 value when featureCompatibilityVersion is 3.2 results in a closed socket Created: 07/Dec/16 Updated: 08/Mar/18 Resolved: 31/Aug/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Write Ops |
| Affects Version/s: | 3.4.0 |
| Fix Version/s: | 3.4.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Robert Stam | Assignee: | Henrik Edin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Steps To Reproduce: | In C#:
|
||||||||
| Sprint: | Platforms 2017-09-11 | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
When using the C# driver (and presumably other drivers) inserting a document that contains a Decimal128 value when featureCompatibilityVersion is 3.2 results in the socket being closed. The following appears in the server logs:
I am unable to reproduce this is in the shell, so perhaps this is a result of different code paths in the server (the shell uses OP_COMMAND and drivers do not). In the shell I just get a write error but the socket is not closed.
|
| Comments |
| Comment by Githook User [ 31/Aug/17 ] | ||||||||||
|
Author: {'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com'}Message: Move try statement so it catches exceptions thrown by QueryMessage constructor. | ||||||||||
| Comment by Robert Stam [ 05/Jan/17 ] | ||||||||||
|
Closing the connection is just going to cause repeated support problems. I know that when debugging this report from a .NET driver user it took me a long time to find the problem because at first it didn't occur to me to check the server logs, and then I had to get access to them. It would be much better to return an error message to the user so they know immediately what the problem is without having to dig through server logs. I don't see any reason to close the connection. | ||||||||||
| Comment by David Storch [ 05/Jan/17 ] | ||||||||||
|
Attempting to use 3.4 features without turning on 3.4 compatibility mode will be a common user error, so closing the connection is pretty unfriendly. Given my knowledge of the implementation of BSON validation, and your argument that OP_COMMAND and OP_QUERY behavior can be left inconsistent, I think that not fixing this is an okay outcome. I wouldn't muddy up the validation code too much just to make error behavior friendlier when the operator has not properly completed the 3.2=>3.4 upgrade process, but if we can fix it easily, then we should. On a more philosophical note, I think there is a big difference between a 3.2 node and a 3.4 node in FCV("3.2") mode. For one, a 3.4 secondary node in FCV("3.2") has to be able to sync NumberDecimal and other data that is illegal on 3.2 from an upstream node. Second, although we provide a 3.2-like interface, we made an effort across the board to generate user-friendly error messages when 3.4-specific features are invoked. In contrast, a 3.2 node knows nothing or little about 3.4 features and may behave in a more opaque manner. | ||||||||||
| Comment by Jonathan Reams [ 05/Jan/17 ] | ||||||||||
|
I don't know about that. The more I think about this the more it seems like this works as designed. Inserting a NumberDecimal into a pre-3.4 server would result in your connection getting closed because that would be legitimately malformed BSON - does it make sense for a server that's configured to behave like 3.2 to act differently than 3.2 would? If the purpose of the featureCompatibiltiyVersion is to make a mongod behave like an older mongod, then I think this works as designed. Intentional or not, OP_COMMAND and OP_QUERY have behaved this way since for several major releases and changing it now to make error messages prettier on 3.4 masquerading as 3.2 seems like a breaking change that's hard to justify. Especially considering that in both cases the server logs very clearly why the connection was closed and what you need to do to fix it. | ||||||||||
| Comment by David Storch [ 05/Jan/17 ] | ||||||||||
|
jbreams, two thoughts:
| ||||||||||
| Comment by Jonathan Reams [ 04/Jan/17 ] | ||||||||||
|
david.storch, do we want to keep the behavior that bad BSON closes the connection? This seems really simple to fix - we can just move the initialization of dbMessage/queryMessage (https://github.com/mongodb/mongo/blob/master/src/mongo/db/assemble_response.cpp#L155) to inside the try block, and then any errors parsing the queryMessage will get caught and generate an error response. That seems like a reasonable change, but also a pretty big one. | ||||||||||
| Comment by David Storch [ 07/Dec/16 ] | ||||||||||
|
It looks like the behavior difference between the OP_QUERY and OP_COMMAND protocols is simply due to the fact that OP_COMMAND's call to validateBSON() is below our generic UserException handler, whereas OP_QUERY's is above. For OP_COMMAND, BSON validation happens during construction of the rpc::CommandRequest here: https://github.com/mongodb/mongo/blob/r3.4.0/src/mongo/db/instance.cpp#L260 Exceptions thrown from user assertion statements are caught just below: https://github.com/mongodb/mongo/blob/r3.4.0/src/mongo/db/instance.cpp#L275 In contrast, BSON validation for OP_QUERY commands happens in DbMessage::nextJsObj(), above any logic to handle uasserts: https://github.com/mongodb/mongo/blob/r3.4.0/src/mongo/db/dbmessage.cpp#L129-L133 | ||||||||||
| Comment by David Storch [ 07/Dec/16 ] | ||||||||||
|
As Robert guessed above, this issue appears to affect the OP_QUERY commands protocol, but not the OP_COMMAND protocol. I can reproduce the connection being closed as follows. First, start the shell with a flag specifying to use the OP_QUERY commands protocol:
Then run the following:
I see the shell output indicating that the connection was closed:
|