[JAVA-660] Concurrency issue can cause corrupted messages to be sent to the server Created: 02/Oct/12  Updated: 11/Jan/13  Resolved: 22/Oct/12

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: 2.9.0, 2.9.1
Fix Version/s: 2.9.2, 2.10.0

Type: Bug
Priority: Blocker - P1
Reporter: Roman Janusz
Assignee: Jeffrey Yemin
Resolution: Done
Votes: 0
Labels: driver
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 3.0.0-12-server x86_64, 1.6.0_26-b03


Issue Links:
Depends
Duplicate
is duplicated by SERVER-7163 Replica set crash with segfault Closed
Related
Backwards Compatibility: Major Change

 Description   

The 2.9.0 version of the driver introduced a regression which can cause corrupted wire protocol messages to be sent to the server.

In practice, the impact of the bug is mitigated by these circumstances:

  1. It occurs only when the driver is connected to a replica set, or to multiple mongos using the new HA support for mongos (not with a single mongos or a standalone).
  2. It is triggered only if the driver gets an IOException while performing a normal query (not a command) and attempts to retry the query.

If both of these occur, it becomes likely that the driver will send corrupted messages to the server, and keep sending them until the application is restarted.

The effect of sending corrupted messages to the server is undefined. In some cases, the server will assert and send back an error. In others, it will crash. And in others, it will add corrupt documents to the database. Using --objcheck can mitigate the last case, but not fully.



 Comments   
Comment by Jeffrey Yemin [ 11/Jan/13 ]

The following program is an easy way to demonstrate that the bug manifests only when connected to a list of servers (replica set or mongos). If you use a URI with a single host, like "mongodb://localhost", the bug does not manifest. To actually demonstrate the bug, you have to use a multithreaded program.

import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.MongoURI;
 
import java.net.UnknownHostException;
 
public class JAVA660Test {
    public static void main(String[] args) throws UnknownHostException, InterruptedException {
        Mongo mongo = new Mongo(new MongoURI(args[0]));
        DBCollection coll = mongo.getDB("JAVA660").getCollection("test");
 
        while (true) {
            try {
                // kill mongod at any point to generate exceptions, then restart mongod
                coll.findOne();
            } catch (Exception e) {
                e.printStackTrace();
            }
            Thread.sleep(10);
        }
    }
}

Comment by auto [ 08/Jan/13 ]

Author: Jeff Yemin <jeff.yemin@10gen.com> (2012-10-20T06:02:18Z)

Message: JAVA-660: Took call to OutMessage.doneWithMessage out of the recursively-called method, to avoid having it called more than once. Protected OutMessage by setting the buffer to null and checking for null everywhere it's used.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/0f67736ad72000f3f779c3623852fc8af249de24
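
The commit message above describes two changes: releasing the message outside the recursive method, and guarding the buffer with a null check. A minimal sketch of that shape follows. It is not the driver's code: SingleReleaseSketch, GuardedMessage, innerCall, and the IllegalStateException on reuse are all assumptions made for illustration.

```java
// Hypothetical sketch; class and method names are illustrative, not the driver's API.
class SingleReleaseSketch {
    /** Message whose buffer is set to null on release and checked before each use. */
    static final class GuardedMessage {
        private byte[] buffer = new byte[16];
        int releases = 0; // counts how often the buffer was actually given back

        void write(byte value) {
            checkUsable();
            buffer[0] = value;
        }

        void doneWithMessage() {
            if (buffer == null) {
                return; // already released: a second call is a no-op
            }
            buffer = null;
            releases++; // stands in for returning the buffer to a pool
        }

        private void checkUsable() {
            if (buffer == null) {
                throw new IllegalStateException("message already released");
            }
        }
    }

    /** Entry point: cleanup lives here, outside the recursion, so it runs once. */
    static int call(GuardedMessage m, int failuresBeforeSuccess) {
        try {
            return innerCall(m, failuresBeforeSuccess, 1);
        } finally {
            m.doneWithMessage();
        }
    }

    /** Recursive retry; returns the number of attempts. It never releases the message. */
    private static int innerCall(GuardedMessage m, int failuresLeft, int attempt) {
        m.write((byte) attempt); // every attempt still sees a live buffer
        if (failuresLeft > 0) {
            // simulated retry after an I/O error
            return innerCall(m, failuresLeft - 1, attempt + 1);
        }
        return attempt;
    }

    public static void main(String[] args) {
        GuardedMessage m = new GuardedMessage();
        int attempts = call(m, 2);
        System.out.println("attempts = " + attempts + ", releases = " + m.releases);
    }
}
```

With two simulated failures the sketch makes three attempts but releases the buffer only once, and any later write on the same message fails fast instead of touching a buffer that may already belong to another thread.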

Comment by auto [ 08/Jan/13 ]

Author: Jeff Yemin <jeff.yemin@10gen.com> (2012-10-20T06:02:18Z)

Message: JAVA-660: Took call to OutMessage.doneWithMessage out of the recursively-called method, to avoid having it called more than once. Protected OutMessage by setting the buffer to null and checking for null everywhere it's used.
Branch: 2.10.x
https://github.com/mongodb/mongo-java-driver/commit/0f67736ad72000f3f779c3623852fc8af249de24

Comment by Jeffrey Yemin [ 23/Oct/12 ]

FYI, 2.9.2 is released with the fix for this issue.

Comment by Jeffrey Yemin [ 23/Oct/12 ]

Confirmed. It's a regression introduced in 2.9.0.

Comment by Christopher Price [ 23/Oct/12 ]

Could you please confirm that this issue was introduced in the 2.9 driver and does not exist in 2.8? We are rolling out the 2.8 driver today. Thanks.

Comment by auto [ 22/Oct/12 ]

Author: Jeff Yemin <jeff.yemin@10gen.com> (2012-10-21T22:11:24-07:00)

Message: JAVA-660: Added unit test for OutMessage defensive coding
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/e9c608e061dbe13c69b46577a3127e424b4258b0

Comment by auto [ 22/Oct/12 ]

Author: Jeff Yemin <jeff.yemin@10gen.com> (2012-10-19T23:02:18-07:00)

Message: JAVA-660: Took call to OutMessage.doneWithMessage out of the recursively-called method, to avoid having it called more than once. Protected OutMessage by setting the buffer to null and checking for null everywhere it's used.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/73c2615b042ab5ba2fecb380ff0edd4b866cdfa1

Comment by auto [ 20/Oct/12 ]

Author: Jeff Yemin <jeff.yemin@10gen.com> (2012-10-19T23:02:18-07:00)

Message: JAVA-660: Took call to OutMessage.doneWithMessage out of the recursively-called method, to avoid having it called more than once. Protected OutMessage by setting the buffer to null and checking for null everywhere it's used.
Branch: release-2.9.x
https://github.com/mongodb/mongo-java-driver/commit/0f67736ad72000f3f779c3623852fc8af249de24

Comment by Jeffrey Yemin [ 20/Oct/12 ]

I am able to reproduce this bug in this way:

  1. Use a replica set
  2. Use ReadPreference.primary
  3. Kill the primary

This causes DBTCPConnector.call(...) to call itself recursively and, as a result, to call OutMessage.doneWithMessage multiple times (from the finally clause). It has always been this way. The bug was introduced in 2.9.0, when I removed the setting of _buffer to null in OutMessage.doneWithMessage, not realizing that the method could be called more than once. This code path causes buffers to be shared between multiple threads, which leads to corrupt messages being sent to the server.
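
The mechanism described above can be illustrated with a small self-contained simulation. This is only a sketch under assumptions: BufferPool, Message, and the clearOnDone flag are hypothetical stand-ins, not the driver's classes, and the pool here is a plain single-threaded stack. It shows how a recursive retry with cleanup in a finally block puts the same buffer into a pool twice unless the message forgets its buffer on the first release.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins for the driver's internals; names are illustrative only.
class DoubleReleaseSketch {
    /** A trivial pool: released buffers are handed out again on acquire. */
    static final class BufferPool {
        private final Deque<byte[]> free = new ArrayDeque<byte[]>();

        byte[] acquire() {
            byte[] b = free.poll();
            return b != null ? b : new byte[16];
        }

        void release(byte[] b) {
            free.push(b);
        }
    }

    /** A message that returns its buffer to the pool when it is done. */
    static final class Message {
        private final BufferPool pool;
        private final boolean clearOnDone;
        private byte[] buffer;

        Message(BufferPool pool, boolean clearOnDone) {
            this.pool = pool;
            this.clearOnDone = clearOnDone;
            this.buffer = pool.acquire();
        }

        void doneWithMessage() {
            if (buffer == null) {
                return; // only reachable when clearOnDone is true
            }
            pool.release(buffer);
            if (clearOnDone) {
                buffer = null; // the assignment whose removal is described above
            }
        }
    }

    /** Mimics a call that retries by calling itself, with cleanup in finally at every level. */
    static void call(Message m, int retriesLeft) {
        try {
            if (retriesLeft > 0) {
                call(m, retriesLeft - 1); // retry after a simulated IOException
            }
        } finally {
            m.doneWithMessage(); // runs once per recursion level
        }
    }

    /** Returns true if the next two acquirers receive the very same buffer. */
    static boolean sharesBuffer(boolean clearOnDone) {
        BufferPool pool = new BufferPool();
        call(new Message(pool, clearOnDone), 1); // one retry means two finally blocks
        byte[] first = pool.acquire();
        byte[] second = pool.acquire();
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println("without clearing: shared = " + sharesBuffer(false));
        System.out.println("with clearing:    shared = " + sharesBuffer(true));
    }
}
```

In the sketch, the unguarded variant hands one array to two acquirers. In a multithreaded application those acquirers could be different threads writing different messages into the same bytes, which matches the corruption described in this ticket.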

Comment by Jeffrey Yemin [ 03/Oct/12 ]

This is going to be difficult to find without some way to reproduce the crash (it would be a lot easier if the server reported an error instead of crashing). Is there any way you can correlate the server crash with a specific event in the application log files?
