[JAVA-660] Concurrency issue can cause corrupted messages to be sent to the server Created: 02/Oct/12 Updated: 11/Jan/13 Resolved: 22/Oct/12 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | None |
| Affects Version/s: | 2.9.0, 2.9.1 |
| Fix Version/s: | 2.9.2, 2.10.0 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Roman Janusz | Assignee: | Jeffrey Yemin |
| Resolution: | Done | Votes: | 0 |
| Labels: | driver | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux 3.0.0-12-server x86_64, 1.6.0_26-b03 |
||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Major Change | ||||||||||||||||
| Description |
|
The 2.9.0 version of the driver introduced a regression which can cause corrupted wire protocol messages to be sent to the server. In practice, the impact of the bug is mitigated by these circumstances:
If both of these occur, it becomes likely that the driver will send corrupted messages to the server, and keep sending them until the application is restarted. The effect of sending corrupted messages to the server is undefined. In some cases, the server will assert and send back an error. In others, it will crash. In still others, it will add corrupt documents to the database. Using --objcheck can mitigate the last case, but not fully. |
| Comments |
| Comment by Jeffrey Yemin [ 11/Jan/13 ] | ||||||||||||||||||||||
|
The bug only manifests when the driver is connected to a list of servers (replica set or mongos). If you use a URI with a single host, like "mongodb://localhost", the bug does not manifest. To actually demonstrate the bug, you have to use a multithreaded program.
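
For illustration only, here is a minimal sketch of the multithreaded shape such a program could take, using only the JDK. The class and method names are hypothetical, and the `Runnable` passed in is a placeholder for a real driver operation (for example, an insert through a client created from a multi-host URI); it is not the original reproduction code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentLoadSketch {

    /**
     * Runs the given operation opsPerThread times on each of `threads` worker
     * threads, releasing all workers at once, and returns the number of
     * operations that completed.
     */
    static int runConcurrently(int threads, final int opsPerThread, final Runnable op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final AtomicInteger completed = new AtomicInteger();
        final CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        start.await(); // wait so that all workers begin together
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < opsPerThread; i++) {
                        op.run(); // placeholder for a driver call
                        completed.incrementAndGet();
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // A no-op stands in for the driver operation in this sketch.
        int done = runConcurrently(8, 1000, new Runnable() {
            public void run() { }
        });
        System.out.println("operations completed: " + done);
    }
}
```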
| ||||||||||||||||||||||
| Comment by auto [ 08/Jan/13 ] | ||||||||||||||||||||||
|
Author: Jeff Yemin <jeff.yemin@10gen.com> Date: 2012-10-20T06:02:18Z Message: | ||||||||||||||||||||||
| Comment by auto [ 08/Jan/13 ] | ||||||||||||||||||||||
|
Author: Jeff Yemin <jeff.yemin@10gen.com> Date: 2012-10-20T06:02:18Z Message: | ||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 23/Oct/12 ] | ||||||||||||||||||||||
|
FYI, 2.9.2 is released with the fix for this issue. | ||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 23/Oct/12 ] | ||||||||||||||||||||||
|
Confirmed. It's a regression introduced in 2.9.0. | ||||||||||||||||||||||
| Comment by Christopher Price [ 23/Oct/12 ] | ||||||||||||||||||||||
|
Could you please confirm that this issue was introduced in the 2.9 driver and does not exist in 2.8? We are rolling out the 2.8 driver today. Thanks. | ||||||||||||||||||||||
| Comment by auto [ 22/Oct/12 ] | ||||||||||||||||||||||
|
Author: Jeff Yemin <jeff.yemin@10gen.com> Date: 2012-10-21T22:11:24-07:00 Message: | ||||||||||||||||||||||
| Comment by auto [ 22/Oct/12 ] | ||||||||||||||||||||||
|
Author: Jeff Yemin <jeff.yemin@10gen.com> Date: 2012-10-19T23:02:18-07:00 Message: | ||||||||||||||||||||||
| Comment by auto [ 20/Oct/12 ] | ||||||||||||||||||||||
|
Author: Jeff Yemin <jeff.yemin@10gen.com> Date: 2012-10-19T23:02:18-07:00 Message: | ||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 20/Oct/12 ] | ||||||||||||||||||||||
|
I am able to reproduce this bug in this way:
This causes DBTCPConnector.call(...) to call itself recursively and, as a result, to call OutMessage.doneWithMessage multiple times (from the finally clause). The recursion has always been there, but the bug was introduced in 2.9.0 when I removed the setting of _buffer to null in OutMessage.doneWithMessage, not realizing that it could be called more than once. This code path causes buffers to be shared between multiple threads, which leads to corrupt messages being sent to the server. | ||||||||||||||||||||||
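
The mechanism described in the comment above can be illustrated with a small, self-contained sketch. This is not the driver's code: `BufferPool`, `Message`, and the `clearOnDone` flag are hypothetical stand-ins, assuming only that a message returns its buffer to a shared pool when it is done and that the 2.9.0 change removed the step that cleared the buffer reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DoubleReleaseSketch {

    /** Hypothetical pool of reusable buffers (stand-in for the driver's pool). */
    static final class BufferPool {
        private final Deque<byte[]> free = new ArrayDeque<byte[]>();

        synchronized byte[] get() {
            byte[] b = free.poll();
            return b != null ? b : new byte[16];
        }

        synchronized void done(byte[] b) {
            free.push(b);
        }
    }

    /** Hypothetical message that borrows one buffer from the pool. */
    static final class Message {
        private final BufferPool pool;
        private final boolean clearOnDone;
        private byte[] buffer;

        Message(BufferPool pool, boolean clearOnDone) {
            this.pool = pool;
            this.clearOnDone = clearOnDone;
            this.buffer = pool.get();
        }

        void doneWithMessage() {
            if (buffer == null) {
                return; // already released; a second call is harmless
            }
            pool.done(buffer);
            if (clearOnDone) {
                buffer = null; // assumption: this models the step removed in 2.9.0
            }
        }
    }

    /** Returns true if two later borrowers are handed the very same buffer. */
    static boolean doubleReleaseSharesBuffer(boolean clearOnDone) {
        BufferPool pool = new BufferPool();
        Message m = new Message(pool, clearOnDone);
        m.doneWithMessage();
        m.doneWithMessage(); // second call, as from the finally clause of an outer recursive call
        byte[] a = pool.get();
        byte[] b = pool.get();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("without clearing, buffer shared: " + doubleReleaseSharesBuffer(false)); // true
        System.out.println("with clearing, buffer shared: " + doubleReleaseSharesBuffer(true));     // false
    }
}
```

When the reference is not cleared, the same array is returned to the pool twice, so two later callers (potentially on different threads) can write into it at the same time.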
| Comment by Jeffrey Yemin [ 03/Oct/12 ] | ||||||||||||||||||||||
|
This is going to be difficult to find without some way to reproduce the crash (it would be a lot easier if the server reported an error instead of crashing). Is there any way you can correlate the server crash with a specific event in the application log files? |