[SERVER-9014] Mongod and mongos crash induced by many concurrent invocations of the getnonce command. Created: 18/Mar/13 Updated: 11/Jul/16 Resolved: 20/Mar/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Security |
| Affects Version/s: | 2.4.0 |
| Fix Version/s: | 2.4.2, 2.5.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ryan Nitz | Assignee: | Andy Schwerin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | OS X - 10.8.2 |
| Attachments: | |
| Issue Links: | |
| Operating System: | ALL |
| Participants: | |
| Description |
|
The SecureRandom object held by the singleton instance of CmdGetNonce is accessed in an unsynchronized manner by every thread processing a getnonce command on behalf of a connection. SecureRandom is not internally synchronized, so it may only be used by one thread at a time. Under a sufficient offered load of getnonce commands, two threads will eventually access the SecureRandom object concurrently, and the resulting behavior is undefined. On some systems, one such behavior is a segfault.
(Original description below)
I tested this on 2.2.2 and it did not crash mongod. The script is attached. Let me know if you need help building it (it is a Go app). In a nutshell, the script opens and closes connections rapidly from many different threads (goroutines).
This is on OS X. I have a standalone Go script that caused this (attached).
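The locking requirement described above can be illustrated with a minimal sketch. This is not the server patch and is written in Python rather than the server's C++; `ToyGenerator`, `LockedNonceSource`, and `collect_nonces` are hypothetical names invented for this illustration, and the xorshift generator is only a stand-in for a generator with unsynchronized internal state, not for SecureRandom itself. The idea is that one shared generator is wrapped so that every caller takes a mutex before advancing its state.

```python
import threading


class ToyGenerator:
    """Hypothetical stand-in for a generator whose state is not synchronized."""

    MASK = (1 << 64) - 1

    def __init__(self, seed):
        # xorshift state must be non-zero.
        self.state = (seed & self.MASK) or 1

    def next_int64(self):
        # Read-modify-write of shared state: two threads interleaving here
        # could observe or store a stale value.
        x = self.state
        x ^= (x << 13) & self.MASK
        x ^= x >> 7
        x ^= (x << 17) & self.MASK
        self.state = x
        return x


class LockedNonceSource:
    """Wraps one shared generator so only one thread uses it at a time."""

    def __init__(self, generator):
        self._generator = generator
        self._lock = threading.Lock()

    def next_nonce(self):
        # The generator does no locking itself, so the caller holds the lock
        # for the whole state update.
        with self._lock:
            return self._generator.next_int64()


def collect_nonces(source, threads=8, per_thread=1000):
    """Draw nonces from `source` on several threads and return them all."""
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [source.next_nonce() for _ in range(per_thread)]
        with results_lock:
            results.extend(local)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results


if __name__ == "__main__":
    shared = LockedNonceSource(ToyGenerator(seed=42))
    nonces = collect_nonces(shared)
    print(len(nonces), "nonces drawn,", len(set(nonces)), "distinct")
```

Because each call advances the state atomically, the values drawn across all threads are the same values a single thread would have drawn from the same seed, only in a different order. An alternative design, also only a sketch-level suggestion, would be to give each thread its own generator and avoid the shared lock entirely.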
|
| Comments |
| Comment by Ryan Nitz [ 29/Mar/13 ] |
|
Nm... pulling down the unstable now to test. |
| Comment by Ryan Nitz [ 29/Mar/13 ] |
|
Thanks for fixing this! Any idea when the fix will land in the nightly builds? |
| Comment by auto [ 28/Mar/13 ] |
|
Author: Andy Schwerin <schwerin@10gen.com> Date: 2013-03-20T19:49:38Z Message: |
| Comment by Andy Schwerin [ 20/Mar/13 ] |
|
Should backport to 2.4. |
| Comment by auto [ 20/Mar/13 ] |
|
Author: Andy Schwerin <schwerin@10gen.com> Date: 2013-03-20T19:49:38Z Message: |
| Comment by Andy Schwerin [ 20/Mar/13 ] |
|
Tad nailed the problem. The Go driver issues a getnonce command on every new connection, which increases the number of simultaneous operations executing CmdGetNonce::run(). Since commands are singleton objects, all of those operations were using the same instance of SecureRandom to generate nonces. SecureRandom does no locking internally, so its users must lock around it. Patch to follow. For those who prefer repros written in the Mongo shell, here's one:
Run the above code, and eventually a 2.4.0 mongod or mongos will crash. It's not a useful automatic regression test, because you cannot distinguish between "bug fixed" and "haven't waited long enough". |
| Comment by Tad Marshall [ 19/Mar/13 ] |
|
I was able to duplicate the crash on Mac OS X (10.7.5) after just a few minutes. |
| Comment by Daniel Gottlieb (Inactive) [ 19/Mar/13 ] |
|
Running 2.4.0-rc2-pre: (a long log of connections opening and closing for 1.5 hours)
I can't seem to find the core dump, but this seems rather reproducible. |
| Comment by Ryan Nitz [ 18/Mar/13 ] |
|
I ran this on the nightly and it crashed too. db version v2.5.0-pre- |
| Comment by Ryan Nitz [ 18/Mar/13 ] |
|
I was able to recreate on 2.4.0 rc3. I am trying the nightly now. |