[SERVER-9014] Mongod and mongos crash induced by many concurrent invocations of the getnonce command. Created: 18/Mar/13  Updated: 11/Jul/16  Resolved: 20/Mar/13

Status: Closed
Project: Core Server
Component/s: Security
Affects Version/s: 2.4.0
Fix Version/s: 2.4.2, 2.5.0

Type: Bug Priority: Major - P3
Reporter: Ryan Nitz Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS X - 10.8.2


Attachments: File go-memory-test.tar.gz     File test.go    
Issue Links:
Duplicate
is duplicated by SERVER-10331 Mongodb SEGFAULTS with a null pointer Closed
is duplicated by SERVER-9329 Invalid access at address - Segmentat... Closed
Related
related to SERVER-9362 Mongod crashes at first getnonce call... Closed
Operating System: ALL
Participants:

 Description   

The SecureRandom object held by the singleton instance of CmdGetNonce is accessed in an unsychronized manner by every thread processing a getnonce command on behalf of a connection. SecureRandom is not internally synchronized. It may only be used by one thread at a time. The result is that under sufficient offered load of getnonce commands, eventually two will access the SecureRandom object concurrently, with undefined resultant behavior. One of the resultant behaviors is a segfault, on some systems.

(Original description below)

I tested this on 2.2.2 and it did not crash mongod.

Script is attached. Let me know if you need help building (go app).

In a nutshell, this script is opening and closing connections rapidly in a lot of different threads (goroutines).

Mon Mar 18 11:28:50.628 [initandlisten] connection accepted from 127.0.0.1:60649 #1428 (100 connections now open)
Mon Mar 18 11:28:50.637 Invalid access at address: 0x101fffff8 from thread: conn1417
Mon Mar 18 11:28:50.637
Mon Mar 18 11:28:50.637 Invalid access at address: 0x101fffff8 from thread: conn1418
Mon Mar 18 11:28:50.637 Mon Mar 18 11:28:50.637 Invalid access at address: 0x101fffff8 from thread: conn1419
Mon Mar 18 11:28:50.637 Mon Mar 18 11:28:50.637
Invalid access at address: 0x101fffff8 from thread: conn1345
Got signal: 11 (Segmentation fault: 11).
 
Invalid access at address: 0x101fffff8 from thread: conn1421
Invalid access at address: 0x101fffff8 from thread: conn1422
Mon Mar 18 11:28:50.637
 
Mon Mar 18 11:28:50.637
 
Got signal: 11 (Segmentation fault: 11).
Mon Mar 18 11:28:50.637 Got signal: 11 (Segmentation fault: 11).
Mon Mar 18 11:28:50.637
Mon Mar 18 11:28:50.637 Got signal: 11 (Segmentation fault: 11).
 
Got signal: 11 (Segmentation fault: 11).
Got signal: 11 (Segmentation fault: 11).

This is on OSX.

I have a standalone go script that caused this (attached).

Mon Mar 18 11:28:50.638 Mon Mar 18 11:28:50.638 Invalid access at address: 0x101fffff8 from thread: conn1423
Invalid access at address: 0x101fffff8 from thread: conn1424
Mon Mar 18 11:28:50.638 Mon Mar 18 11:28:50.638
Mon Mar 18 11:28:50.638
Invalid access at address: 0x101fffff8 from thread: conn1425
Invalid access at address: 0x101fffff8 from thread: conn1426
Mon Mar 18 11:28:50.638 Mon Mar 18 11:28:50.638 Invalid access at address: 0x101fffff8 from thread: conn1427
Mon Mar 18 11:28:50.638
 
Invalid access at address: 0x101fffff8 from thread: conn1428
Got signal: 11 (Segmentation fault: 11).
 
Got signal: 11 (Segmentation fault: 11).
 
Mon Mar 18 11:28:50.638 Mon Mar 18 11:28:50.638
Mon Mar 18 11:28:50.638
Got signal: 11 (Segmentation fault: 11).
Got signal: 11 (Segmentation fault: 11).
Mon Mar 18 11:28:50.638 Got signal: 11 (Segmentation fault: 11).
 
 
Got signal: 11 (Segmentation fault: 11).
 
 
Mon Mar 18 11:28:50.662 Backtrace:
0x1006dbb8b 0x10000185b 0x100001f11 0x7fff8b6698ea 0x7fff00000002 0x7ea947259716e7ee
 0   mongod                              0x00000001006dbb8b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010000185b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100001f11 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673
 3   libsystem_c.dylib                   0x00007fff8b6698ea _sigtramp + 26
 4   ???                                 0x00007fff00000002 0x0 + 140733193388034
 5   ???                                 0x7ea947259716e7ee 0x0 + 9126904346617243630
Mon Mar 18 11:28:50.662
Mon Mar 18 11:28:50.662 Backtrace:
0x1006dbb8b 0x10000185b 0x100001f11 0x7fff8b6698ea 0x7fff00000002 0x7ea947259716e7ee
 0   mongod                              0x00000001006dbb8b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010000185b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100001f11 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673
 3   libsystem_c.dylib                   0x00007fff8b6698ea _sigtramp + 26
 4   ???                                 0x00007fff00000002 0x0 + 140733193388034
 5   ???                                 0x7ea947259716e7ee 0x0 + 9126904346617243630
Mon Mar 18 11:28:50.662 Mon Mar 18 11:28:50.662 Backtrace:
0x1006dbb8b 0x10000185b 0x100001f11 0x7fff8b6698ea 0x7fff00000002 0x7ea947259716e7ee
 0   mongod                              0x00000001006dbb8b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010000185b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100001f11 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673



 Comments   
Comment by Ryan Nitz [ 29/Mar/13 ]

Nm... pulling down the unstable now to test.

Comment by Ryan Nitz [ 29/Mar/13 ]

Thanks for fixing this! Any idea when the fix will land in the nightly builds?

Comment by auto [ 28/Mar/13 ]

Author:

{u'date': u'2013-03-20T19:49:38Z', u'name': u'Andy Schwerin', u'email': u'schwerin@10gen.com'}

Message: SERVER-9014 Synchronize access to CmdGetNonce::_random.
Branch: v2.4
https://github.com/mongodb/mongo/commit/fde9ff094deccc27ef0701b8e46d8b3f35db3775

Comment by Andy Schwerin [ 20/Mar/13 ]

Should backport to 2.4.

Comment by auto [ 20/Mar/13 ]

Author:

{u'date': u'2013-03-20T19:49:38Z', u'name': u'Andy Schwerin', u'email': u'schwerin@10gen.com'}

Message: SERVER-9014 Synchronize access to CmdGetNonce::_random.
Branch: master
https://github.com/mongodb/mongo/commit/59a863f82956e070331b7262070d3ef7de4324c5

Comment by Andy Schwerin [ 20/Mar/13 ]

Tad nailed the problem. The "go" driver issues a getnonce command on every new connection, which increases the number of simultaneous operations executing CmdGetNonce::run(). Since the commands are singleton objects, they all were using the same instance of SecureRandom to generate nonces. SecureRandom does no locking internally, and so must be locked by users. Patch to follow.

For those who prefer repros written in the Mongo shell, here's one:

var i;
for (i = 0; i < 100; ++i)
    startParallelShell('for (;;) db.runCommand({getnonce: 1});');
for (;;) {
    sleep(1000);
    print('.');
}

Run the above code, and eventually a 2.4.0 mongod or mongos will crash. It's not a useful automatic regression test, because you cannot distinguish between "bug fixed" and "haven't waited long enough".

Comment by Tad Marshall [ 19/Mar/13 ]

I was able to duplicate the crash on Mac OS/X (10.7.5) after just a few minutes.

Tue Mar 19 12:10:50.243 [conn4823] end connection 127.0.0.1:56264 (99 connections now open)
Tue Mar 19 12:10:50.243 [conn4825] end connection 127.0.0.1:56266 (99 connections now open)
Tue Mar 19 12:10:50.246 [conn4813] end connection 127.0.0.1:56254 (97 connections now open)
Tue Mar 19 12:10:50.246 [initandlisten] connection accepted from 127.0.0.1:56365 #4923 (99 connections now open)
Tue Mar 19 12:10:50.247 [initandlisten] connection accepted from 127.0.0.1:56366 #4924 (100 connections now open)
Tue Mar 19 12:10:50.248 [initandlisten] connection accepted from 127.0.0.1:56367 #4925 (100 connections now open)
Tue Mar 19 12:10:50.281 Invalid access at address: 0x7fa7c9fffff8 from thread: conn4905
 
Tue Mar 19 12:10:50.281 Got signal: 11 (Segmentation fault: 11).
 
Tue Mar 19 12:10:50.284 Backtrace:
0x109b37825 0x10921495e 0x109214eb9 0x7fff8adb0cfa 0x2a2a2a2a2a2a2a2a 0x9ba462bbaf07afdb 
 0   mongod                              0x0000000109b37825 _ZN5mongo15printStackTraceERSo + 37
 1   mongod                              0x000000010921495e _ZN5mongo10abruptQuitEi + 446
 2   mongod                              0x0000000109214eb9 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 537
 3   libsystem_c.dylib                   0x00007fff8adb0cfa _sigtramp + 26
 4   ???                                 0x2a2a2a2a2a2a2a2a 0x0 + 3038287259199220266
 5   ???                                 0x9ba462bbaf07afdb 0x0 + 11215197530294300635

Comment by Daniel Gottlieb (Inactive) [ 19/Mar/13 ]

Running 2.4.0-rc2-pre:

(a long log of connections opening and closing for 1.5 hours)

Mon Mar 18 18:22:01.632 [initandlisten] connection accepted from 127.0.0.1:47031 #203972 (100 connections now open)
Mon Mar 18 18:22:01.632 [initandlisten] connection accepted from 127.0.0.1:47032 #203973 (100 connections now open)
Mon Mar 18 18:22:01.632 [conn203874] end connection 127.0.0.1:46933 (99 connections now open)
Mon Mar 18 18:22:01.632 [initandlisten] connection accepted from 127.0.0.1:47033 #203974 (100 connections now open)
Mon Mar 18 18:22:01.634 Invalid access at address: 0x7f7171098ff0 from thread: conn203974
 
Mon Mar 18 18:22:01.634 Got signal: 11 (Segmentation fault).
 
Segmentation fault (core dumped)

Can't seem to find the core dump though, but this seems rather reproducible.

Comment by Ryan Nitz [ 18/Mar/13 ]

I ran this on on the nightly and it crashed too.

db version v2.5.0-pre-
Mon Mar 18 16:34:12.103 git version: ea168df93a2681edf003304123888a94f7f3da38

Mon Mar 18 16:33:27.616 [initandlisten] connection accepted from 127.0.0.1:60454 #3606 (200 connections now open)
Mon Mar 18 16:33:27.620 Invalid access at address: 0x101fffff9 from thread: conn3601
 
Mon Mar 18 16:33:27.620 Got signal: 11 (Segmentation fault: 11).
 
Mon Mar 18 16:33:27.623 Mon Mar 18 16:33:27.623 Mon Mar 18 16:33:27.623 Invalid access at address: 0x101fffff9 from thread: conn3606
Invalid access at address: 0x101fffff9 from thread: conn3604
Invalid access at address: 0x101fffff9 from thread: conn3605
 
 
 
Mon Mar 18 16:33:27.624 Mon Mar 18 16:33:27.624 Got signal: 11 (Segmentation fault: 11).
Mon Mar 18 16:33:27.624 Got signal: 11 (Segmentation fault: 11).
 
Got signal: 11 (Segmentation fault: 11).
 
 
Mon Mar 18 16:33:27.628 Mon Mar 18 16:33:27.628 Mon Mar 18 16:33:27.628 Backtrace:
0x1006e7c2b 0x10000246b 0x100002b21 0x7fff8710894a 0x7fff000000 0x2957019314401ec2
 0   mongod                              0x00000001006e7c2b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010000246b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100002b21 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673
 3   libsystem_c.dylib                   0x00007fff8710894a _sigtramp + 26
 4   ???                                 0x0000007fff000000 0x0 + 549739036672
 5   ???                                 0x2957019314401ec2 0x0 + 2978851409740439234
Mon Mar 18 16:33:27.628 Backtrace:
0x1006e7c2b 0x10000246b 0x100002b21 0x7fff8710894a 0x7fff000000 0x2957019314401ec2
 0   mongod                              0x00000001006e7c2b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010000246b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100002b21 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673
 3   libsystem_c.dylib                   0x00007fff8710894a _sigtramp + 26
 4   ???                                 0x0000007fff000000 0x0 + 549739036672
 5   ???                                 0x2957019314401ec2 0x0 + 2978851409740439234
Backtrace:
0x1006e7c2b 0x10000246b 0x100002b21 0x7fff8710894a 0x7fff000000 0x2957019314401ec2
 0   mongod                              0x00000001006e7c2b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010000246b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100002b21 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673
 3   libsystem_c.dylib                   0x00007fff8710894a _sigtramp + 26
 4   ???                                 0x0000007fff000000 0x0 + 549739036672
 5   ???                                 0x2957019314401ec2 0x0 + 2978851409740439234
 
Backtrace:
0x1006e7c2b 0x10000246b 0x100002b21 0x7fff8710894a 0x7fff000000 0x2957019314401ec2
 0   mongod                              0x00000001006e7c2b _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x000000010000246b _ZN5mongo10abruptQuitEi + 987
 2   mongod                              0x0000000100002b21 _ZN5mongo24abruptQuitWithAddrSignalEiP9__siginfoPv + 673
 3   libsystem_c.dylib                   0x00007fff8710894a _sigtramp + 26
 4   ???                                 0x0000007fff000000 0x0 + 549739036672
 5   ???                                 0x2957019314401ec2 0x0 + 2978851409740439234

Comment by Ryan Nitz [ 18/Mar/13 ]

I was able to recreate on 2.4.0 rc3. I am trying the nightly now.

Generated at Thu Feb 08 03:19:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.