[SERVER-8230] SSL Related Crash
Created: 18/Jan/13  Updated: 23/Feb/15  Resolved: 11/Feb/13

Status: Closed
Project: Core Server
Component/s: Security
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Adam Comerford
Assignee: Eric Milkie
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04, SSL enabled mongod, compiled from master - git version: e3ab2ed4f9612a0b6b9e8cf0505e4b979b511472


Attachments: mongod-ubuntu1204-2.3pre-ssl.log
Operating System: ALL
Steps To Reproduce:

This was a secondary that was not taking any particular traffic and had only a handful of connections open (testing was taking place elsewhere in the sharded cluster). I have restarted it to see whether the crash occurs again.

Participants:

 Description   

Log Snippet (will attach full log):

Fri Jan 18 02:28:15.411 [conn1021] could not negotiate SSL connection: EOF detected
Fri Jan 18 02:28:15.411 [conn1021] SocketException handling request, closing client connection: 9001 socket exception [6] 
Fri Jan 18 02:28:15.412 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: ubuntu-1104-64-4:30001
Fri Jan 18 02:28:15.722 [rsHealthPoll] replset info ubuntu-1104-64-4:30001 thinks that we are down
Fri Jan 18 02:28:15.722 [rsHealthPoll] replSet member ubuntu-1104-64-4:30001 is now in state SECONDARY
Fri Jan 18 02:28:15.722 [rsMgr] not electing self, ubuntu-1104-64-4:30001 would veto with 'I don't think dub-10gen-linux-1:30001 is electable'
Fri Jan 18 02:28:17.411 [initandlisten] connection accepted from 10.7.100.25:48206 #1022 (6 connections now open)
Fri Jan 18 02:28:22.390 [rsMgr] replSet info electSelf 1
Fri Jan 18 02:28:25.412 [rsMgr] replSet PRIMARY
Fri Jan 18 02:28:25.419 [conn1022] end connection 10.7.100.25:48206 (5 connections now open)
Fri Jan 18 02:28:25.420 [initandlisten] connection accepted from 10.7.100.25:48207 #1023 (6 connections now open)
Fri Jan 18 02:28:25.538 [initandlisten] connection accepted from 10.7.100.25:48208 #1024 (7 connections now open)
Fri Jan 18 02:28:25.858 [initandlisten] connection accepted from 10.7.100.25:48209 #1025 (8 connections now open)
Fri Jan 18 02:28:26.864 [slaveTracking] build index local.slaves { _id: 1 }
Fri Jan 18 02:28:26.864 [slaveTracking] build index done.  scanned 0 total records. 0 secs
Fri Jan 18 02:28:29.710 [initandlisten] connection accepted from 127.0.0.1:43114 #1026 (9 connections now open)
Fri Jan 18 02:28:45.432 [conn1023] end connection 10.7.100.25:48207 (8 connections now open)
Fri Jan 18 02:28:45.432 [initandlisten] connection accepted from 10.7.100.25:48210 #1027 (9 connections now open)
Fri Jan 18 02:29:01.287 [conn10] command admin.$cmd command: { writebacklisten: ObjectId('50f8413b7b3df9a25364105a') } ntoreturn:1 keyUpdates:0  reslen:44 300000ms
Fri Jan 18 02:29:04.562 Invalid access at address: 0x7f45c44f0000 from thread: conn1025
 
Fri Jan 18 02:29:04.562 Got signal: 11 (Segmentation fault).
 
Fri Jan 18 02:29:04.563 Backtrace:
0xc34bf3 0x709a4e 0x709f4b 0x7f73f642fcb0 0x7f73f5e67adc 
 /home/adam/mongo/2.3-pre-ssl/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xc34bf3]
 /home/adam/mongo/2.3-pre-ssl/mongod(_ZN5mongo10abruptQuitEi+0x39e) [0x709a4e]
 /home/adam/mongo/2.3-pre-ssl/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x25b) [0x709f4b]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f73f642fcb0]
 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0(+0x6badc) [0x7f73f5e67adc]
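
The symbol names in the backtrace above are C++ names mangled under the Itanium ABI. The following is a minimal sketch, not a full demangler: the helper qualified_name is hypothetical and decodes only the nested-name part (the run of length-prefixed identifiers between "_ZN" and "E"), ignoring the parameter encoding that follows. A tool such as c++filt should give the full signatures.

```python
import re

# Mangled names copied from the three mongod frames in the backtrace.
FRAMES = [
    "_ZN5mongo15printStackTraceERSo",
    "_ZN5mongo10abruptQuitEi",
    "_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv",
]


def qualified_name(mangled: str) -> str:
    """Decode only the nested name: _ZN <len><identifier>... E.

    Hypothetical helper for illustration; parameter types after the
    closing 'E' are ignored, and other mangling forms are rejected.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError(f"not a nested Itanium name: {mangled!r}")
    pos = 3
    parts = []
    while pos < len(mangled) and mangled[pos] != "E":
        match = re.match(r"\d+", mangled[pos:])
        if match is None:
            raise ValueError(f"unsupported encoding at offset {pos}")
        length = int(match.group())
        pos += match.end()
        parts.append(mangled[pos:pos + length])
        pos += length
    return "::".join(parts)


for frame in FRAMES:
    print(qualified_name(frame))
```

This prints mongo::printStackTrace, mongo::abruptQuit and mongo::abruptQuitWithAddrSignal. All three belong to the signal-handling path that prints the trace, so the only frame pointing at the faulting code is the unresolved libcrypto.so.1.0.0 offset +0x6badc.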



 Comments   
Comment by Adam Comerford [ 29/Jan/13 ]

No, there have been no signs of this recurring after many days of running in the same environment. Happy to close this one as Cannot Reproduce.

Comment by Eric Milkie [ 29/Jan/13 ]

adamc Did it happen again since?

Comment by Eric Milkie [ 18/Jan/13 ]

I don't think the "EOF" message is related. You can get that if you attempt to connect and then close the socket before the SSL handshake is complete. I see that message quite a bit in our unit tests. Also, the message was for a different connection (conn1021) than the one that crashed (conn1025), and almost a minute (about 49 seconds) passed between the EOF and the segfault.
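
The connect-then-close sequence can be sketched with Python's ssl module. This is an illustration under assumptions, not mongod's code: it uses a local socketpair in place of a real TCP connection, and a server-side context with no certificate loaded, which should be enough here because the peer closes before any handshake bytes arrive.

```python
import socket
import ssl

# A connected pair of local sockets stands in for client and server.
client_raw, server_raw = socket.socketpair()

# Server-side TLS context. No certificate is loaded (an assumption made
# to keep the sketch self-contained): the handshake is expected to fail
# on the missing ClientHello before a certificate would be needed.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_tls = ctx.wrap_socket(
    server_raw, server_side=True, do_handshake_on_connect=False
)

# The "client" closes without sending a single handshake byte.
client_raw.close()

handshake_error = None
try:
    server_tls.do_handshake()
except OSError as exc:  # ssl.SSLError is a subclass of OSError
    handshake_error = exc
finally:
    server_tls.close()

print(type(handshake_error).__name__, handshake_error)
```

The exception is typically ssl.SSLEOFError or another ssl.SSLError, depending on the Python and OpenSSL versions. It is raised on the accepting side simply because the peer went away early, which is the same kind of event as the "EOF detected" log line.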

The crash happened on a connection that had been open for about 40 seconds. I wish the stack trace had more frames. If we can reproduce this, it would be good to get a core dump.
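
The timing in this comment can be checked against the log snippet in the description. The sketch below copies three log lines verbatim and assumes that connection #1025 in the "connection accepted" line is the thread later reported as conn1025, which matches how #1022 and [conn1022] pair up elsewhere in the log. The helper names are hypothetical.

```python
from datetime import datetime

# Lines copied verbatim from the log snippet in the description.
EOF_LINE = "Fri Jan 18 02:28:15.411 [conn1021] could not negotiate SSL connection: EOF detected"
ACCEPT_LINE = "Fri Jan 18 02:28:25.858 [initandlisten] connection accepted from 10.7.100.25:48209 #1025 (8 connections now open)"
CRASH_LINE = "Fri Jan 18 02:29:04.562 Invalid access at address: 0x7f45c44f0000 from thread: conn1025"


def time_of_day(line: str) -> datetime:
    """Parse only the HH:MM:SS.mmm field; all three events are on the same day."""
    return datetime.strptime(line.split()[3], "%H:%M:%S.%f")


def seconds_between(earlier: str, later: str) -> float:
    """Elapsed seconds between the timestamps of two log lines."""
    return (time_of_day(later) - time_of_day(earlier)).total_seconds()


eof_to_crash = seconds_between(EOF_LINE, CRASH_LINE)
conn1025_age = seconds_between(ACCEPT_LINE, CRASH_LINE)

print(f"EOF on conn1021 to segfault: {eof_to_crash:.3f} s")
print(f"conn1025 accepted to segfault: {conn1025_age:.3f} s")
```

The results are 49.151 s from the EOF to the segfault and 38.704 s from the acceptance of connection #1025 to the segfault.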

Comment by Adam Comerford [ 18/Jan/13 ]

Attaching log
