-
Type:
Bug
-
Resolution: Done
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: WiredTiger
-
None
-
ALL
The primary/secondary in one of our shards were running perfectly. Then at 4 am the primary mongod exited but there was no crash at all. Here is the log below. Note that I see a line that says: [ReplicationExecutor] stepping down; another primary was elected more recently
But it is unclear why the process exited.
2015-04-13T03:59:50.981+0000 I COMMAND [conn24621] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D672B523057643766366C34576F2F594A75347639505765486B5039467867303446486672684258717644726E452B6F4161587A49756E675A2B6E446832...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 3051ms 2015-04-13T03:59:51.346+0000 I QUERY [conn24614] query cubes_prod.accounts query: { UserKey: ObjectId('5520c855498bf1aa0b38e96e'), _id: ObjectId('5520c855498bf1aa0b38e970') } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:948 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 9948ms 2015-04-13T03:59:50.981+0000 I COMMAND [conn24619] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 2492ms 2015-04-13T03:59:50.981+0000 I COMMAND [conn24618] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 2492ms 2015-04-13T03:59:51.152+0000 I NETWORK [initandlisten] connection accepted from 10.0.8.116:25158 #24628 (447 connections now open) 2015-04-13T03:59:50.981+0000 I COMMAND [conn24622] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D62544543755243674F4D4D75365350417775586642706536514835536A46394A42342F4A4256784835666F433741784D533449457A467543754C6B754D...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 3051ms 2015-04-13T03:59:51.357+0000 I QUERY [conn24613] query cubes_prod.accounts query: { UserKey: ObjectId('5521de8dbbefd95324a8212d'), _id: ObjectId('5521e974498bf1aa0b38e9ca') } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:769 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 10003ms 2015-04-13T03:59:51.393+0000 I QUERY [conn2748] getmore local.oplog.rs cursorid:92163486263 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:5 reslen:905 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } 3768ms 2015-04-13T03:59:52.235+0000 I REPL [ReplicationExecutor] Member ip-10-0-21-50.ec2.internal:27017 is now in state PRIMARY 2015-04-13T03:59:52.235+0000 I COMMAND [conn10173] command admin.$cmd command: getCmdLineOpts { getCmdLineOpts: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:677 locks:{} 237ms 2015-04-13T03:59:53.012+0000 I NETWORK [initandlisten] connection accepted from 10.0.8.116:25159 #24629 (448 connections now open) 2015-04-13T03:59:53.099+0000 I COMMAND [conn24625] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D6C5469597A637074596F4D5963746D74586B37477A656E47535337362B303846) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 1669ms 2015-04-13T03:59:53.256+0000 I ACCESS [conn24621] Successfully authenticated as principal __system on local 2015-04-13T03:59:53.904+0000 I COMMAND [conn2859] command admin.$cmd command: ping { ping: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 164ms 2015-04-13T03:59:54.143+0000 I COMMAND [conn24626] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D546F54595378306F74596E6D6F385530745A566E69397435554A633753435371) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 1944ms 2015-04-13T03:59:54.335+0000 I REPL [ReplicationExecutor] replset: another primary seen with election time 552b3f43:1 my election time is 5526695a:1 2015-04-13T03:59:54.643+0000 I NETWORK [initandlisten] connection accepted from 10.0.8.116:25160 #24630 (449 connections now open) 2015-04-13T03:59:55.031+0000 I ACCESS [conn24622] Successfully authenticated as principal __system on local 2015-04-13T03:59:56.316+0000 I REPL [ReplicationExecutor] stepping down; another primary was elected more recently 2015-04-13T03:59:56.731+0000 I COMMAND [conn24624] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D5679634B4C4A675141742F43546C324D6E5739683858344A74507452534E7862626B6768455534424953636F48373347646652416836574F78414A4477...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 2910ms 2015-04-13T03:59:56.731+0000 I COMMAND [conn24623] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D737669324D3244795063542F35454632546E7743667A46586F744F32454356656B304E4E546944785459662F6D526D6D74464E70445455526842474862...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 3213ms 2015-04-13T03:59:56.800+0000 I COMMAND [conn24627] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D615946656A7A75466F2B426A384B48754F4D675341354E49592B4A4F67507942) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 1820ms 2015-04-13T03:59:58.087+0000 I COMMAND [conn24621] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 2468ms 2015-04-13T03:59:58.382+0000 I REPL [ReplicationExecutor] Stepping down from primary in response to heartbeat 2015-04-13T03:59:58.689+0000 I NETWORK [initandlisten] connection accepted from 10.0.8.116:25161 #24631 (450 connections now open) 2015-04-13T04:00:01.536+0000 I COMMAND [conn24622] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 4243ms 2015-04-13T04:00:01.874+0000 I ACCESS [conn24624] Successfully authenticated as principal __system on local 2015-04-13T04:00:02.625+0000 I COMMAND [conn24628] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D4E2B3177683151512B484F39665A2F42794B437279454E422B6A7464792F7153) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 4630ms 2015-04-13T04:00:02.920+0000 I COMMAND [conn10173] command admin.$cmd command: logout { logout: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 1067ms 2015-04-13T04:00:02.920+0000 I COMMAND [conn2859] command admin.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D696E626F786375626564622C723D4E74436937457948) } keyUpdates:0 writeConflicts:0 numYields:0 reslen:215 locks:{} 3210ms 2015-04-13T04:00:03.106+0000 I ACCESS [conn24623] Successfully authenticated as principal __system on local 2015-04-13T04:00:03.731+0000 I NETWORK [initandlisten] connection accepted from 10.0.8.116:25162 #24632 (451 connections now open) 2015-04-13T04:00:04.129+0000 I COMMAND [conn2682] command admin.$cmd command: ping { ping: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 289ms 2015-04-13T04:00:04.523+0000 I COMMAND [conn24629] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D776F553972343872324D5334384F41482B6F36334765454642426B69416F5678) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 4167ms 2015-04-13T04:00:04.869+0000 I COMMAND [conn24625] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D6C5469597A637074596F4D5963746D74586B37477A656E47535337362B3038462F63714139757356666A2B3156694D4D394946646D6D334653566E5474...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 6077ms 2015-04-13T04:00:07.060+0000 I COMMAND [conn24626] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D546F54595378306F74596E6D6F385530745A566E69397435554A63375343537139413866773530666732365047667079526D585034306A4F4C4C743132...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 6417ms 2015-04-13T04:00:08.087+0000 I COMMAND [conn24624] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 4613ms 2015-04-13T04:00:09.507+0000 I COMMAND [conn22458] command admin.$cmd command: isMaster { ismaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:514 locks:{} 15102ms 2015-04-13T04:00:09.730+0000 I NETWORK [initandlisten] connection accepted from 10.0.8.116:25163 #24633 (452 connections now open) 2015-04-13T11:45:45.839+0000 I CONTROL ***** SERVER RESTARTED ***** 2015-04-13T11:45:45.943+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=zlib),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),