Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-18005

Unclear why Primary mongod just exited - killed by oom killer

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: WiredTiger
    • Labels:
      None
    • ALL

      The primary/secondary in one of our shards were running perfectly. Then at 4 am the primary mongod exited but there was no crash at all. Here is the log below. Note that I see a line that says: [ReplicationExecutor] stepping down; another primary was elected more recently

      But it is unclear why the process exited.

      2015-04-13T03:59:50.981+0000 I COMMAND  [conn24621] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D672B523057643766366C34576F2F594A75347639505765486B5039467867303446486672684258717644726E452B6F4161587A49756E675A2B6E446832...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 3051ms
      2015-04-13T03:59:51.346+0000 I QUERY    [conn24614] query cubes_prod.accounts query: { UserKey: ObjectId('5520c855498bf1aa0b38e96e'), _id: ObjectId('5520c855498bf1aa0b38e970') } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:948 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 9948ms
      2015-04-13T03:59:50.981+0000 I COMMAND  [conn24619] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 2492ms
      2015-04-13T03:59:50.981+0000 I COMMAND  [conn24618] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 2492ms
      2015-04-13T03:59:51.152+0000 I NETWORK  [initandlisten] connection accepted from 10.0.8.116:25158 #24628 (447 connections now open)
      2015-04-13T03:59:50.981+0000 I COMMAND  [conn24622] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D62544543755243674F4D4D75365350417775586642706536514835536A46394A42342F4A4256784835666F433741784D533449457A467543754C6B754D...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 3051ms
      2015-04-13T03:59:51.357+0000 I QUERY    [conn24613] query cubes_prod.accounts query: { UserKey: ObjectId('5521de8dbbefd95324a8212d'), _id: ObjectId('5521e974498bf1aa0b38e9ca') } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:769 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 10003ms
      2015-04-13T03:59:51.393+0000 I QUERY    [conn2748] getmore local.oplog.rs cursorid:92163486263 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:5 reslen:905 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } 3768ms
      2015-04-13T03:59:52.235+0000 I REPL     [ReplicationExecutor] Member ip-10-0-21-50.ec2.internal:27017 is now in state PRIMARY
      2015-04-13T03:59:52.235+0000 I COMMAND  [conn10173] command admin.$cmd command: getCmdLineOpts { getCmdLineOpts: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:677 locks:{} 237ms
      2015-04-13T03:59:53.012+0000 I NETWORK  [initandlisten] connection accepted from 10.0.8.116:25159 #24629 (448 connections now open)
      2015-04-13T03:59:53.099+0000 I COMMAND  [conn24625] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D6C5469597A637074596F4D5963746D74586B37477A656E47535337362B303846) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 1669ms
      2015-04-13T03:59:53.256+0000 I ACCESS   [conn24621] Successfully authenticated as principal __system on local
      2015-04-13T03:59:53.904+0000 I COMMAND  [conn2859] command admin.$cmd command: ping { ping: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 164ms
      2015-04-13T03:59:54.143+0000 I COMMAND  [conn24626] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D546F54595378306F74596E6D6F385530745A566E69397435554A633753435371) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 1944ms
      2015-04-13T03:59:54.335+0000 I REPL     [ReplicationExecutor] replset: another primary seen with election time 552b3f43:1 my election time is 5526695a:1
      2015-04-13T03:59:54.643+0000 I NETWORK  [initandlisten] connection accepted from 10.0.8.116:25160 #24630 (449 connections now open)
      2015-04-13T03:59:55.031+0000 I ACCESS   [conn24622] Successfully authenticated as principal __system on local
      2015-04-13T03:59:56.316+0000 I REPL     [ReplicationExecutor] stepping down; another primary was elected more recently
      2015-04-13T03:59:56.731+0000 I COMMAND  [conn24624] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D5679634B4C4A675141742F43546C324D6E5739683858344A74507452534E7862626B6768455534424953636F48373347646652416836574F78414A4477...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 2910ms
      2015-04-13T03:59:56.731+0000 I COMMAND  [conn24623] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D737669324D3244795063542F35454632546E7743667A46586F744F32454356656B304E4E546944785459662F6D526D6D74464E70445455526842474862...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 3213ms
      2015-04-13T03:59:56.800+0000 I COMMAND  [conn24627] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D615946656A7A75466F2B426A384B48754F4D675341354E49592B4A4F67507942) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 1820ms
      2015-04-13T03:59:58.087+0000 I COMMAND  [conn24621] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 2468ms
      2015-04-13T03:59:58.382+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
      2015-04-13T03:59:58.689+0000 I NETWORK  [initandlisten] connection accepted from 10.0.8.116:25161 #24631 (450 connections now open)
      2015-04-13T04:00:01.536+0000 I COMMAND  [conn24622] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 4243ms
      2015-04-13T04:00:01.874+0000 I ACCESS   [conn24624] Successfully authenticated as principal __system on local
      2015-04-13T04:00:02.625+0000 I COMMAND  [conn24628] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D4E2B3177683151512B484F39665A2F42794B437279454E422B6A7464792F7153) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 4630ms
      2015-04-13T04:00:02.920+0000 I COMMAND  [conn10173] command admin.$cmd command: logout { logout: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 1067ms
      2015-04-13T04:00:02.920+0000 I COMMAND  [conn2859] command admin.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D696E626F786375626564622C723D4E74436937457948) } keyUpdates:0 writeConflicts:0 numYields:0 reslen:215 locks:{} 3210ms
      2015-04-13T04:00:03.106+0000 I ACCESS   [conn24623] Successfully authenticated as principal __system on local
      2015-04-13T04:00:03.731+0000 I NETWORK  [initandlisten] connection accepted from 10.0.8.116:25162 #24632 (451 connections now open)
      2015-04-13T04:00:04.129+0000 I COMMAND  [conn2682] command admin.$cmd command: ping { ping: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:97 locks:{} 289ms
      2015-04-13T04:00:04.523+0000 I COMMAND  [conn24629] command local.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D776F553972343872324D5334384F41482B6F36334765454642426B69416F5678) } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:239 locks:{} 4167ms
      2015-04-13T04:00:04.869+0000 I COMMAND  [conn24625] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D6C5469597A637074596F4D5963746D74586B37477A656E47535337362B3038462F63714139757356666A2B3156694D4D394946646D6D334653566E5474...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 6077ms
      2015-04-13T04:00:07.060+0000 I COMMAND  [conn24626] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, 633D626977732C723D546F54595378306F74596E6D6F385530745A566E69397435554A63375343537139413866773530666732365047667079526D585034306A4F4C4C743132...), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:168 locks:{} 6417ms
      2015-04-13T04:00:08.087+0000 I COMMAND  [conn24624] command local.$cmd command: saslContinue { saslContinue: 1, payload: BinData(0, ), conversationId: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{} 4613ms
      2015-04-13T04:00:09.507+0000 I COMMAND  [conn22458] command admin.$cmd command: isMaster { ismaster: 1 } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:514 locks:{} 15102ms
      2015-04-13T04:00:09.730+0000 I NETWORK  [initandlisten] connection accepted from 10.0.8.116:25163 #24633 (452 connections now open)
      2015-04-13T11:45:45.839+0000 I CONTROL  ***** SERVER RESTARTED *****
      2015-04-13T11:45:45.943+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=zlib),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      

            Assignee:
            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            Reporter:
            klall Kam Lall
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: