Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-24709

mongod crashes on mongodump run(probably due to low memory)

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: 3.2.6
    • Fix Version/s: None
    • Component/s: Stability, Tools
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      Run mongodump and then stress the memory of the OS.

      Show
      Run mongodump and then stress the memory of the OS.

      Description

      Server - AWS EC2 t2.small, CentOS Linux release 7.2.1511 (Core)
      DB Version - MongoDB community 3.2.6
      Topology: Replica Set

      When running mongodump command on the secondary node, the db crashes in 10% of the times.

      I stressed the memory to reproduce, and looks like it happens due to low memory (though I have enough free memory when running/during the dump).

      The line when dump fails:
      2016-06-22T12:19:18.871+0300 Failed: error reading collection: EOF && Mux ending but selectCases still open 2

      The last messages in mongod.log:

      2016-06-22T12:19:13.162+0300 D STORAGE  [conn10] WT begin_transaction
      2016-06-22T12:19:13.187+0300 D QUERY    [conn12] Running getMore, cursorid: 225913259574
      2016-06-22T12:19:13.187+0300 D STORAGE  [conn12] WT begin_transaction
      2016-06-22T12:19:13.224+0300 D STORAGE  [WTJournalFlusher] flushed journal
      2016-06-22T12:19:13.324+0300 D STORAGE  [WTJournalFlusher] flushed journal
      2016-06-22T12:19:13.394+0300 D EXECUTOR [ReplicationExecutor] Scheduling remote request: RemoteCommand 115 -- target:production-app03:27017 db:admin cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 }
      2016-06-22T12:19:13.394+0300 D ASIO     [ReplicationExecutor] startCommand: RemoteCommand 115 -- target:production-app03:27017 db:admin expDate:2016-06-22T12:19:23.393+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 }
      2016-06-22T12:19:13.394+0300 D ASIO     [NetworkInterfaceASIO-Replication-0] Initiating asynchronous command: RemoteCommand 115 -- target:production-app03:27017 db:admin expDate:2016-06-22T12:19:23.393+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 }
      2016-06-22T12:19:13.394+0300 D ASIO     [NetworkInterfaceASIO-Replication-0] Starting asynchronous command 115 on host production-app03:27017
      2016-06-22T12:19:13.396+0300 D EXECUTOR [NetworkInterfaceASIO-Replication-0] Received remote response: RemoteResponse --  cmd:{ ok: 1.0, state: 7, v: 3, hbmsg: "", set: "prod", term: 5, primaryId: 0, durableOpTime: { ts: Timestamp 1466525772000|2012, t: 5 }, opTime: { ts: Timestamp 1466525772000|2012, t: 5 } }
      2016-06-22T12:19:13.396+0300 D ASIO     [NetworkInterfaceASIO-Replication-0] Failed to time operation 115 out: Operation aborted.
      2016-06-22T12:19:13.396+0300 D REPL     [ReplicationExecutor] setUpValues: heartbeat response good for member _id:2, msg:  
      2016-06-22T12:19:13.396+0300 D REPL     [ReplicationExecutor] Scheduling heartbeat to production-app03:27017 at 2016-06-22T09:19:18.396Z
      2016-06-22T12:19:13.415+0300 D EXECUTOR [ReplicationExecutor] Scheduling remote request: RemoteCommand 117 -- target:production-db01:27017 db:admin cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 }
      2016-06-22T12:19:13.415+0300 D ASIO     [ReplicationExecutor] startCommand: RemoteCommand 117 -- target:production-db01:27017 db:admin expDate:2016-06-22T12:19:23.415+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 }
      2016-06-22T12:19:13.415+0300 D ASIO     [NetworkInterfaceASIO-Replication-0] Initiating asynchronous command: RemoteCommand 117 -- target:production-db01:27017 db:admin expDate:2016-06-22T12:19:23.415+0300 cmd:{ replSetHeartbeat: "prod", configVersion: 3, from: "production-db02:27017", fromId: 1, term: 5 }
      2016-06-22T12:19:13.415+0300 D ASIO     [NetworkInterfaceASIO-Replication-0] Starting asynchronous command 117 on host production-db01:27017
      2016-06-22T12:19:13.417+0300 D EXECUTOR [NetworkInterfaceASIO-Replication-0] Received remote response: RemoteResponse --  cmd:{ ok: 1.0, electionTime: new Date(6293366302669012993), state: 1, v: 3, hbmsg: "", set: "prod", term: 5, primaryId: 0, durableOpTime: { ts: Timestamp 1466525772000|2012, t: 5 }, opTime: { ts: Timestamp 1466525772000|2012, t: 5 } }
      2016-06-22T12:19:13.417+0300 D ASIO     [NetworkInterfaceASIO-Replication-0] Failed to time operation 117 out: Operation aborted.
      2016-06-22T12:19:13.417+0300 D REPL     [ReplicationExecutor] Canceling election timeout callback at 2016-06-22T12:19:19.650+0300
      2016-06-22T12:19:13.417+0300 D REPL     [ReplicationExecutor] Scheduling election timeout callback at 2016-06-22T12:19:24.755+0300
      2016-06-22T12:19:13.833+0300 D REPL     [ReplicationExecutor] setUpValues: heartbeat response good for member _id:0, msg:  
      2016-06-22T12:19:13.833+0300 D REPL     [ReplicationExecutor] Scheduling heartbeat to production-db01:27017 at 2016-06-22T09:19:18.417Z
      2016-06-22T12:19:13.833+0300 D STORAGE  [WTJournalFlusher] flushed journal
      2016-06-22T12:19:13.833+0300 D STORAGE  [conn13] WT rollback_transaction
      2016-06-22T12:19:13.833+0300 D STORAGE  [conn13] WT begin_transaction
      2016-06-22T12:19:13.846+0300 D STORAGE  [conn10] WT rollback_transaction
      2016-06-22T12:19:13.846+0300 D STORAGE  [conn10] WT begin_transaction
      2016-06-22T12:19:13.846+0300 D STORAGE  [conn12] WT rollback_transaction
      2016-06-22T12:19:13.846+0300 D STORAGE  [conn12] WT begin_transaction
      2016-06-22T12:19:13.846+0300 D STORAGE  [conn12] WT rollback_transaction
      2016-06-22T12:19:13.846+0300 D STORAGE  [conn12] WT begin_transaction
      2016-06-22T12:19:13.859+0300 D STORAGE  [conn12] WT rollback_transaction
      2016-06-22T12:19:13.859+0300 D STORAGE  [conn12] WT begin_transaction
      2016-06-22T12:19:13.859+0300 D STORAGE  [conn13] WT rollback_transaction
      2016-06-22T12:19:13.859+0300 D STORAGE  [conn13] WT begin_transaction
      

        Attachments

          Activity

            People

            Assignee:
            kelsey.schubert Kelsey T Schubert
            Reporter:
            KafKafOwn Or Khafi [X]
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: