  Core Server / SERVER-38787

Initial sync brings down the primary

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Operating System: ALL

      Hi,

      We were attempting to add a new secondary node. As soon as the initial sync started, it began causing problems on the primary: CPU usage on the primary increased and it stopped accepting connections. We killed the primary process manually and restarted it. After the restart the node is stuck in "command local.oplog.rs command: getMore" and is not accepting any connections. We are not sure where the getMore is coming from, since we have stopped the node that was trying to sync.
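
      For context, the member was added in the usual way; the sketch below shows the typical shell command (the host name and options are placeholders, not our exact command):

          // Run against the primary; "<new-secondary>" is a placeholder host.
          // Adding with priority 0 and votes 0 is a common way to limit the new
          // member's influence on the set while it runs its initial sync.
          rs.add({ host: "<new-secondary>:27017", priority: 0, votes: 0 })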

      Finally, after a while we gave up on this node and switched it to a standalone node so that we could access the data.
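
      For reference, the "options:" line in the log below maps to roughly the following /etc/mongod.conf (a reconstruction from the logged options, so it may differ cosmetically from the real file). Restarting the node as a standalone amounted to starting mongod with the replication section disabled, as sketched here:

          # /etc/mongod.conf, reconstructed from the logged "options:" line
          systemLog:
            destination: file
            logAppend: true
            path: /var/log/mongodb/mongod.log
          storage:
            dbPath: /mongodb_data
            engine: wiredTiger
            journal:
              enabled: true
            wiredTiger:
              collectionConfig:
                blockCompressor: snappy
          processManagement:
            fork: true
            pidFilePath: /var/run/mongodb/mongod.pid
          net:
            bindIp: 0.0.0.0
            port: 27017
          security:
            authorization: enabled
            keyFile: /var/lib/mongo/rskey
          setParameter:
            oplogInitialFindMaxSeconds: 120
          # replication:                    # disabled to bring the node up standalone
          #   replSetName: RS-360iDocDB-0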

       

      2018-12-27T19:31:40.017+0000 I CONTROL [main] ***** SERVER RESTARTED *****
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] MongoDB starting : pid=21589 port=27017 dbpath=/mongodb_data 64-bit host=vm77a970234e
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] db version v4.0.3
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] git version: 7ea530946fa7880364d88c8d8b6026bbc9ffa48c
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] allocator: tcmalloc
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] modules: none
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] build environment:
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] distmod: rhel62
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] distarch: x86_64
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] target_arch: x86_64
      2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017 }, processManagement: { fork: true, pidFilePath: "/var/run/mongodb/mongod.pid" }, replication: { replSetName: "RS-360iDocDB-0" }, security: { authorization: "enabled", keyFile: "/var/lib/mongo/rskey" }, setParameter: { oplogInitialFindMaxSeconds: "120" }, storage: { dbPath: "/mongodb_data", engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { collectionConfig: { blockCompressor: "snappy" } } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
      2018-12-27T19:31:40.039+0000 W STORAGE [initandlisten] Detected unclean shutdown - /mongodb_data/mongod.lock is not empty.
      2018-12-27T19:31:40.160+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
      2018-12-27T19:31:40.160+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7520M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
      2018-12-27T19:31:41.732+0000 I STORAGE [initandlisten] WiredTiger message [1545939101:732515][21589:0x7f5891565a60], txn-recover: Main recovery loop: starting at 89/15488
      2018-12-27T19:31:41.733+0000 I STORAGE [initandlisten] WiredTiger message [1545939101:733003][21589:0x7f5891565a60], txn-recover: Recovering log 89 through 90
      2018-12-27T19:31:42.006+0000 I STORAGE [initandlisten] WiredTiger message [1545939102:6115][21589:0x7f5891565a60], file:sizeStorer.wt, txn-recover: Recovering log 90 through 90
      2018-12-27T19:31:42.069+0000 I STORAGE [initandlisten] WiredTiger message [1545939102:69748][21589:0x7f5891565a60], file:sizeStorer.wt, txn-recover: Set global recovery timestamp: 5c23bd2f00000004
      2018-12-27T19:31:42.145+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1545846063, 4)
      2018-12-27T19:31:42.145+0000 I STORAGE [initandlisten] Triggering the first stable checkpoint. Initial Data: Timestamp(1545846063, 4) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1545846063, 4)
      2018-12-27T19:31:42.979+0000 I STORAGE [initandlisten] Starting OplogTruncaterThread local.oplog.rs
      2018-12-27T19:31:42.979+0000 I STORAGE [initandlisten] The size storer reports that the oplog contains 33685 records totaling to 2111680101 bytes
      2018-12-27T19:31:43.059+0000 I STORAGE [initandlisten] Sampling from the oplog between Dec 26 14:43:17:55 and Dec 27 17:31:43:1 to determine where to place markers for truncation
      2018-12-27T19:31:43.059+0000 I STORAGE [initandlisten] Taking 990 samples and assuming that each section of oplog contains approximately 340 records totaling to 21314271 bytes
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:18:12
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:18:199
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:19:123
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:20:13
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:22:38
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:22:225
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:30:69
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:31:121
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:33:156
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:34:107
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:36:91
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:37:20
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:37:209
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:38:130
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:39:92
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:40:30
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:41:35
      2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:43:37
      ......
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:47:111
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:49:52
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:51:58
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:53:59
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:22:27
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:34:89
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:36:15
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:39:10
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:41:56
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:44:11
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:45:59
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:47:42
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:53:114
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:55:42
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:43:39:10
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:43:44:14
      2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:44:04:3
      2018-12-27T19:32:45.700+0000 W CONTROL [initandlisten]
      2018-12-27T19:32:45.700+0000 W CONTROL [initandlisten]
      2018-12-27T19:32:45.700+0000 I CONTROL [initandlisten]
      2018-12-27T19:32:46.230+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/mongodb_data/diagnostic.data'
      2018-12-27T19:32:46.285+0000 I REPL [initandlisten] Rollback ID is 2
      2018-12-27T19:32:46.365+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1545846063, 4) (top of oplog: { ts: Timestamp(1545931903, 1), t: 28 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
      2018-12-27T19:32:46.365+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1545846063, 4)
      2018-12-27T19:32:46.365+0000 I REPL [initandlisten] Replaying stored operations from { : Timestamp(1545846063, 4) } (exclusive) to { : Timestamp(1545931903, 1) } (inclusive).
      2018-12-27T19:32:47.285+0000 I COMMAND [initandlisten] command local.oplog.rs command: find { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:101 numYields:15 nreturned:101 reslen:4577118 locks:{ Global: { acquireCount: { r: 79, w: 33, W: 3 } }, Database: { acquireCount: { r: 28, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 17 } } } protocol:op_msg 920ms
      2018-12-27T19:32:48.712+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:943 numYields:30 nreturned:942 reslen:16773476 locks:{ Global: { acquireCount: { r: 111, w: 33, W: 3 } }, Database: { acquireCount: { r: 60, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 49 } } } protocol:op_msg 1424ms
      2018-12-27T19:32:51.643+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:575 numYields:48 nreturned:575 reslen:15625889 locks:{ Global: { acquireCount: { r: 160, w: 33, W: 3 } }, Database: { acquireCount: { r: 109, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 98 } } } protocol:op_msg 2923ms
      2018-12-27T19:32:52.761+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:90 numYields:20 nreturned:90 reslen:16699954 locks:{ Global: { acquireCount: { r: 181, w: 33, W: 3 } }, Database: { acquireCount: { r: 130, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 119 } } } protocol:op_msg 1112ms
      2018-12-27T19:32:55.122+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:125 numYields:33 nreturned:125 reslen:16715891 locks:{ Global: { acquireCount: { r: 215, w: 33, W: 3 } }, Database: { acquireCount: { r: 164, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 153 } } } protocol:op_msg 2358ms
      2018-12-27T19:32:58.084+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:142 numYields:43 nreturned:142 reslen:16532193 locks:{ Global: { acquireCount: { r: 259, w: 33, W: 3 } }, Database: { acquireCount: { r: 208, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 197 } } } protocol:op_msg 2958ms
      2018-12-27T19:33:00.056+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:135 numYields:35 nreturned:135 reslen:16324186 locks:{ Global: { acquireCount: { r: 295, w: 33, W: 3 } }, Database: { acquireCount: { r: 244, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 233 } } } protocol:op_msg 1967ms
      2018-12-27T19:33:01.212+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:81 numYields:21 nreturned:81 reslen:16450786 locks:{ Global: { acquireCount: { r: 317, w: 33, W: 3 } }, Database: { acquireCount: { r: 266, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 255 } } } protocol:op_msg 1151ms
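
      Judging by the [initandlisten] tag on every find/getMore entry above, the getMore traffic appears to be the node's own startup recovery replaying the oplog from the stable timestamp, not the stopped syncing member. The scan is roughly equivalent to the shell query below (illustrative only; recovery drives an internal oplogReplay cursor rather than a client command):

          // Approximate shell equivalent of the recovery scan logged above.
          var oplog = db.getSiblingDB("local").getCollection("oplog.rs");
          oplog.find({ ts: { $gte: Timestamp(1545846063, 4) } })  // stable timestamp from the log
               .sort({ $natural: 1 });                            // forward scan; same COLLSCAN plan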
      
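      On the truncation-marker sampling earlier in the log (33685 records / 2111680101 bytes, 990 samples, sections of roughly 340 records / 21314271 bytes): the numbers are consistent if the sampler takes about 10 samples per marker, which is an inference on our part rather than something the log states:

          990 samples / ~10 samples per marker  ≈  99 sections
          33685 records / 99 sections           ≈  340 records per section
          2111680101 bytes / 99 sections        ≈  21.3 MB per section (matching the logged 21314271 bytes)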

            Assignee: Danny Hatcher (daniel.hatcher@mongodb.com) (Inactive)
            Reporter: Dharshan Rangegowda (dharshanr@scalegrid.net)
            Votes: 0
            Watchers: 7
