Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-26979

Degraded performance in 3.2.10

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.2.5, 3.2.10
    • Labels:
      None
    • ALL
    • Hide

      Upgrade MongoDB to 3.2.10 from 3.2.1, or Upgrade MongoDB to 3.2.5 from 3.2.1

      Show
      Upgrade MongoDB to 3.2.10 from 3.2.1, or Upgrade MongoDB to 3.2.5 from 3.2.1

      In version 3.2.10 we had major replication and perfomance problems.

      Symptoms:

      • Replication too slow to get in sync
      • Overall poor performance
      Details:

      After upgrading from 3.2.1 to 3.2.10 we encountered massive replication problems (oplog last event time was changing very slow, seconds behind master increasing).
      We have a simple 2-node Replication Setup (+ 1 arbiter).
      We updated mongo slave first and the repliction was still fine. (Slave: 3.2.10, Master: 3.2.1)
      After switching slave to master we encountered above described problems. (Slave: 3.2.1, Master: 3.2.10) We upgraded slave and it was not getting better. (Slave: 3.2.10, Master: 3.2.10)
      We encountered similar problems with 3.2.5. some month ago.
      After searching the bug reports we found SERVER-24775 which sounds familiar.

      Workload:

      • we have 150-170GB oplog per hour
      • insert Statements between 800-20.000 per minute
      • read 70-600 per minute.
      cat /etc/mongod.conf
      # mongod.conf
      
      storage:
        dbPath: /mongo/mongodb
        engine: wiredTiger
      
      systemLog:
        destination: file
        logAppend: true
        path: /mongo/log/mongod.log
        quiet: true
      
      replication:
        replSetName: rs0
        oplogSizeMB: 1024000
      
      Node Hardware:
      RAM: 378G
      
      Storage: SSD
      
      Architecture:               x86_64
      CPU op-mode(s):       32-bit, 64-bit
      Byte Order:                Little Endian
      CPU(s):                      40
      On-line CPU(s) list:    0-39
      Thread(s) per core:    2
      Core(s) per socket:    10
      Socket(s):                   2
      NUMA node(s):          2
      Vendor ID:                  GenuineIntel
      CPU family:                6
      Model:                        62
      Stepping:                   4
      CPU MHz:                 2493.704
      BogoMIPS:                4988.68
      Virtualization:             VT-x
      L1d cache:                32K
      L1i cache:                 32K
      L2 cache:                  256K
      L3 cache:                  25600K
      NUMA node0 CPU(s):     0-9,20-29
      NUMA node1 CPU(s):     10-19,30-39}}
      
      Problem:
      2016-11-09T10:18:48.657+0100 I COMMAND  [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|537, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:612 keyUpdates:0 writeConflicts:0 numYields:21 nreturned:612 reslen:4196209 locks:{ Global: { acquireCount: { r: 44 } }, Database: { acquireCount: { r: 22 } }, oplog: { acquireCount: { r: 22 } } } protocol:op_command 4281ms
      2016-11-09T10:18:51.886+0100 I COMMAND  [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|1102, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:600 keyUpdates:0 writeConflicts:0 numYields:21 nreturned:600 reslen:4197017 locks:{ Global: { acquireCount: { r: 44 } }, Database: { acquireCount: { r: 22 } }, oplog: { acquireCount: { r: 22 } } } protocol:op_command 3220ms
      2016-11-09T10:18:57.626+0100 I COMMAND  [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|1714, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:343 keyUpdates:0 writeConflicts:0 numYields:15 nreturned:343 reslen:2421780 locks:{ Global: { acquireCount: { r: 32 } }, Database: { acquireCount: { r: 16 } }, oplog: { acquireCount: { r: 16 } } } protocol:op_command 5723ms
      2016-11-09T10:19:01.426+0100 I COMMAND  [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|2314, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:520 keyUpdates:0 writeConflicts:0 numYields:14 nreturned:520 reslen:4200801 locks:{ Global: { acquireCount: { r: 30 } }, Database: { acquireCount: { r: 15 } }, oplog: { acquireCount: { r: 15 } } } protocol:op_command 3794ms
      2016-11-09T10:19:02.278+0100 I COMMAND  [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679651000|195, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:606 keyUpdates:0 writeConflicts:0 numYields:11 nreturned:606 reslen:4198613 locks:{ Global: { acquireCount: { r: 24 } }, Database: { acquireCount: { r: 12 } }, oplog: { acquireCount: { r: 12 } } } protocol:op_command 828ms
      2016-11-09T10:19:07.764+0100 I COMMAND  [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679651000|715, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:467 keyUpdates:0 writeConflicts:0 numYields:17 nreturned:467 reslen:2913761 locks:{ Global: { acquireCount: { r: 36 } }, Database: { acquireCount: { r: 18 } }, oplog: { acquireCount: { r: 18 } } } protocol:op_command 5470ms }}
      
      Normal:
      2016-11-10T13:09:48.011+0100 I COMMAND  [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779752000|14364, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:6 nreturned:638 reslen:4199468 locks:{ Global: { acquireCount: { r: 14 } }, Database: { acquireCount: { r: 7 } }, oplog: { acquireCount: { r: 7 } } } protocol:op_command 135ms
      2016-11-10T13:09:48.860+0100 I COMMAND  [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779753000|7831, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:12 nreturned:1571 reslen:4195266 locks:{ Global: { acquireCount: { r: 26 } }, Database: { acquireCount: { r: 13 } }, oplog: { acquireCount: { r: 13 } } } protocol:op_command 206ms
      2016-11-10T13:09:49.302+0100 I COMMAND  [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779753000|20919, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:10 nreturned:932 reslen:4197637 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 11 } }, oplog: { acquireCount: { r: 11 } } } protocol:op_command 397ms
      2016-11-10T13:09:49.604+0100 I COMMAND  [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779753000|30727, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:11 nreturned:1302 reslen:4198994 locks:{ Global: { acquireCount: { r: 24 } }, Database: { acquireCount: { r: 12 } }, oplog: { acquireCount: { r: 12 } } } protocol:op_command 120ms
      2016-11-10T13:09:49.971+0100 I COMMAND  [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779754000|8261, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:8 nreturned:828 reslen:4196761 locks:{ Global: { acquireCount: { r: 18 } }, Database: { acquireCount: { r: 9 } }, oplog: { acquireCount: { r: 9 } } } protocol:op_command 180ms
      2016-11-10T13:09:50.654+0100 I COMMAND  [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779754000|18849, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:9 nreturned:473 reslen:4201156 locks:{ Global: { acquireCount: { r: 20 } }, Database: { acquireCount: { r: 10 } }, oplog: { acquireCount: { r: 10 } } } protocol:op_command 341ms
      

      Replication lag graph:

        1. eviction-1.png
          eviction-1.png
          329 kB
        2. replication_lag.png
          replication_lag.png
          13 kB

            Assignee:
            kelsey.schubert@mongodb.com Kelsey Schubert
            Reporter:
            ubik Arndt Königsfeld
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: