-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.2.5, 3.2.10
-
Component/s: Performance, Replication, WiredTiger
-
Labels:None
-
ALL
-
In version 3.2.10 we had major replication and perfomance problems.
Symptoms:
- Replication too slow to get in sync
- Overall poor performance
Details:
After upgrading from 3.2.1 to 3.2.10 we encountered massive replication problems (oplog last event time was changing very slow, seconds behind master increasing).
We have a simple 2-node Replication Setup (+ 1 arbiter).
We updated mongo slave first and the repliction was still fine. (Slave: 3.2.10, Master: 3.2.1)
After switching slave to master we encountered above described problems. (Slave: 3.2.1, Master: 3.2.10) We upgraded slave and it was not getting better. (Slave: 3.2.10, Master: 3.2.10)
We encountered similar problems with 3.2.5. some month ago.
After searching the bug reports we found SERVER-24775 which sounds familiar.
Workload:
- we have 150-170GB oplog per hour
- insert Statements between 800-20.000 per minute
- read 70-600 per minute.
cat /etc/mongod.conf # mongod.conf storage: dbPath: /mongo/mongodb engine: wiredTiger systemLog: destination: file logAppend: true path: /mongo/log/mongod.log quiet: true replication: replSetName: rs0 oplogSizeMB: 1024000
Node Hardware: RAM: 378G Storage: SSD Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 40 On-line CPU(s) list: 0-39 Thread(s) per core: 2 Core(s) per socket: 10 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family: 6 Model: 62 Stepping: 4 CPU MHz: 2493.704 BogoMIPS: 4988.68 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 25600K NUMA node0 CPU(s): 0-9,20-29 NUMA node1 CPU(s): 10-19,30-39}}
Problem:
2016-11-09T10:18:48.657+0100 I COMMAND [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|537, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:612 keyUpdates:0 writeConflicts:0 numYields:21 nreturned:612 reslen:4196209 locks:{ Global: { acquireCount: { r: 44 } }, Database: { acquireCount: { r: 22 } }, oplog: { acquireCount: { r: 22 } } } protocol:op_command 4281ms 2016-11-09T10:18:51.886+0100 I COMMAND [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|1102, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:600 keyUpdates:0 writeConflicts:0 numYields:21 nreturned:600 reslen:4197017 locks:{ Global: { acquireCount: { r: 44 } }, Database: { acquireCount: { r: 22 } }, oplog: { acquireCount: { r: 22 } } } protocol:op_command 3220ms 2016-11-09T10:18:57.626+0100 I COMMAND [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|1714, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:343 keyUpdates:0 writeConflicts:0 numYields:15 nreturned:343 reslen:2421780 locks:{ Global: { acquireCount: { r: 32 } }, Database: { acquireCount: { r: 16 } }, oplog: { acquireCount: { r: 16 } } } protocol:op_command 5723ms 2016-11-09T10:19:01.426+0100 I COMMAND [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679650000|2314, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:520 keyUpdates:0 writeConflicts:0 numYields:14 nreturned:520 reslen:4200801 locks:{ Global: { acquireCount: { r: 30 } }, Database: { acquireCount: { r: 15 } }, oplog: { acquireCount: { r: 15 } } } protocol:op_command 3794ms 2016-11-09T10:19:02.278+0100 I COMMAND [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679651000|195, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:606 keyUpdates:0 writeConflicts:0 numYields:11 nreturned:606 reslen:4198613 locks:{ Global: { acquireCount: { r: 24 } }, Database: { acquireCount: { r: 12 } }, oplog: { acquireCount: { r: 12 } } } protocol:op_command 828ms 2016-11-09T10:19:07.764+0100 I COMMAND [conn24795] command local.oplog.rs command: getMore { getMore: 14390826614, collection: "oplog.rs", maxTimeMS: 5000, term: 193, lastKnownCommittedOpTime: { ts: Timestamp 1478679651000|715, t: 193 } } planSummary: COLLSCAN cursorid:14390826614 keysExamined:0 docsExamined:467 keyUpdates:0 writeConflicts:0 numYields:17 nreturned:467 reslen:2913761 locks:{ Global: { acquireCount: { r: 36 } }, Database: { acquireCount: { r: 18 } }, oplog: { acquireCount: { r: 18 } } } protocol:op_command 5470ms }}
Normal:
2016-11-10T13:09:48.011+0100 I COMMAND [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779752000|14364, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:6 nreturned:638 reslen:4199468 locks:{ Global: { acquireCount: { r: 14 } }, Database: { acquireCount: { r: 7 } }, oplog: { acquireCount: { r: 7 } } } protocol:op_command 135ms 2016-11-10T13:09:48.860+0100 I COMMAND [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779753000|7831, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:12 nreturned:1571 reslen:4195266 locks:{ Global: { acquireCount: { r: 26 } }, Database: { acquireCount: { r: 13 } }, oplog: { acquireCount: { r: 13 } } } protocol:op_command 206ms 2016-11-10T13:09:49.302+0100 I COMMAND [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779753000|20919, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:10 nreturned:932 reslen:4197637 locks:{ Global: { acquireCount: { r: 22 } }, Database: { acquireCount: { r: 11 } }, oplog: { acquireCount: { r: 11 } } } protocol:op_command 397ms 2016-11-10T13:09:49.604+0100 I COMMAND [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779753000|30727, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:11 nreturned:1302 reslen:4198994 locks:{ Global: { acquireCount: { r: 24 } }, Database: { acquireCount: { r: 12 } }, oplog: { acquireCount: { r: 12 } } } protocol:op_command 120ms 2016-11-10T13:09:49.971+0100 I COMMAND [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779754000|8261, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:8 nreturned:828 reslen:4196761 locks:{ Global: { acquireCount: { r: 18 } }, Database: { acquireCount: { r: 9 } }, oplog: { acquireCount: { r: 9 } } } protocol:op_command 180ms 2016-11-10T13:09:50.654+0100 I COMMAND [conn41400] command local.oplog.rs command: getMore { getMore: 16956787344741, collection: "oplog.rs", maxTimeMS: 5000, term: 194, lastKnownCommittedOpTime: { ts: Timestamp 1478779754000|18849, t: 194 } } cursorid:16956787344741 keyUpdates:0 writeConflicts:0 numYields:9 nreturned:473 reslen:4201156 locks:{ Global: { acquireCount: { r: 20 } }, Database: { acquireCount: { r: 10 } }, oplog: { acquireCount: { r: 10 } } } protocol:op_command 341ms