[SERVER-38787] Initial sync brings down the primary Created: 27/Dec/18  Updated: 02/Jan/19  Resolved: 31/Dec/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2018-12-28 at 10.54.13 AM.png    
Operating System: ALL
Participants:

 Description   

Hi,

We were attempting to add a new secondary node. As soon as the initial sync started, it began causing issues on the primary: CPU usage on the primary increased and it stopped accepting connections. We killed the primary process manually and restarted it. After the restart, the node is stuck in "command local.oplog.rs command: getMore" and is still not accepting connections. We are not sure where the getMore is coming from, since we have stopped the node that was trying to sync.

Finally, after a while, we gave up on this node and switched it to a standalone node so that we could access the data.
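For anyone hitting the same thing, a minimal sketch of one way to bring a replica set member back up as a standalone for data access (this assumes a systemd-managed mongod and the /etc/mongod.conf path shown in the log below; adapt to your own setup):

# Stop the mongod service (assumes the systemd unit is named "mongod")
sudo systemctl stop mongod

# In /etc/mongod.conf, comment out the replication block so the node no longer
# starts with replSetName "RS-360iDocDB-0", e.g.:
#   #replication:
#   #  replSetName: "RS-360iDocDB-0"

# Start the node again; it now comes up as a standalone and accepts normal
# client connections for data access
sudo systemctl start mongod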

 

2018-12-27T19:31:40.017+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] MongoDB starting : pid=21589 port=27017 dbpath=/mongodb_data 64-bit host=vm77a970234e
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] db version v4.0.3
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] git version: 7ea530946fa7880364d88c8d8b6026bbc9ffa48c
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] allocator: tcmalloc
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] modules: none
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] build environment:
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] distmod: rhel62
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] distarch: x86_64
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] target_arch: x86_64
2018-12-27T19:31:40.039+0000 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017 }, processManagement: { fork: true, pidFilePath: "/var/run/mongodb/mongod.pid" }, replication: { replSetName: "RS-360iDocDB-0" }, security: { authorization: "enabled", keyFile: "/var/lib/mongo/rskey" }, setParameter: { oplogInitialFindMaxSeconds: "120" }, storage: { dbPath: "/mongodb_data", engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { collectionConfig: { blockCompressor: "snappy" } } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
2018-12-27T19:31:40.039+0000 W STORAGE [initandlisten] Detected unclean shutdown - /mongodb_data/mongod.lock is not empty.
2018-12-27T19:31:40.160+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2018-12-27T19:31:40.160+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7520M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-12-27T19:31:41.732+0000 I STORAGE [initandlisten] WiredTiger message [1545939101:732515][21589:0x7f5891565a60], txn-recover: Main recovery loop: starting at 89/15488
2018-12-27T19:31:41.733+0000 I STORAGE [initandlisten] WiredTiger message [1545939101:733003][21589:0x7f5891565a60], txn-recover: Recovering log 89 through 90
2018-12-27T19:31:42.006+0000 I STORAGE [initandlisten] WiredTiger message [1545939102:6115][21589:0x7f5891565a60], file:sizeStorer.wt, txn-recover: Recovering log 90 through 90
2018-12-27T19:31:42.069+0000 I STORAGE [initandlisten] WiredTiger message [1545939102:69748][21589:0x7f5891565a60], file:sizeStorer.wt, txn-recover: Set global recovery timestamp: 5c23bd2f00000004
2018-12-27T19:31:42.145+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1545846063, 4)
2018-12-27T19:31:42.145+0000 I STORAGE [initandlisten] Triggering the first stable checkpoint. Initial Data: Timestamp(1545846063, 4) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1545846063, 4)
2018-12-27T19:31:42.979+0000 I STORAGE [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2018-12-27T19:31:42.979+0000 I STORAGE [initandlisten] The size storer reports that the oplog contains 33685 records totaling to 2111680101 bytes
2018-12-27T19:31:43.059+0000 I STORAGE [initandlisten] Sampling from the oplog between Dec 26 14:43:17:55 and Dec 27 17:31:43:1 to determine where to place markers for truncation
2018-12-27T19:31:43.059+0000 I STORAGE [initandlisten] Taking 990 samples and assuming that each section of oplog contains approximately 340 records totaling to 21314271 bytes
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:18:12
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:18:199
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:19:123
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:20:13
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:22:38
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:22:225
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:30:69
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:31:121
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:33:156
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:34:107
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:36:91
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:37:20
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:37:209
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:38:130
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:39:92
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:40:30
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:41:35
2018-12-27T19:32:42.336+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 26 14:43:43:37
......
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:47:111
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:49:52
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:51:58
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:41:53:59
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:22:27
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:34:89
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:36:15
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:39:10
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:41:56
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:44:11
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:45:59
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:47:42
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:53:114
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:42:55:42
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:43:39:10
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:43:44:14
2018-12-27T19:32:42.337+0000 I STORAGE [initandlisten] Placing a marker at optime Dec 27 14:44:04:3
2018-12-27T19:32:45.700+0000 W CONTROL [initandlisten]
2018-12-27T19:32:45.700+0000 W CONTROL [initandlisten]
2018-12-27T19:32:45.700+0000 I CONTROL [initandlisten]
2018-12-27T19:32:46.230+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/mongodb_data/diagnostic.data'
2018-12-27T19:32:46.285+0000 I REPL [initandlisten] Rollback ID is 2
2018-12-27T19:32:46.365+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1545846063, 4) (top of oplog: { ts: Timestamp(1545931903, 1), t: 28 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2018-12-27T19:32:46.365+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1545846063, 4)
2018-12-27T19:32:46.365+0000 I REPL [initandlisten] Replaying stored operations from { : Timestamp(1545846063, 4) } (exclusive) to { : Timestamp(1545931903, 1) } (inclusive).
2018-12-27T19:32:47.285+0000 I COMMAND [initandlisten] command local.oplog.rs command: find { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:101 numYields:15 nreturned:101 reslen:4577118 locks:{ Global: { acquireCount: { r: 79, w: 3, W: 3 } }, Database: { acquireCount: { r: 28, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 17 } } } protocol:op_msg 920ms
2018-12-27T19:32:48.712+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:943 numYields:30 nreturned:942 reslen:16773476 locks:{ Global: { acquireCount: { r: 111, w: 33, W: 3 } }, Database: { acquireCount: { r: 60, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 49 } } } protocol:op_msg 1424ms
2018-12-27T19:32:51.643+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:575 numYields:48 nreturned:575 reslen:15625889 locks:{ Global: { acquireCount: { r: 160, w: 33, W: 3 } }, Database: { acquireCount: { r: 109, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 98 } } } protocol:op_msg 2923ms
2018-12-27T19:32:52.761+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:90 numYields:20 nreturned:90 reslen:16699954 locks:{ Global: { acquireCount: { r: 181, w: 33, W: 3 } }, Database: { acquireCount: { r: 130, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 119 } } } protocol:op_msg 1112ms
2018-12-27T19:32:55.122+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:125 numYields:33 nreturned:125 reslen:16715891 locks:{ Global: { acquireCount: { r: 215, w: 33, W: 3 } }, Database: { acquireCount: { r: 164, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 153 } } } protocol:op_msg 2358ms
2018-12-27T19:32:58.084+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:142 numYields:43 nreturned:142 reslen:16532193 locks:{ Global: { acquireCount: { r: 259, w: 33, W: 3 } }, Database: { acquireCount: { r: 208, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 197 } } } protocol:op_msg 2958ms
2018-12-27T19:33:00.056+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:135 numYields:35 nreturned:135 reslen:16324186 locks:{ Global: { acquireCount: { r: 295, w: 33, W: 3 } }, Database: { acquireCount: { r: 244, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 233 } } } protocol:op_msg 1967ms
2018-12-27T19:33:01.212+0000 I COMMAND [initandlisten] command local.oplog.rs command: getMore { getMore: 28802248476, collection: "oplog.rs", $db: "local" } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1545846063, 4) } }, oplogReplay: true, $db: "local" } planSummary: COLLSCAN cursorid:28802248476 keysExamined:0 docsExamined:81 numYields:21 nreturned:81 reslen:16450786 locks:{ Global: { acquireCount: { r: 317, w: 33, W: 3 } }, Database: { acquireCount: { r: 266, w: 1, R: 4, W: 32 } }, Collection: { acquireCount: { r: 11, w: 1 } }, oplog: { acquireCount: { r: 255 } } } protocol:op_msg 1151ms



 Comments   
Comment by Dharshan Rangegowda [ 28/Dec/18 ]

I don't have the 3.x logs handy - but will reactivate this ticket if I run into this again. For now you can consider this closed. Thank you for the quick and detailed response.

Comment by Danny Hatcher (Inactive) [ 28/Dec/18 ]

Hello Dharshan,

1) Unfortunately, there is not a publicly available version of the tool. However, the diagnostic.data results are simply a collection of commands that you can run yourself. In this case, the replica set information came from rs.status() run via the mongo shell, while the disk information came from cat /proc/diskstats run on the server (see the sketch after these answers).

2) Those commands are local oplog commands issued as the node goes through its recovery process; they are not commands prompted by another node's initial sync. They appear in the log because the disk is at a consistent 100% utilization while the Primary tries to recover and fully start up. Improving the disk will likely cause those messages to disappear as well.
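For 1), roughly equivalent information can be gathered by hand. A minimal sketch (supply the appropriate host and credentials, since authorization is enabled on this deployment):

# Replica set state as seen by this node (the mongo shell helper mentioned above);
# -u/-p/--authenticationDatabase are needed because authorization is enabled
mongo --host localhost:27017 -u <adminUser> -p --authenticationDatabase admin --eval "printjson(rs.status())"

# Raw per-device disk counters; take two snapshots a few seconds apart and
# compare them to see how busy each disk is
cat /proc/diskstats
sleep 5
cat /proc/diskstats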

Do you have logs and diagnostic.data of a 3.x version of MongoDB going through the same scenario? It is possible that there is something that could have changed server-side to increase disk usage. Regardless, I think improving the underlying disks will be beneficial.

Thank you,

Danny

Comment by Dharshan Rangegowda [ 28/Dec/18 ]

Thanks Danny. A few questions

1) Is there a publicly available tool to extract info from the diagnostic data so that we can self-diagnose?

2) What were the oplog getMore commands running on the primary after the restart, given that the secondary node had already gone away?

Also, we have run this scenario many times with 3.x versions and had no issues (the data size is 3 GB and the oplog is 2 GB). I'm wondering why we have this issue with 4.0. Maybe some variance in the throughput of the cloud disks?

 

 

Comment by Danny Hatcher (Inactive) [ 28/Dec/18 ]

Hello,

I see that the Primary node only had 1925MB of RAM allocated until it was configured for 16065MB at 2018-12-27T14:20 UTC. Any crashes before that timestamp are due to insufficient RAM, but it appears that you have already diagnosed that issue.

However, removing the RAM bottleneck revealed a disk bottleneck. Here's a graph of the initial sync attempt's impact on the Primary between markers A and B:

The disk usage is at a consistent 100%. Please upgrade the backing disks on the Primary before attempting another initial sync. If it fails again, please provide new mongod logs and diagnostic.data.
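For future runs, the same utilization picture can be watched live on the host with iostat (a sketch; it requires the sysstat package and the device that actually backs /mongodb_data):

# Extended per-device statistics every second; a %util column pinned at ~100%
# means the device is saturated
iostat -x 1

# To narrow the output, name the device explicitly, e.g. "iostat -x 2 6 sdb"
# (two-second intervals, six reports; "sdb" is only an example device name,
# check with `df /mongodb_data` or `lsblk`)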

Thank you,

Danny

Comment by Dharshan Rangegowda [ 28/Dec/18 ]

Files have been uploaded to the secure portal. Diagnostic data for the primary is included. You will see several restarts. Finally, we moved it to a standalone node, and after that we were able to connect.

Comment by Danny Hatcher (Inactive) [ 27/Dec/18 ]

Hello Dharshan,

You can either upload the files directly to this JIRA case or you can upload them to our Secure Portal. Please note that the former will be visible to the world but the latter will only be visible to MongoDB Engineers.

Thank you,

Danny

Comment by Dharshan Rangegowda [ 27/Dec/18 ]

Hi Danny,

The primary was not accepting any more connections - this is the reason we concluded it was not responsive.

We do have a snapshot of the primary data in this state. We can also get the logs before we switched the primary to standalone mode. Where do we upload the data?

 

Comment by Danny Hatcher (Inactive) [ 27/Dec/18 ]

Hello,

Initial syncs do add extra load on the sync source; we have seen scenarios where that added load was enough to overload the sync source, so the behavior you have encountered is not unexpected. Please note that MongoDB will only log commands that take longer than 100ms by default. The absence of log entries after the getMore does not mean that the node froze; it is very possible that any subsequent queries simply took less than 100ms to complete.
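For reference, that 100ms threshold is the slowms (slowOpThresholdMs) setting, and it can be inspected or temporarily lowered if you want more of these entries while investigating. A rough sketch (run against the database of interest, adding -u/-p/--authenticationDatabase since authorization is enabled on this deployment; lowering the threshold increases log volume):

# Show the current slow-operation threshold (default 100 ms)
mongo localhost:27017/admin --eval "printjson(db.getProfilingStatus())"

# Keep the profiler itself off (level 0) but log any operation slower than 50 ms
mongo localhost:27017/admin --eval "printjson(db.setProfilingLevel(0, { slowms: 50 }))"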

If you'd like me to confirm the situation, please provide the mongod logs and "diagnostic.data" folder from both the sync source and the new node.

Thank you,

Danny

Comment by Dharshan Rangegowda [ 27/Dec/18 ]

MongoDB version is 4.0.3.
