[SERVER-31177] MongoDB consumes all free memory, leading to throttled replication Created: 20/Sep/17 Updated: 17/Oct/17 Resolved: 25/Sep/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Bennedich | Assignee: | Mark Agarunov |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
We have a small three-member replica set running MongoDB 3.4.1:
The WiredTiger cache has its default size of around 50% of RAM, so the mongod process consumes around 32 GB in our case. Additionally, over time, MongoDB uses up all free memory via the filesystem cache (memory mapping). This is expected behavior AFAIK.

However, what we are seeing is that once the amount of available memory on the server drops below 1-4% (by Windows' definition of "Available memory"), the replication speed from the primary to the secondary instance is throttled / capped at just over 20 MBit/s. That is, replication never goes above that speed, and if there is more data to replicate, it queues up and results in replication lag. This is not a pure bandwidth issue; for example, while throttling is taking place, we can transfer data between the two servers over FTP at far more than 20 MBit/s.

To prove that low memory is causing this throttling, we ran a small script that allocated and freed around 10 GB of memory on the server. Since there was almost no available memory, this memory was reallocated mainly from the filesystem cache, and in part from the mongod process. Immediately the throttling stopped, as shown in the attached screenshot, and replication occurred at full speed. This lasted for around 3 days until MongoDB had consumed all free memory through the filesystem cache, at which point replication was again throttled. This is 100% reproducible, and is in fact the workaround we are currently resorting to in order to avoid replication lag.

We haven't found a way to configure the amount of memory used for memory mapping, and are currently thinking that this must be a bug within MongoDB. We haven't found anything useful in the logs explaining why throttling takes place, and we looked for duplicates of this bug without success. |
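For readers who want to reproduce the memory-pressure test described above, the following is a minimal sketch of what such a script could look like. It is not the reporter's actual script, which was not attached in this export. The function name `apply_memory_pressure` and its parameters are hypothetical. The idea is to allocate memory in chunks, write to every page so the operating system has to back it with physical RAM (taking it mainly from the filesystem cache), and then release it all.

```python
import time


def apply_memory_pressure(total_bytes, chunk_bytes=256 * 1024 * 1024,
                          hold_seconds=0.0, pause_seconds=0.0):
    """Allocate roughly total_bytes in chunks, touch each page, then free it.

    Hypothetical helper for illustration only. Returns the number of bytes
    that were allocated before being released.
    """
    page_size = 4096  # assumed page size; the exact value is not critical
    chunks = []
    allocated = 0
    try:
        while allocated < total_bytes:
            size = min(chunk_bytes, total_bytes - allocated)
            buf = bytearray(size)  # zero-filled, pages may not be resident yet
            # Write one byte per page so the OS must commit physical memory.
            for offset in range(0, size, page_size):
                buf[offset] = 1
            chunks.append(buf)
            allocated += size
            time.sleep(pause_seconds)  # optional slow ramp-up
        time.sleep(hold_seconds)  # optionally keep the memory held for a while
    finally:
        chunks.clear()  # drop all references so the memory can be freed
    return allocated
```

Calling `apply_memory_pressure(10 * 1024**3)` would approximate the ~10 GB run mentioned in the description. Only do this on a machine where evicting the filesystem cache is acceptable, since it can briefly slow down every process on the host.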
| Comments |
| Comment by Max Bennedich [ 25/Sep/17 ] |
|
The upgrade to 3.4.9 indeed solved the problem! The server memory consumption stays at a constant ~55%. Thanks for looking into this so quickly. |
| Comment by Mark Agarunov [ 21/Sep/17 ] |
|
Hello mbl54, Thank you for the information. I'll leave this ticket open until you can confirm that this has resolved the issue. Thanks, |
| Comment by Max Bennedich [ 21/Sep/17 ] |
|
Thanks for the update! I have upgraded our members to version 3.4.9. In the past, it has taken a few days for all memory to be consumed, so I will probably need until next week before I can tell you whether this solved the problem. |
| Comment by Mark Agarunov [ 20/Sep/17 ] |
|
Hello mbl54, Thank you for providing this information. Looking over this I suspect this may be due to Thanks, |
| Comment by Max Bennedich [ 20/Sep/17 ] |
|
Thanks for looking into this! I am attaching logs and additional screenshots from an event of interest at 2017-09-18 22:40 CEST. By 2017-09-18, we had experienced replication throttling for around 3 weeks nonstop (which is when we last restarted MongoDB). At 2017-09-18 22:26 we started a Java program which slowly allocated 20 GB of memory on the primary instance. This memory was all released at 22:38. The throttling stopped immediately (you can see in the graph that we had speeds well above 20 MBit/s already at 22:42). Note: You may see some fluctuations in network traffic in the graph, and a change in the traffic pattern in the logs, between 2017-09-18 13:10 and 2017-09-18 15:01. During this period we switched the primary over to the AWS instance. Also a note about time zones:
|
| Comment by Mark Agarunov [ 20/Sep/17 ] |
|
Hello mbl54, I've generated a secure upload portal so that you can send us this data privately. Files uploaded to the portal can only be accessed by MongoDB. Thanks, |
| Comment by Mark Agarunov [ 20/Sep/17 ] |
|
Hello mbl54, Thank you for the report. To get a better idea of why the memory usage is limiting replication speed, could you please provide the following:
This should give us some insight into what may be causing this. Thanks, |