[SERVER-48400] Secondary node memory arise while balancer doing work Created: 26/May/20  Updated: 01/Jun/20  Resolved: 01/Jun/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.13
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Cen Zheng Assignee: Dmitry Agranat
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File SERVER-48400.png     File diagnostic.data.tar.gz     PNG File inserts.png     File mongod.log.tar.gz     PNG File planCacheTotalSizeEstimateBytes.png    
Issue Links:
Duplicate
is duplicated by SERVER-40361 Reduce memory footprint of plan cache... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   

Hi,

We have a sharded cluster with a balancer window configured for 3:00am-6:00am (GMT+8). We observed that while the balancer is running, memory usage rises on every shard's secondary nodes. This does not happen on the primary nodes, which may be related to the fact that all query requests are routed to the secondary nodes. I have enabled heapProfiling on one shard and uploaded the FTDC data and related logs. Please help identify this issue. Thanks!



 Comments   
Comment by Dmitry Agranat [ 01/Jun/20 ]

You are welcome, mingyan.zc@gmail.com. I will go ahead and link this ticket to SERVER-40361.

Comment by Cen Zheng [ 01/Jun/20 ]

Hi, Dima,

We found that the planCache was indeed the root cause: there were too many queryShapes occupying it. We have asked the user to optimize their workload. Thanks a lot!
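To illustrate why many distinct query shapes translate into steady memory growth, here is a simplified toy model, assuming a per-collection plan cache that is an LRU bounded by entry count (the knob discussed in this thread, internalQueryCacheSize, caps the number of entries) and that every cached plan costs a fixed number of bytes. This is not MongoDB's implementation: `PlanCacheModel`, `fill`, and `bytes_per_entry` are hypothetical names, and real entries vary in size with the query shape and plan.

```python
from collections import OrderedDict


class PlanCacheModel:
    """Hypothetical toy plan cache: an LRU bounded by entry count.

    Only illustrates that memory grows with the number of distinct query
    shapes until the entry cap is reached; not MongoDB's real plan cache.
    """

    def __init__(self, max_entries: int, bytes_per_entry: int) -> None:
        self.max_entries = max_entries          # stands in for internalQueryCacheSize
        self.bytes_per_entry = bytes_per_entry  # assumption: fixed size per cached plan
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def lookup_or_insert(self, shape: str) -> None:
        if shape in self._entries:
            self._entries.move_to_end(shape)    # cache hit: mark as most recently used
            return
        self._entries[shape] = f"plan-for-{shape}"
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)   # evict the least recently used entry

    def estimated_bytes(self) -> int:
        return len(self._entries) * self.bytes_per_entry


def fill(max_entries: int, distinct_shapes: int, bytes_per_entry: int = 10_000) -> int:
    """Run one query per distinct shape and return the model's estimated size."""
    cache = PlanCacheModel(max_entries, bytes_per_entry)
    for i in range(distinct_shapes):
        cache.lookup_or_insert(f"shape-{i}")
    return cache.estimated_bytes()


print(fill(5000, 20000))  # 50000000: capped at 5000 entries
print(fill(1000, 20000))  # 10000000: a lower cap bounds memory lower
```

In this model, a workload with few distinct shapes never reaches the cap, while a workload that generates many shapes fills the cache to its limit, so lowering the cap (or reducing the number of shapes) is what bounds the footprint.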

Comment by Dmitry Agranat [ 31/May/20 ]

Hi mingyan.zc@gmail.com,

After looking at a larger window (May 22-25), I can see memory growth of ~4.3 GB, which is related to internalQueryCacheSize. This issue is fixed in SERVER-40361, and the mitigation is to lower internalQueryCacheSize.

The attached chart includes the first 10 heap-profile stacks, which have the same shape as the planCacheTotalSizeEstimateBytes metric.

Thanks,
Dima

Comment by Cen Zheng [ 27/May/20 ]

Hi, Dima, 

Thanks for the analysis. It looks like the increased resident memory is being gradually reclaimed, but over a longer period (from 5-22 to 5-25) it has increased by ~5 GB and continues to grow. There is also no such large memory increase on the hidden node, which also receives the replicated inserts. So it may be related to the user's query requests. I wonder whether some hidden problem is causing this.

Comment by Dmitry Agranat [ 27/May/20 ]

Hi mingyan.zc@gmail.com, secondary memory rises during this window because replicated insert operations are being applied during it.

During this window, resident memory increases by ~2 GB; afterwards, the memory is gradually reclaimed. I think this is expected behavior.

Thanks,
Dima

Generated at Thu Feb 08 05:17:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.