[SERVER-32180] mongod oom with low connections Created: 06/Dec/17  Updated: 26/Jan/18  Resolved: 26/Dec/17

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 3.4.4
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: shawn Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File cursors.png; File diagnostic-new.tar.gz; Text File log.log; Zip Archive mongo-1207.zip; File mongolog.tar.gz
Issue Links:
Duplicate: duplicates SERVER-22224, "$near query uses unbounded memory" (Backlog)
Participants:

 Description   

Hi,

My mongod is OOM-killed even with only a few connections.

The diagnostic.data files are attached.

Please help me analyze these files.



 Comments   
Comment by Kelsey Schubert [ 26/Dec/17 ]

Hi shawn001,

Thank you for providing the complete logs. From them, we can see that a large amount of memory is being used by geoNear queries preceding the OOM. This issue is tracked in SERVER-22224. Please feel free to vote for it and watch it for updates.

The allocation for this stack begins increasing at 2017-12-06T16:05:29.011Z and grows to more than 50 GB by the time of the OOM.

2017-12-06T14:01:57.005+0800 I -        [ftdc] heapProfile stack1239: { 0: "tc_malloc", 1: "mongo::mongoMalloc", 2: "mongo::BSONObj::copy", 3: "mongo::BSONObj::getOwned", 4: "mongo::WorkingSetMember::makeObjOwnedIfNeeded", 5: "mongo::NearStage::bufferNext", 6: "mongo::NearStage::doWork", 7: "mongo::PlanStage::work", 8: "mongo::ProjectionStage::doWork", 9: "mongo::PlanStage::work", 10: "mongo::LimitStage::doWork", 11: "mongo::PlanStage::work", 12: "mongo::PlanExecutor::getNextImpl", 13: "mongo::PlanExecutor::getNext", 14: "mongo::Geo2dFindNearCmd::run", 15: "mongo::Command::run", 16: "mongo::Command::execCommand", 17: "mongo::runCommands", 18: "0x7f0f1c769392", 19: "mongo::assembleResponse", 20: "mongo::ServiceEntryPointMongod::_sessionLoop", 21: "0x7f0f1c339c2d", 22: "0x7f0f1d047242", 23: "0x7f0f1a128851", 24: "clone" }

Until SERVER-22224 is resolved, I would suggest mitigating this issue at the application layer by reducing the number of concurrent geoNear queries and ensuring that appropriate geospatial indexes are in place, as sketched below.
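For illustration, a minimal mongo shell sketch of this mitigation, assuming a hypothetical places collection with a GeoJSON location field (these names are illustrative, not taken from this ticket): create a 2dsphere index and bound the $near query so the near stage buffers fewer documents.

  // Hypothetical collection and field names; a 2dsphere index supports $near on GeoJSON points.
  db.places.createIndex({ location: "2dsphere" })

  // Bound both the search radius and the result count to limit buffering.
  db.places.find({
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: [ 116.4, 39.9 ] },
        $maxDistance: 5000   // meters
      }
    }
  }).limit(100)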

Kind regards,
Kelsey

Comment by shawn [ 16/Dec/17 ]

Hi all,

Have you discovered anything yet?

Comment by shawn [ 06/Dec/17 ]

Hi @Kelsey,

The entire log file, named log.log, has been uploaded. It covers the period from the restart to the OOM with no gaps.

Host configuration: 128 GB physical memory and 11 GB swap.

Comment by Kelsey Schubert [ 06/Dec/17 ]

Hi shawn,

To clarify, we need to see all of the log messages since the restart with heap profiling enabled in order to see the relevant stack traces recorded by the heap profiler. Without these traces, we cannot continue our investigation. These stacks may not first be recorded at particularly noteworthy times, so it's best to provide the complete files and allow us to pull out the relevant information as we look into this issue.

Would you please upload all mongod log files from the restart to the OOM, ensuring there are no gaps in their coverage, so we can continue to investigate?

Thank you,
Kelsey

Comment by shawn [ 06/Dec/17 ]

Hi Bruce Lucas

The attached file mongo-1207.zip contains the mongod log from the time of the OOM.

Thanks

Comment by Bruce Lucas (Inactive) [ 06/Dec/17 ]

Hi shawn001,

Thanks for uploading the data. Unfortunately the log file ends at 2017-12-06T12:59:59.850+0800, several hours before the OOM, so it is missing some crucial information for identifying the cause. Do you have additional log files that cover the entire time from the restart until the OOM?

Thanks,
Bruce

Comment by shawn [ 06/Dec/17 ]

Hi @thomas.schubert,

diagnostic-new.tar.gz contains the diagnostic.data captured after mongod was started with heapProfilingEnabled=true.

mongolog.tar.gz contains the mongod log from after mongod was started with heapProfilingEnabled=true until it was killed by the OOM killer.

Thank you

Comment by shawn [ 06/Dec/17 ]

Hi @Kelsey T Schubert,

Got it.

Thank you.

Comment by Kelsey Schubert [ 06/Dec/17 ]

Hi shawn001,

Thanks for reporting this issue. I see that, preceding the OOM events, some cursors are established with noTimeout set. Are you aware of which queries these cursors are running?
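One way to check for such cursors from the mongo shell is via the serverStatus cursor metrics; a minimal read-only sketch:

  // Number of currently open cursors marked noTimeout.
  db.serverStatus().metrics.cursor.open.noTimeout

  // List in-progress operations to see which queries are holding cursors.
  db.currentOp()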

To help us continue to investigate this issue, would you please restart the node with the following parameter enabled:

mongod --setParameter heapProfilingEnabled=true
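
Equivalently, if the node is started from a YAML configuration file, the same parameter can be set there; a sketch, assuming the rest of your configuration stays unchanged:

  setParameter:
    heapProfilingEnabled: true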

After encountering the OOM again, would you please upload the following files:

  1. An archive of the diagnostic.data (one way to create it is sketched below)
  2. The complete mongod logs since enabling the heapProfiler
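
One way to create the archive in step 1, a sketch assuming the default dbpath of /data/db (substitute your actual dbpath):

  # diagnostic.data lives under the dbpath; the path below is an assumption.
  tar czf diagnostic.tar.gz /data/db/diagnostic.data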

Thank you,
Kelsey
