[SERVER-63402] High query response time for find operation in mongo 4.0.27 with mmap storage engine with random intervals (5/7/12/20 hours) Created: 08/Feb/22 Updated: 28/Feb/22 Resolved: 28/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.27 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | KAPIL GUPTA | Assignee: | Edwin Zhou |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
Scenario - We have an application that performs CRUD operations on the database. For the past 2 years we have been using MongoDB 3.6.9, and the system has worked without any issues. After we upgraded the database from 3.6.9 to 4.0.27, we started seeing high query response times for find operations. The problem is intermittent: the high response times appear after 5, 7, or even 11 hours. |
| Comments |
| Comment by Edwin Zhou [ 28/Feb/22 ] | |
|
Hi kg3634@gmail.com, Thank you for letting us know that you've experienced this issue more frequently after applying the patch. I will now close this ticket as a duplicate of Best, | |
| Comment by KAPIL GUPTA [ 25/Feb/22 ] | |
|
Hi Edwin, In our case, after applying the patch for glibc 2.27 (rebuilding and reinstalling the same package with the fix), the frequency of timeouts increased. On Ubuntu 18.04 we are not able to upgrade or downgrade glibc to a different version, because that requires upgrading or downgrading other dependencies at the same time, which could disrupt other running services. Could we reduce the value of maxIdleThreadAge? If so, what is the minimum recommended value?
Thanks, Kapil
| |
| Comment by Edwin Zhou [ 24/Feb/22 ] | |
|
Hi kg3634@gmail.com, Thank you for your patience. This appears to be the same bug with glibc 2.27 that According to the thread that you linked, the glibc patch you applied for 2.27 doesn't fix this bug, but reduces the likelihood of it happening on Ubuntu 18.04. Have you observed that the hangs occur less frequently after applying the patch? You may be able to work around this by updating or downgrading glibc to a version that is not affected by this bug. Best, | |
| Comment by Edwin Zhou [ 23/Feb/22 ] | |
|
Hi kg3634@gmail.com, Thank you for uploading the mongod.log files and FTDC. This gives us good context around the gdb samples. Best, | |
| Comment by KAPIL GUPTA [ 23/Feb/22 ] | |
|
Hi Edwin, I have uploaded the mongod logs for the unresponsive node to the support uploader. Details: File name: mongolog20_02_2022_13_25.tar.gz Unresponsive time window: from 2022-02-20_13-25-11 to 2022-02-20_13-25-40 UTC Thanks, | |
| Comment by Edwin Zhou [ 22/Feb/22 ] | |
|
Hi kg3634@gmail.com, Thank you for uploading the gdb output. Can you please also upload an archive (tar or zip) of the mongod.log files and the $dbpath/diagnostic.data directory covering the timestamps where you collected stack traces? It will aid in the investigation. In the meantime, I will take a look at the stack traces. Best, | |
| Comment by KAPIL GUPTA [ 22/Feb/22 ] | |
|
Hi Edwin, Gentle reminder: is there any workaround (WA) available? This issue is badly impacting our product, and we need to validate this release before we can move forward.
Thanks, Kapil | |
| Comment by KAPIL GUPTA [ 20/Feb/22 ] | |
|
Hi Edwin, I have uploaded the gdb output for the unresponsive node to the support uploader. Details: File name: gdboutput.tar.gz Unresponsive time window: from 2022-02-20_13-25-11 to 2022-02-20_13-25-40 UTC The file list of gdboutput (5 timestamps) is given below: gdboutput: Please let me know if any other information is required. Thanks, | |
| Comment by Edwin Zhou [ 17/Feb/22 ] | |
|
Hi kg3634@gmail.com, Thank you for the update. Unfortunately, I'm not able to provide any workarounds since there's not enough information about the issue so far. I look forward to receiving the gdb output. Best, | |
| Comment by KAPIL GUPTA [ 15/Feb/22 ] | |
|
Hi Edwin, Thank you for the analysis. On 4.0.26, the replWriterMinThreadCount value shows as zero: rs-app_shardAB-ipv6-5:SECONDARY> db.adminCommand({getParameter: 1, replWriterMinThreadCount: 1}) For the glibc 2.27 issue, we have already tried the workaround below.
However, that workaround did not resolve the issue. Could you please recommend a workaround in the meantime? I will provide the requested gdb output shortly.
Thanks, | |
| Comment by Edwin Zhou [ 14/Feb/22 ] | |
|
Hi kg3634@gmail.com, Thank you for providing initial diagnostic data across the cluster. I suspect this might be a recurrence of In order to help confirm the behavior that's occurring, can you collect gdb stack traces from the unresponsive node the next time this behavior occurs?
Best, | |
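For reference, gdb stack traces of this kind are usually gathered by attaching gdb to the running mongod in batch mode and dumping every thread's backtrace, repeated a few times while the node is unresponsive (the upload later in this thread contains 5 timestamps). The Python sketch below shows one way to do that. It is a minimal sketch, assuming gdb is installed on the host and the mongod PID is known; `gdb_backtrace_cmd` and `collect_backtraces` are hypothetical helper names, while `-p`, `-batch` and `-ex "thread apply all bt"` are standard gdb options. Attaching gdb pauses mongod until the backtraces have been written.

```python
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path


def gdb_backtrace_cmd(pid: int) -> list[str]:
    """gdb argv: attach to `pid`, print every thread's backtrace, then detach and exit."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def collect_backtraces(pid: int, samples: int, interval_s: float, outdir: Path) -> list[Path]:
    """Hypothetical helper: take `samples` snapshots `interval_s` seconds apart, one file each."""
    outdir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for i in range(samples):
        # Timestamp style mirrors the 2022-02-20_13-25-11 format used in this ticket.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H-%M-%S")
        path = outdir / f"gdb_{i}_{stamp}.txt"
        with path.open("w") as fh:
            # mongod is stopped while gdb is attached; -batch makes gdb exit after the -ex command.
            subprocess.run(gdb_backtrace_cmd(pid), stdout=fh, stderr=subprocess.STDOUT, check=False)
        written.append(path)
        if i + 1 < samples:
            time.sleep(interval_s)
    return written
```

A call such as `collect_backtraces(pid=12345, samples=5, interval_s=1.0, outdir=Path("gdboutput"))` (PID and directory are placeholders) would produce one file per snapshot. Separate files per timestamp make it easier to see which threads stay blocked across samples.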
| Comment by KAPIL GUPTA [ 12/Feb/22 ] | |
|
Hi Edwin, I have uploaded the required logs to the support uploader. Details:
File name: MongoQueryHighReponseLogs.tar.gz
High query response issue time: 2022-02-11T22:51 UTC
Directory structure of the log archive:
QueryHighReponseLogs/
QueryHighReponseLogs/Arbiter1/
QueryHighReponseLogs/Arbiter2/
QueryHighReponseLogs/Arbiter3/
QueryHighReponseLogs/Primary/
QueryHighReponseLogs/Primary/diagnostic.data/
QueryHighReponseLogs/Secondary1withHighReponse/
QueryHighReponseLogs/Secondary1withHighReponse/diagnostic.data/
QueryHighReponseLogs/Secondary2/
QueryHighReponseLogs/Secondary2/diagnostic.data/
QueryHighReponseLogs/Secondary3/
QueryHighReponseLogs/Secondary3/diagnostic.data/
Note: For the arbiters, only mongod logs are present (diagnostic data collection is disabled, since arbiters are voting-only members and do not hold data). Please let me know if any other information is required for the analysis. Thanks, Kapil | |
| Comment by Edwin Zhou [ 11/Feb/22 ] | |
|
Hi kg3634@gmail.com, Thank you for your report. Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) for each node in the replica set and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Best, | |
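Packaging the requested files for each node amounts to putting the mongod log files and the whole `$dbpath/diagnostic.data` directory into one archive. The Python sketch below does this with the standard `tarfile` module. It is a minimal sketch, assuming you pass in the log file paths and the dbpath from your own mongod configuration; `archive_diagnostics`, the `logs/` prefix inside the archive, and the demo file names are all hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_diagnostics(log_files: list[Path], dbpath: Path, out: Path) -> Path:
    """Hypothetical helper: write mongod logs plus $dbpath/diagnostic.data into one .tar.gz."""
    diag = dbpath / "diagnostic.data"
    if not diag.is_dir():
        raise FileNotFoundError(f"{diag} not found")
    with tarfile.open(out, "w:gz") as tar:
        for log in log_files:
            # Store logs under logs/ so the archive does not embed absolute host paths.
            tar.add(log, arcname=f"logs/{log.name}")
        # tar.add recurses into directories by default, so every FTDC file is included.
        tar.add(diag, arcname="diagnostic.data")
    return out


if __name__ == "__main__":
    # Demo on a throwaway directory layout; real paths come from your mongod config.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "db" / "diagnostic.data").mkdir(parents=True)
        (root / "db" / "diagnostic.data" / "metrics.interim").write_bytes(b"\x00")
        (root / "mongod.log").write_text("log line\n")
        out = archive_diagnostics([root / "mongod.log"], root / "db", root / "node1.tar.gz")
        with tarfile.open(out) as tar:
            print(sorted(tar.getnames()))
```

Writing the archive outside `$dbpath` avoids accidentally including the archive in itself. Running the helper once per replica set member gives one archive per node.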
| Comment by KAPIL GUPTA [ 08/Feb/22 ] | |
|
Additional missing information:
Testing environment:
Messages:
2022-02-02T02:55:54.392+0000 I COMMAND [conn554] command drasessions_1.drasessions command: find { find: "drasessions", filter: { _id: { sessionid: "ClpGx3:172.16.241.40:15124:1643368779:0080300316" }}, limit: 1, singleBatch: true, $db: "drasessions_1", $clusterTime: { clusterTime: Timestamp(1643770525, 464), signature: { hash: BinData(0, A9E0739EB1E3BBA9EF776A9FCEC9342E9457D221), keyId: 7042384422720503811 }}, lsid: { id: UUID("8b321501-be08-4fa8-ada5-367cc1eb555e") }, $readPreference: { mode: "nearest" } } planSummary: IDHACK keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:239 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 28911648 } }, MMAPV1Journal: { acquireCount: { r: 1 }}, Database: { acquireCount: { r: 1 }}, Collection: { acquireCount: { R: 1 }} } protocol:op_msg 28911ms
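The locks section of this log line accounts for nearly all of the latency: the Global lock reports timeAcquiringMicros of 28911648 microseconds (about 28.9 s), while the whole find took 28911 ms, and the IDHACK plan examined zero keys and zero documents. In other words, the operation spent its time waiting to acquire the Global lock, not executing the query. The Python sketch below extracts those two numbers from a slow-query line and compares them. It is a minimal sketch, assuming the 4.0 text log format shown above and that the first timeAcquiringMicros after `Global:` belongs to the Global lock; `SAMPLE` and `global_lock_wait` are illustrative names, and `SAMPLE` holds only the tail of the line above.

```python
import re

# Tail of the slow-query line above (from planSummary onward).
SAMPLE = (
    "planSummary: IDHACK keysExamined:0 docsExamined:0 cursorExhausted:1 "
    "numYields:0 nreturned:0 reslen:239 locks:{ Global: { acquireCount: { r: 2 }, "
    "acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 28911648 } }, "
    "MMAPV1Journal: { acquireCount: { r: 1 }}, Database: { acquireCount: { r: 1 }}, "
    "Collection: { acquireCount: { R: 1 }} } protocol:op_msg 28911ms"
)

# Assumption: the first timeAcquiringMicros after "Global:" is the Global lock's wait time.
GLOBAL_WAIT_RE = re.compile(r"Global: \{.*?timeAcquiringMicros: \{ [rwRW]: (\d+) \}")
# Total operation duration, logged as "<N>ms" at the end of the line.
DURATION_RE = re.compile(r"(\d+)ms\s*$")


def global_lock_wait(line: str) -> tuple[float, int, float]:
    """Return (Global lock wait in ms, total duration in ms, wait / total)."""
    wait = GLOBAL_WAIT_RE.search(line)
    total = DURATION_RE.search(line)
    if wait is None or total is None:
        raise ValueError("line has no Global lock wait time or no duration")
    wait_ms = int(wait.group(1)) / 1000.0  # the log reports microseconds
    total_ms = int(total.group(1))
    return wait_ms, total_ms, wait_ms / total_ms


wait_ms, total_ms, ratio = global_lock_wait(SAMPLE)
print(f"Global lock wait: {wait_ms:.1f} ms of {total_ms} ms ({ratio:.0%})")
# prints "Global lock wait: 28911.6 ms of 28911 ms (100%)"
```

The ratio comes out slightly above 1.0 because the logged duration is truncated to whole milliseconds. Running the same check across many slow-query lines would show whether every slow find in an episode shares this lock-wait signature.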
Troubleshooting performed so far:
host insert query update delete getmore command flushes mapped vsize res faults qrw arw net_in net_out conn set repl time
Queries:
Kindly reply to the queries above. It would really help us move forward, as this is becoming a big blocker for using MongoDB in our environment.
|