[SERVER-30710] Several seconds for indexed query - global lock? Created: 17/Aug/17 Updated: 09/Oct/17 Resolved: 07/Sep/17
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vlad Zloteanu | Assignee: | Dmitry Agranat |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Description |
I am finding queries in the slow query logs that make no sense. An excerpt is below:
20 seconds for a query that hits an index? Sometimes I also see this for queries on an ID:
It occurs suddenly, every few minutes: 20-30 queries are affected, then nothing. It behaves as if, from time to time, a global lock is taken. More details about my setup:
Mongostat does not show anything suspicious:
One primary, one secondary, and one voting-only member. For multitenancy, I am using one database per client (600 databases). They take 185 GB on disk.
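One way to test the global-lock hypothesis on 3.0 is to sample the globalLock section of serverStatus during a stall and watch for queue spikes. A minimal sketch, assuming a one-second interval, a ten-minute window, and an arbitrary output file name:

    # Sample globalLock queue depths once per second; a spike in
    # currentQueue.total coinciding with the ~20s queries would point at
    # lock contention rather than slow index scans.
    mongo --quiet --eval '
      for (var i = 0; i < 600; i++) {
        var gl = db.serverStatus().globalLock;
        print(new Date().toISOString() + " " + JSON.stringify(gl.currentQueue));
        sleep(1000);
      }' > globalLock.log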
| Comments |
| Comment by Dmitry Agranat [ 07/Sep/17 ] |
Hi Vlad, Thank you for providing the requested information. I have reviewed the uploaded data. During the reported time frame, the operation rate is between ~18K and ~24K operations per second, and 99.9% of the time the operation response time is ~240ms. There are 8 occurrences where a few queries take ~20sec. Unfortunately, the data you uploaded was not sufficient to determine the cause of these occurrences. This does not mean that there is no problem; it just means that in 3.0.15 the diagnostic data does not expose the potential cause. As Thomas mentioned earlier, I recommend upgrading the cluster to one of our latest versions, preferably 3.4.7. Apart from many significant improvements since the 3.0.x versions, we have also significantly improved our diagnostic capabilities, which would help us better understand the cause of these occurrences. If the issue reoccurs after you upgrade your cluster, please open a new case and attach an archive of the diagnostic.data directory in your $dbpath and the complete mongod log files. Thank you,
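For reference, the requested archive can be produced with a plain tar invocation; a minimal sketch, assuming $dbpath expands to the node's actual data directory:

    # Archive the diagnostic.data directory for upload.
    # "$dbpath" is a placeholder; substitute the real dbpath.
    tar czf diagnostic-data.tar.gz -C "$dbpath" diagnostic.data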
| Comment by Vlad Zloteanu [ 29/Aug/17 ] |
Hello @Thomas Schubert, Did you get the chance to look into the logs I uploaded? Thanks,
| Comment by Vlad Zloteanu [ 22/Aug/17 ] |
Thank you for your reply, Thomas.
| Comment by Kelsey Schubert [ 22/Aug/17 ] |
Hi vladzloteanu, Sorry, I missed that you were running MongoDB 3.0, which does not capture diagnostic.data. As Ramón mentioned in the linked ticket, would you please run the following script for an hour?
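(The original script is not preserved in this export. Below is a hedged reconstruction of the kind of collection script typically requested for pre-3.2 nodes; the file names, the one-second interval, and the serverStatus loop are assumptions, not the original.)

    #!/bin/bash
    # Hypothetical reconstruction, not the original script.
    # Samples serverStatus once per second via the mongo shell and disk
    # statistics via iostat for one hour, producing the two files
    # requested below.
    mongo --norc --quiet --eval '
      for (var i = 0; i < 3600; i++) {
        printjsononeline(db.serverStatus());
        sleep(1000);
      }' > serverStatus.log &
    iostat -x -t -k 1 3600 > iostat.log &
    wait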
Afterwards, please provide both generated files, as well as the mongod.log files covering the time period. Thank you,
| Comment by Vlad Zloteanu [ 17/Aug/17 ] |
I have uploaded the logs. Here is an example of a line:
Here's another one:
I can also give you access to the Datadog monitoring account. Please let me know if you need anything else.
| Comment by Vlad Zloteanu [ 17/Aug/17 ] |
Hello Thomas,
| Comment by Kelsey Schubert [ 17/Aug/17 ] |
Hi vladzloteanu, Thanks for the report. So we can investigate, would you please provide an archive of the diagnostic.data directory in your $dbpath and the complete mongod log files for the affected node? I've created a secure upload portal for you to use. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time. Thank you,
| Comment by Vlad Zloteanu [ 17/Aug/17 ] |
Also, as there are 600+ databases, the mongod process has around 62k open file descriptors (FDs).
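For context, WiredTiger keeps one file per collection and one per index, so 600 databases multiply the FD count quickly. A minimal way to cross-check the count against the process limit, assuming pidof resolves a single mongod process:

    # Count file descriptors currently open by the mongod process.
    lsof -p "$(pidof mongod)" | wc -l
    # Compare against the per-process limit.
    grep 'open files' "/proc/$(pidof mongod)/limits"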
| Comment by Vlad Zloteanu [ 17/Aug/17 ] |
I am available to provide any other information needed. They are the same servers as on