[SERVER-20716] Slow listDatabases with WT Created: 01/Oct/15 Updated: 12/Nov/15 Resolved: 12/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Anthony Brodard | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 3 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Operating System: | Linux | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
On instance with more than 2k databases, a listDatabase take more than 11s :
Tested on several instances :
On another instance, running in 2.6 with mmapv1, listing about 8x more databases take less than 1s :
Seems related to Mongo version : 3.0.6 |
| Comments |
| Comment by Ian Whalen (Inactive) [ 12/Nov/15 ] | |||
|
Hi Anthony, once again thanks a lot for the bug report. Because it looks like this is a particular symptom that we're going to be addressing through a few different fixes ( We try to only keep issues open when we know that code will be committed and attached to it, and in this case the code that's relevant for you will show up on these other 3 issues; for updates as we continue to work on these please add yourself as a watcher there. | |||
| Comment by Anthony Brodard [ 27/Oct/15 ] | |||
|
Hi David, I think we can wait until the mentioned tickets are closed. Thank you for your help, | |||
| Comment by David Hows [ 26/Oct/15 ] | |||
|
Hi Anthony, One last thing, you may be able to stop the MMS agent from running the listDatabases command by disabling dbstats collection. You can find instructions on how to do this at: https://docs.cloud.mongodb.com/faq/#how-does-mms-gather-database-statistics Regards, | |||
| Comment by David Hows [ 22/Oct/15 ] | |||
|
Thanks Anthony, Thats great background to this issue. Thus far from investigating this issue we have raised 3 tickets, which suggest changes that could be made here to hopefully mitigate the problems you are seeing:
For now, I think there is little more we can do on this ticket as the work will be on the tickets listed above. Let me know if there is more I can do to help here or if you are okay with my marking this ticket as being completed by the above tickets. Regards, | |||
| Comment by Anthony Brodard [ 22/Oct/15 ] | |||
|
Hi David, Thank you for those useful explanation. This problem occurs principally when :
We list the databases manually rarely, and, even if it can be useful in some cases, the database size is not a essential information. Regards, | |||
| Comment by David Hows [ 22/Oct/15 ] | |||
|
Hi Anthony, Thanks for the extra logs, that has given me a much larger grasp on the situation. I will speak in broad numbers here, but it appears that over the whole of your system you have only 4k databases. Within those databases there are 70k collections. And combined you have a total of 230k collections and indexes. When you run the listDatabases command we do 2 things, first generate a list of all databases, and then we go and gather the size of each of those databases. Under MMAPv1 this is pretty straightforward as all the data for all the collections and indexes in a given databases are stored in that databases files, so we can just check on the size of those files. Under WiredTiger we store each collection and each index in its own individual file. When we go to find the size of the database, we need to go out and gather the sizes from each of the 230K collections and indexes to report on the sizes. From the verbose outputs it shows that we do about 6 or 7 of these entities per microsecond on your system, which is where the time is spent. I will be raising some tickets looking at ways we can improve how we go about this, but have a question for you. Do you use those database sizes when listing the databases? Or is this simply looking for all the names to iterate? Thanks! | |||
| Comment by Anthony Brodard [ 21/Oct/15 ] | |||
|
Thanks Ramon, @David, I've uploaded a huge verbose log file. Regards, | |||
| Comment by Ramon Fernandez Marina [ 19/Oct/15 ] | |||
|
anthony@brodard.me, I've created an upload portal David requested. Any data you upload will only be visible to MongoDB engineers. Thanks, | |||
| Comment by Anthony Brodard [ 19/Oct/15 ] | |||
|
Hi David, I confirm we don't create or drop collection during normal operation. Regards, | |||
| Comment by David Hows [ 18/Oct/15 ] | |||
|
Hi Anthony, Thanks for taking the time to gather that data for me and running those extra tests I had hoped that your workload would mirror something akin to the workload seen in Would you be able to grab the verbose logs as you suggested? I'd like to review the operations you are running. If you ned a more private way to upload files so that they aren't attached to this ticket please let me know. Thanks, | |||
| Comment by Anthony Brodard [ 16/Oct/15 ] | |||
|
Hi David, Sorry about the late reply. Additionally, can you confirm how you are running the listDatabase command in these cases? For information, we run 10 mongod instances on the same host (in docker containers). Sometimes, listDatabases command were stuck in one or many instances, locking all operations, and causing high load on the physical server. This problem seems to have been fixed since we have upgraded in 3.0.6, but slow listDatabases is still present. If needed, I can activate verbose logs on the instance. Regards, | |||
| Comment by Anthony Brodard [ 16/Oct/15 ] | |||
|
Add mongod logs | |||
| Comment by David Hows [ 15/Oct/15 ] | |||
|
Hi Anthony, I've been looking over this again, with context from another engineer. One thing that has sprung up is that your system has 73K open data handles, which is a not insignificant amount, especially since your system was performing few operations at the time. Would you be able to upload the log requested as some of the historical information may be able to inform why this value is where it is. Additionally, would you be able to restart the instance and see if that has a positive impact? Thanks, | |||
| Comment by David Hows [ 11/Oct/15 ] | |||
|
Hi Anthony, Sorry for the delay in getting back to you. I've looked through the files you attached and have not yet been able to find any correlation. Would you be able to grab the logfile as well? I added that to the initial request in a second edit so it may not have been emailed to you. Additionally, can you confirm how you are running the listDatabase command in these cases? Thanks, | |||
| Comment by Anthony Brodard [ 02/Oct/15 ] | |||
|
Hi David, I've added required files to the ticket. Regards, | |||
| Comment by David Hows [ 02/Oct/15 ] | |||
|
Hi Anthony, I've tried to reproduce this under 3.0.6 and listed off 5K databases in under 200ms, as below.
Would you be able to gather the run the following commands for me, then run the reproduction again from your system to see if we can shed further light on this issue:
Can you also attach the logfile from the instance which is showing the slowness. Thanks, |