[SERVER-27666] listDatabases performance issue w/ replicasets & wiredTiger Created: 12/Jan/17  Updated: 18/Jan/17  Resolved: 18/Jan/17

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chad Kreimendahl Assignee: Mark Agarunov
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-3181 Add option to listDatabases to only g... Closed
Operating System: ALL
Steps To Reproduce:
  1. Create hundreds of databases, each with hundreds of collections and dozens of indexes per collection, totaling over 250,000 files (on SSD; spinning drives may need fewer files to show the problem)
  2. Enable a replica set
  3. (Or simply run "show dbs".)
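The file count in step 1 follows from how WiredTiger lays out data: each collection and each index is stored in its own .wt file. The sketch below is a rough estimate under that assumption. It ignores WiredTiger's own metadata and journal files, and the example counts (200 databases, 100 collections, 12 indexes) are hypothetical values chosen only to land in the range the report describes.

```python
# Rough file-count estimate for a WiredTiger data directory.
# Assumption: one .wt file per collection plus one .wt file per index;
# metadata and journal files are ignored. Input counts are hypothetical.


def estimate_wt_files(databases: int, collections_per_db: int, indexes_per_collection: int) -> int:
    """Return the approximate number of .wt data files (collections + indexes)."""
    files_per_collection = 1 + indexes_per_collection  # one data file plus one file per index
    return databases * collections_per_db * files_per_collection


if __name__ == "__main__":
    # 200 databases x 100 collections x (1 + 12 indexes) files each
    print(estimate_wt_files(200, 100, 12))  # prints 260000
```

Because the count is multiplicative, modest numbers on each axis are enough to push a single dbpath past a quarter of a million files.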

 Description   

Each secondary member of the replica set appears to send the primary a "listDatabases" command about once a minute. Because each request takes about 2 seconds, our members fall behind in syncing every minute or so.

While some performance issues were fixed in SERVER-17078, the overall response time has not improved. With more than 250,000 files in our WiredTiger directory, it is not surprising that computing the sizes takes some time.

I would suggest creating a listDatabases-like method that does not collect file and index sizes. Replica sets could use it for whatever purposes they require, without the massive number of filesystem requests and the resulting synchronization delays.
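To illustrate why dropping sizes should help, here is a toy model of the two approaches. It is not how mongod implements listDatabases; it assumes a hypothetical directory-per-database layout (each database is a subdirectory of the dbpath containing its .wt files), and the function names list_database_names and list_databases_with_sizes are made up for this sketch. Listing names needs one directory scan, while reporting sizes needs a stat() call for every collection and index file.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List

# Toy model only. Assumes a hypothetical layout where each database is a
# subdirectory of dbpath holding its collection/index .wt files.


def list_database_names(dbpath: str) -> List[str]:
    """Names only: a single directory scan, no per-file stat() calls."""
    with os.scandir(dbpath) as entries:
        return sorted(e.name for e in entries if e.is_dir())


def list_databases_with_sizes(dbpath: str) -> Dict[str, int]:
    """Names plus on-disk size: stats every .wt file under every database."""
    sizes: Dict[str, int] = {}
    for name in list_database_names(dbpath):
        total = 0
        for wt_file in Path(dbpath, name).rglob("*.wt"):
            total += wt_file.stat().st_size  # one stat() per collection/index file
        sizes[name] = total
    return sizes


if __name__ == "__main__":
    # Build a tiny fake dbpath so the sketch runs without a MongoDB server.
    with tempfile.TemporaryDirectory() as dbpath:
        layout = {"app": ["collection-0.wt", "index-1.wt"], "logs": ["collection-2.wt"]}
        for db, files in layout.items():
            os.makedirs(os.path.join(dbpath, db))
            for fname in files:
                with open(os.path.join(dbpath, db, fname), "wb") as fh:
                    fh.write(b"\0" * 4096)
        print(list_database_names(dbpath))        # ['app', 'logs']
        print(list_databases_with_sizes(dbpath))  # {'app': 8192, 'logs': 4096}
```

With 250,000 files, the sized variant performs roughly 250,000 stat() calls per request, while the names-only variant's cost depends only on the number of databases.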



 Comments   
Comment by Mark Agarunov [ 18/Jan/17 ]

Hello sallgeud,

Since this ticket appears to have the same root cause as SERVER-3181, I've marked it as a duplicate. I've added a request to backport the fix to MongoDB versions 3.2 and 3.4; however, the final decision on whether the fix will be backported will be made only after the issue has been fixed in the current master branch.

Thanks,
Mark

Comment by Chad Kreimendahl [ 12/Jan/17 ]

Great. It seems he got a fairly simple fix in about half a year ago, and it's been ignored since.

Right now, the command shows up in our slow log and accounts for over 95% of our log entries and 80% of all replication lag.

We'll need it on 3.2, as we're several months away from moving to 3.4.

Comment by Ramon Fernandez Marina [ 12/Jan/17 ]

sallgeud, I believe the behavior you describe may have the same root cause as SERVER-3181. I'm moving that ticket to "Needs Triage" so it gets revisited by the appropriate team.

Generated at Thu Feb 08 04:15:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.