[SERVER-62909] The more databases exist, the longer listDatabases takes
Created: 24/Jan/22  Updated: 05/Apr/22  Resolved: 29/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: John Moser
Assignee: Edwin Zhou
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-62910 The more databases exist, the longer ... Closed
Operating System: ALL
Participants:

 Description   

MongoDB 4.4.10

MongoDB Operator

SSD over NFS (should not be an issue)

3 nodes @ 8 cores running on GKE

  • 1 db -> approx 300ms
  • 10'000 dbs -> approx 7sec
  • 20'000 dbs -> approx 20sec

Not sure if it's linear or exponential
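
One rough way to read the timings above is to assume a power law, time ∝ n^k, and estimate k from the two larger measurements. This is only a sketch: the power-law model and the class and method names are assumptions made for illustration, and two approximate data points cannot rule out other growth shapes. A value of k = 1 would mean linear scaling.

```java
class GrowthEstimate {
    // Scaling exponent k, assuming time grows as n^k between two measurements.
    static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Figures from the description: 10'000 dbs -> 7 sec, 20'000 dbs -> 20 sec.
        double k = exponent(10_000, 7.0, 20_000, 20.0);
        System.out.printf("estimated exponent: %.2f%n", k);
    }
}
```

With the figures from the description the estimate comes out at roughly 1.5, which would suggest growth that is faster than linear, at least over this range.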

 



 Comments   
Comment by John Moser [ 05/Apr/22 ]

Please ignore the issue - it's definitely not a MongoDB core issue.

Comment by Edwin Zhou [ 29/Mar/22 ]

Hi jamoser42@gmail.com,

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Best,
Edwin

Comment by Edwin Zhou [ 10/Mar/22 ]

Hi jamoser42@gmail.com,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please attach the $dbpath/diagnostic.data directory covering the slow database creation as requested by my colleague?

Best,
Edwin

Comment by John Moser [ 22/Feb/22 ]

@dmitry.agranat - pls hold on and let me retry again.

Comment by Dmitry Agranat [ 17/Feb/22 ]

jamoser42@gmail.com, w/o the requested information, I am not sure how we can help. For general questions w/o data, we encourage you to start by asking our community for help by posting on the MongoDB Developer Community Forums.

If you can provide the requested data, we will be happy to take a look and try to help.

From a single line provided in the attachment which includes the listDatabases, it looks like you might benefit from using the nameOnly option as mentioned in our documentation:

Optional. A flag to indicate whether the command should return just the database names, or return both database names and size information.
Returning size information requires locking each database one at a time, while returning only names does not require locking any database.
The default value is false, so listDatabases returns the name and size information of each database.
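
As a minimal sketch of what the nameOnly option looks like in practice, the command document can be built as a plain map with the command name as its first key. The class and method names here are hypothetical, and the driver call in the comment assumes the MongoDB Java driver is on the classpath and that mongoClient is an open client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ListDatabasesNameOnly {
    // Builds a listDatabases command that asks for database names only,
    // so the server does not have to gather size information.
    static Map<String, Object> nameOnlyCommand() {
        Map<String, Object> command = new LinkedHashMap<>();
        command.put("listDatabases", 1); // the command name must be the first key
        command.put("nameOnly", true);
        return command;
    }

    public static void main(String[] args) {
        System.out.println(nameOnlyCommand());
        // With the driver available (assumption, not exercised here):
        //   mongoClient.getDatabase("admin").runCommand(new Document(nameOnlyCommand()));
    }
}
```

listDatabases has to be run against the admin database, which is why the commented call uses getDatabase("admin").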

Comment by John Moser [ 15/Feb/22 ]

>> This would have also helped us to better understand why you need to create, list, and destroy (as well as how frequent) tens of thousands of databases.

Again, it's about creating databases - you can run the above snippet in a for loop with count=100'000. It's reproducible any time: the creation of the first collection of a newly created database takes more and more time. The reason is that listDatabases takes more and more time to execute (it seems dependent on the number of collections). The log file I've uploaded shows that listDatabases is called during the creation of the first collection.

 

=> my request: is it possible to remove that call? Or has listDatabases been optimized, so its execution time is not dependent on the total number of collections?

Comment by John Moser [ 14/Feb/22 ]

@dmitry.agranat@mongodb.com I think you misunderstand the issue here - the code snippet contains no call to "listDatabases", and yet it shows up in the logs I've uploaded.

So is there an implied "listDatabases" when creating a database (I am not talking about the mongodb cluster)? And if yes, can this be avoided in the mongodb code?

As for the question of why so many databases are created - if you are familiar with multitenancy, then it makes total sense to do it like this.

Comment by Dmitry Agranat [ 14/Feb/22 ]

jamoser42@gmail.com, thanks for providing the code snippet but what we wanted to see is the actual (and historical) impact on your system, under your workload via the diagnostic.data. This would have also helped us to better understand why you need to create, list, and destroy (as well as how frequently) tens of thousands of databases.

Comment by John Moser [ 08/Feb/22 ]

Sorry - I am not able to copy diagnostic.data/* to local disk.

As a side note: The replicaset runs on a 3x8 core (N1)

With 20k -> listDatabases takes approx 2.5s
With 100k -> listDatabases takes > 20s

Here is the code snippet that generated the log:

import com.mongodb.BasicDBObject;
import com.mongodb.client.MongoDatabase;
import java.util.Collections;

// Create a per-tenant user that owns the tenant's database.
MongoDatabase db = mongoClient.getDatabase(tenantId);
BasicDBObject command = new BasicDBObject("createUser", tenantId)
        .append("pwd", myPwd)
        .append("roles", Collections.singletonList(
                new BasicDBObject("role", "dbOwner").append("db", tenantId)));
db.runCommand(command);

// Insert one document into each tenant collection; the first insert
// implicitly creates the database and the collection.
db.getCollection("profile").insertOne(aProfile);
db.getCollection("document").insertOne(aDocument);
db.getCollection("folders").insertOne(aFolder);

 

Comment by John Moser [ 08/Feb/22 ]

@dmitry.agranat@mongodb.com pls hold on, I have to prepare the test setup.

Log file uploaded

Comment by Dmitry Agranat [ 08/Feb/22 ]

jamoser42@gmail.com, do you have an update for my latest request?

Comment by John Moser [ 01/Feb/22 ]

Pls hold on.

Comment by Dmitry Agranat [ 26/Jan/22 ]

jamoser42@gmail.com, In order to better understand the issue and the observed behavior, we'd like to review the logs and diagnostics data.

I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time.

For each node in the replica set spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link:

  • the mongod logs
  • the $dbpath/diagnostic.data directory (the contents are described here)
Comment by John Moser [ 25/Jan/22 ]

It looks like listDatabases() is called when a collection is added to a newly created database. So it's not me calling the method but the MongoDB code.

But anyway, since there must be calls to listDatabases() everywhere in the MongoDB code, it would be great if this fix could also be included in the MongoDB 4.4 version.

We have a multi-tenancy setup, where each tenant has its own db. We are approaching 40k dbs and creation of a new tenant (= new db) now takes forever, sometimes even timing out. Everything else works pretty fast.

Comment by Dmitry Agranat [ 25/Jan/22 ]

jamoser42@gmail.com, currently, the listDatabases command takes Database or Collection level MODE_IS locks; the more databases/collections you have, the more time you'll need to wait for locks. We have fixed this in SERVER-57357.

How often do you need to run this command and what output are you looking for?

Comment by John Moser [ 24/Jan/22 ]

As a side note - when creating the first collection of a newly created database, it looks like listDatabases is also called. This makes creation of the (first) collection quite a heavy operation (it uses a lot of CPU and takes very long) even if the effective creation of the collection takes a couple of xx milliseconds.

Generated at Thu Feb 08 05:56:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.