[SERVER-5714] Nicer behavior of dbstats call on database with large nssize Created: 26/Apr/12  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Performance, Storage
Affects Version/s: 2.0.4, 2.1.0
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Jason R. Coombs Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related:
related to SERVER-5180 Option to dbstats and collstats to sk... Closed
Assigned Teams:
Storage Execution

 Description   

If one has a database with thousands of collections (each with a few indexes), it has a very large nssize. The docs indicate that this might be a problem when they say:

Command takes some time to run, typically a few seconds unless the .ns file is very large (via use of --nssize). While running, other operations may be blocked.
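For context, the command is issued from a driver like so (a minimal pymongo sketch; the URI is a placeholder, and the nsSizeMB field is an MMAPv1-specific detail, not something from this report):

import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017')  # placeholder URI
db = client['gryphon']

# On MMAPv1, dbstats holds the database read lock while it walks the
# namespace (.ns) file, so a large nssize means a long lock hold.
stats = db.command('dbstats')
print(stats.get('nsSizeMB'))  # reported by MMAPv1 only; absent on other engines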

It's not obvious from that warning, but if you run dbstats on a database with a very large nssize, it can literally take your database offline for minutes or hours, as it did in our environment:

{
"opid" : 1711160082,
"active" : true,
"lockType" : "read",
"waitingForLock" : false,
"secs_running" : 882,
"op" : "query",
"ns" : "gryphon",
"query" :
Unknown macro:

Unknown macro: { "dbstats" }

,
"client" : "10.1.45.2:54395",
"desc" : "conn",
"threadId" : "0x7df7b4a04710",
"connectionId" : 3450907,
"numYields" : 0
},

One of our devs added this query to our database browser, not realizing the impact it would have on this particular database. When he browsed to our production database, it took services down for 15 minutes (882 seconds at the time we read the log entry above).
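As a stopgap, a client-side watchdog can spot and kill such an op. A hedged sketch (not from the original report), assuming a server new enough (3.2+) to expose currentOp and killOp as admin commands; the 60-second threshold is arbitrary:

import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017')  # placeholder URI

# currentOp as an admin command (MongoDB 3.2+); older servers used
# db.currentOp() in the shell instead.
for op in client.admin.command('currentOp')['inprog']:
    query = op.get('command') or op.get('query') or {}
    if 'dbstats' in query and op.get('secs_running', 0) > 60:
        # killOp as an admin command (also 3.2+).
        client.admin.command('killOp', op=op['opid'])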

We ran into this problem when we launched MMS against our servers. We've been banned from using MMS as a result.

There should be a way to avoid these situations at the database level (rather than by patching MMS, patching our client apps, and patching each and every developer to remember not to invoke this operation).

A few options:

  • Add a configuration parameter or flag to disable dbstats on a particular database or an entire MongoDB instance.
  • Rewrite dbstats to not hold the lock for so long (and to fail if it takes more than a configurable amount of time).
  • Rewrite dbstats to require an explicit opt-in parameter (something like dangerous=True) before it will run on a database whose nssize exceeds a certain threshold, or before it is allowed to keep running past a time limit. (A client-side approximation of the last two options is sketched below.)
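A rough client-side approximation of the last two options (nothing like this exists in the server; the threshold, the dangerous flag, and the timeout are made up for illustration):

from pymongo.errors import ExecutionTimeout

MAX_NAMESPACES = 5000  # arbitrary illustrative threshold


def guarded_dbstats(db, dangerous=False, max_time_ms=5000):
    # Option 3: require explicit opt-in when the namespace count is large;
    # the collection count is a cheap proxy for nssize here.
    if not dangerous and len(db.collection_names()) > MAX_NAMESPACES:
        raise RuntimeError('refusing dbstats; pass dangerous=True to override')
    # Option 2: bound the runtime server-side via maxTimeMS (MongoDB 2.6+).
    try:
        return db.command('dbstats', maxTimeMS=max_time_ms)
    except ExecutionTimeout:
        raise RuntimeError('dbstats exceeded %d ms' % max_time_ms)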

If the database itself could be protected from this, we could prevent this situation, and other inadvertent or intentional DoS, in the future, regardless of where the request comes from.



 Comments   
Comment by Jason R. Coombs [ 23/Oct/17 ]

I expect this issue only applies to MMAPv1, as WiredTiger has more granular locking.

I created this test:

"""
First, install Python 3.6 and rwt.
Next, set a high ulimit (i.e. ulimit -n 24000).
Then, run thus::
 
    rwt -- -m pytest test-SERVER-5714.py
"""
 
__requires__ = ['jaraco.mongodb', 'pytest', 'tempora']
 
 
import datetime
 
from tempora.timing import Stopwatch
 
 
def test_dbstats_on_ten_thousand_collections(request, mongodb_instance):
	conn = mongodb_instance.get_connection()
	db = conn[request.node.name]
	for n in range(10000):
		coll_name = f'coll_{n}'
		db[coll_name].insert({})
	with Stopwatch() as watch:
		db.command('dbstats')
	assert watch.elapsed < datetime.timedelta(seconds=1)

This test passes on MongoDB 3.4.9 using default parameters (WiredTiger for storage), meaning dbstats completes in under 1 second with 10,000 collections present.

Comment by Eric Milkie [ 23/Oct/17 ]

This ticket only involves MMAPv1, as it talks about mmap-specific structures and behaviors. There could be a similar issue for WiredTiger regarding the "dbStats" command and databases with large numbers of collections, but I don't believe a JIRA ticket exists.

Comment by Jeff Widman [ 23/Oct/17 ]

Does this affect WiredTiger or only MMAPv1?
