Type: New Feature
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.0.4, 2.1.0
Component/s: MMAPv1, Performance, Storage
Labels: None
Assigned Teams: Storage Execution
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

A database with thousands of collections (each with a few indexes) needs a very large nssize. The docs hint that this can be a problem for dbstats when they say:

Command takes some time to run, typically a few seconds unless the .ns file is very large (via use of --nssize). While running other operations may be blocked.

It's not obvious from this statement, but if you run dbstats on a database with a very large nssize, it can literally take your database offline for minutes or hours, as it did in our environment:

{
    "opid" : 1711160082,
    "active" : true,
    "lockType" : "read",
    "waitingForLock" : false,
    "secs_running" : 882,
    "op" : "query",
    "ns" : "gryphon",
    "query" : { "dbstats" : ... },
    "client" : "10.1.45.2:54395",
    "desc" : "conn",
    "threadId" : "0x7df7b4a04710",
    "connectionId" : 3450907,
    "numYields" : 0
},
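
A check along these lines could flag such operations from the outside. It is a minimal sketch in Python, assuming a list of entries shaped like the one above and assuming the command name appears as a key of the "query" document; the function name, the threshold, and the value stored under "dbstats" are hypothetical.

```python
def find_long_running_dbstats(inprog, max_secs=60):
    """Return the opids of active dbstats commands running longer than max_secs.

    `inprog` is a list of dicts shaped like the currentOp entry above.
    Assumption: the command name shows up as a key of the "query" document.
    """
    slow = []
    for op in inprog:
        query = op.get("query")
        if not isinstance(query, dict):
            continue
        is_dbstats = any(key.lower() == "dbstats" for key in query)
        if op.get("active") and is_dbstats and op.get("secs_running", 0) > max_secs:
            slow.append(op["opid"])
    return slow


# Entries modeled on the one above; the value under "dbstats" is an assumption.
sample = [
    {"opid": 1711160082, "active": True, "secs_running": 882,
     "op": "query", "ns": "gryphon", "query": {"dbstats": 1}},
    {"opid": 42, "active": True, "secs_running": 3,
     "op": "query", "ns": "gryphon", "query": {"dbstats": 1}},
]
print(find_long_running_dbstats(sample))  # [1711160082]
```

This only detects the problem after the fact; it does not shorten the time the lock is held.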

One of our devs added this query to part of our database browser, not realizing the impact it would have on this particular database. When he browsed to our production database, it took services down for 15 minutes (882 seconds at the time we read that log entry).

We ran into this problem when we launched MMS against our servers. We've been banned from using MMS as a result.

There should be a way to avoid these situations at the database level (rather than by patching MMS, patching our client apps, and patching each and every developer to remember not to invoke this operation).

A few options:

- Add a configuration parameter or flag to disable dbstats on a particular database or an entire MongoDB instance.
- Rewrite dbstats to not hold the lock for so long (and to fail if it takes more than a configurable amount of time).
- Rewrite dbstats to require an explicit parameter (something like dangerous=True) before it will run on a database whose nssize exceeds a certain size, or before it is allowed to run for too long.
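
To make the last option concrete, here is a rough sketch of the opt-in semantics, written in Python on the caller's side rather than in the server. Everything in it is hypothetical: `guarded_dbstats`, `run_command`, `ns_file_bytes`, and the threshold are illustrative names and values, not existing MongoDB or driver APIs.

```python
class DangerousOperation(Exception):
    """Raised when dbstats is refused because the namespace file is large."""


def guarded_dbstats(run_command, ns_file_bytes,
                    max_ns_file_bytes=16 * 1024 * 1024, dangerous=False):
    """Run dbstats only if the .ns file is small or the caller opts in.

    run_command:   hypothetical callable that sends a command document to
                   the database and returns the reply.
    ns_file_bytes: size of the database's .ns file, supplied by the caller.
    """
    if ns_file_bytes > max_ns_file_bytes and not dangerous:
        raise DangerousOperation(
            "dbstats refused: .ns file is %d bytes; "
            "pass dangerous=True to override" % ns_file_bytes)
    return run_command({"dbstats": 1})


# Stand-in for a real connection so the sketch runs on its own.
calls = []

def fake_run_command(cmd):
    calls.append(cmd)
    return {"ok": 1}


print(guarded_dbstats(fake_run_command, ns_file_bytes=1024))
try:
    guarded_dbstats(fake_run_command, ns_file_bytes=2 * 1024 ** 3)
except DangerousOperation as exc:
    print("refused:", exc)
```

The point of the request is that a check like this belongs in the server, so that it applies to every client regardless of where the request comes from.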

If we could protect our database from this at the server level, we could prevent other inadvertent or intentional DoS of this kind in the future, regardless of where the request comes from.

Related to: SERVER-5180 Option to dbstats and collstats to skip extent-level data (Closed)

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: Jason R. Coombs
Participants: [DO NOT USE] Backlog - Storage Execution Team, Eric Milkie, Jason R. Coombs, Jeff Widman
Votes: 3
Watchers: 9

Created: Apr 26 2012 05:32:31 PM UTC
Updated: Dec 06 2022 05:33:44 AM UTC
Resolved: Sep 14 2018 08:15:20 PM UTC
