[SERVER-58358] Datasize command does not validate the syntax and can cause major performance degradation Created: 07/Jul/21  Updated: 29/Oct/23  Resolved: 07/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.12
Fix Version/s: 5.3.0

Type: Bug Priority: Major - P3
Reporter: Ivan Grigolon Assignee: Davis Haupt (Inactive)
Resolution: Fixed Votes: 3
Labels: neweng, sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Related
is related to SERVER-58356 Cannot kill the dataSize operation Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Create some sample documents containing only _id

MongoDB Enterprise > use database
 
MongoDB Enterprise > db.collections2.insert({_id: NumberLong(1)})
MongoDB Enterprise > db.collections2.insert({_id: NumberLong(2)})
MongoDB Enterprise > db.collections2.insert({_id: NumberLong(3)})
MongoDB Enterprise > db.collections2.insert({_id: NumberLong(4)})
 
 
MongoDB Enterprise >  db.collections2.find()
{ "_id" : NumberLong(1) }
{ "_id" : NumberLong(2) }
{ "_id" : NumberLong(3) }
{ "_id" : NumberLong(4) }

This is the command as it was run

MongoDB Enterprise > db.runCommand({'dataSize': 'database.collections2', min: NumberLong("1"), max: NumberLong("2"), estimate: false})
{
	"estimate" : false,
	"size" : 72,
	"numObjects" : 4,       << Note that all 4 documents were analyzed, showing that min and max were not used
	"millis" : 0,
	"ok" : 1
}

This is the correct syntax to use instead

MongoDB Enterprise > db.runCommand({'dataSize': 'database.collections2', keyPattern: { _id: 1 }, min: {_id: NumberLong("1")}, max: {_id: NumberLong("2")}, estimate: false})
{
	"estimate" : false,
	"size" : 18,
	"numObjects" : 1,      << Here only 1 document was analyzed, which is the expected result
	"millis" : 0,
	"ok" : 1
}

I believe, however, that the command should have returned an error reporting the malformed syntax.
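
To make the difference between the two invocations concrete, here is a minimal sketch of a helper that always wraps the bounds in key documents. The name buildDataSizeCommand and its parameters are hypothetical and not part of MongoDB; plain numbers stand in for NumberLong so the snippet also runs outside the shell.

```javascript
// Hypothetical helper (not a MongoDB API): builds a dataSize command
// document whose min/max are key documents rather than bare values.
function buildDataSizeCommand(namespace, field, lower, upper) {
  return {
    dataSize: namespace,
    keyPattern: { [field]: 1 }, // ascending key pattern on the bounded field
    min: { [field]: lower },    // lower bound expressed as a key document
    max: { [field]: upper },    // upper bound expressed as a key document
    estimate: false,
  };
}
```

In the shell, the result could be passed straight to db.runCommand, for example db.runCommand(buildDataSizeCommand("database.collections2", "_id", NumberLong(1), NumberLong(2))).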

Participants:
Case:
Linked BF Score: 177
Story Points: 2

 Description   

If the command does not specify any of the options below:

   keyPattern: <document>,
   min: <document>,
   max: <document>

It is assumed that the user is aware that the command is going to scan the entire collection, which is ok.

If the command is executed incorrectly without the "keyPattern" field but just using min/max:

   min: <document>,
   max: <document>,

the command still runs against the entire collection. Nothing validates what the user actually intended, which was to run dataSize over a range of a particular indexed field.

This can cause outages in large deployments, especially because the dataSize command cannot be killed (SERVER-58356).

I believe the command should:
1. Validate the options used and fail if they are provided incorrectly.
2. Reconsider whether estimate: false is a safe default.
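
As an illustration of the kind of validation requested in item 1, the following is a rough client-side guard. It is not the server-side fix. The function names are hypothetical, and the rule that min and max must be supplied together is an assumption about the intended behavior.

```javascript
// Hypothetical client-side guard; this is not the server-side fix itself.
// A "document" here means a plain object literal such as { _id: 1 }.
function isPlainDocument(value) {
  return value !== null &&
    typeof value === "object" &&
    !Array.isArray(value) &&
    Object.getPrototypeOf(value) === Object.prototype;
}

// Returns a list of problems found in a dataSize command document.
// An empty list means the options look well formed.
function validateDataSizeOptions(cmd) {
  const errors = [];
  for (const name of ["keyPattern", "min", "max"]) {
    if (name in cmd && !isPlainDocument(cmd[name])) {
      errors.push(`'${name}' must be a document`);
    }
  }
  // Assumption: a range needs both bounds, so one without the other is an error.
  if (("min" in cmd) !== ("max" in cmd)) {
    errors.push("'min' and 'max' must be given together");
  }
  return errors;
}
```

With this guard, the malformed command from the reproduction steps (bare values for min and max) yields two errors, while a command that omits all three options yields none and is treated as an intentional full scan.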

Steps to reproduce below:



 Comments   
Comment by Githook User [ 06/Jan/22 ]

Author:

{'name': 'Davis Haupt', 'email': 'davis.haupt@mongodb.com', 'username': 'davish'}

Message: SERVER-58358 validate arguments passed to dataSize command
Branch: master
https://github.com/mongodb/mongo/commit/0a1a88755daa504c67e13dd49667a06c3c606ee8

Comment by Davis Haupt (Inactive) [ 21/Dec/21 ]

After some investigation, it seems that the command in the ticket is malformed not because it is missing a keyPattern, but because the min and max keys are malformed. If the keyPattern is empty, it is inferred from the min key bound (https://github.com/10gen/mongo/blob/master/src/mongo/db/commands/dbcommands.cpp#L370). In the example, the min and max were bare NumberLong values. However, dataSize expects an object that represents the lower and upper bounds of the key range based on some index, rather than a min and max value for some field.
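
The inference described in the comment above can be pictured with a short sketch. It assumes that the key pattern is derived by taking each field name in the min bound and treating it as ascending; the function name is hypothetical and the server's actual logic may differ.

```javascript
// Hypothetical sketch of inferring a key pattern from the min bound:
// every field named in the bound is assumed to be indexed ascending.
function inferKeyPattern(minBound) {
  const pattern = {};
  for (const field of Object.keys(minBound)) {
    pattern[field] = 1;
  }
  return pattern;
}
```

Under this assumption, a bound of { _id: NumberLong(1) } implies the key pattern { _id: 1 }, whereas a bare NumberLong(1) names no field, so there is nothing to infer a pattern from.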

Comment by Connie Chen [ 12/Nov/21 ]

Passing this to Sharding NYC to review as this is not catalog-related. 

Generated at Thu Feb 08 05:44:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.