Core Server / SERVER-58358

Datasize command does not validate the syntax and can cause major performance degradation

    • Fully Compatible
    • ALL

      Create some sample documents containing only _id:

      MongoDB Enterprise > use database
      
      MongoDB Enterprise > db.collections2.insert({_id: NumberLong(1)})
      MongoDB Enterprise > db.collections2.insert({_id: NumberLong(2)})
      MongoDB Enterprise > db.collections2.insert({_id: NumberLong(3)})
      MongoDB Enterprise > db.collections2.insert({_id: NumberLong(4)})
      
      
      MongoDB Enterprise >  db.collections2.find()
      { "_id" : NumberLong(1) }
      { "_id" : NumberLong(2) }
      { "_id" : NumberLong(3) }
      { "_id" : NumberLong(4) }
      

      This is the command as it was run:

      MongoDB Enterprise > db.runCommand({'dataSize': 'database.collections2', min: NumberLong("1"), max: NumberLong("2"), estimate: false})
      {
      	"estimate" : false,
      	"size" : 72,
      	"numObjects" : 4,       << Note that all 4 documents were analyzed, which shows that min and max were not used
      	"millis" : 0,
      	"ok" : 1
      }
      

      This is the correct syntax to use instead:

      MongoDB Enterprise > db.runCommand({'dataSize': 'database.collections2', keyPattern: { _id: 1 }, min: {_id: NumberLong("1")}, max: {_id: NumberLong("2")}, estimate: false})
      {
      	"estimate" : false,
      	"size" : 18,
      	"numObjects" : 1,      << Here I can see that only 1 document was analyzed, which is the expected result
      	"millis" : 0,
      	"ok" : 1
      }
      
      

      I believe, however, that the command should have returned an error reporting the malformed syntax.
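
      Until the server rejects such input, one way to avoid the malformed call on the client side is to build the command document through a small helper that always wraps the bounds in documents. The following is only a sketch: buildDataSizeCommand is a hypothetical helper, not part of the shell, and it assumes a key pattern on a single ascending field.

```javascript
// Hypothetical helper (not part of the MongoDB shell): builds a dataSize
// command document for a range on a single indexed field, so that
// keyPattern, min and max are always passed as documents.
function buildDataSizeCommand(namespace, field, minValue, maxValue, estimate) {
  if (typeof namespace !== "string" || namespace.indexOf(".") < 1) {
    throw new Error("namespace must look like '<database>.<collection>'");
  }
  if (typeof field !== "string" || field.length === 0) {
    throw new Error("field must be a non-empty string");
  }
  return {
    dataSize: namespace,            // command name must be the first key
    keyPattern: { [field]: 1 },     // assumption: single ascending field
    min: { [field]: minValue },     // bounds wrapped in documents
    max: { [field]: maxValue },
    estimate: estimate === true,    // anything other than true means false
  };
}
```

      In the shell, the result could then be passed straight to db.runCommand, for example db.runCommand(buildDataSizeCommand('database.collections2', '_id', NumberLong("1"), NumberLong("2"), false)).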


      If the command does not specify any of the options below:

         keyPattern: <document>,
         min: <document>,
         max: <document>
      

      then it is assumed that the user knows the command will scan the entire collection, which is fine.

      If the command is run incorrectly, with min/max but without the keyPattern field:

         min: <document>,
         max: <document>,
      

      the command still runs against the entire collection. Nothing validates what the user actually intended, which was to run dataSize on a range of a particular indexed field.

      This can cause outages in large deployments, especially because the dataSize command cannot be killed (SERVER-58356).

      I believe the command should validate the options used and fail if they are provided incorrectly.
      I am also questioning whether it is safe to use estimate: false by default.
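
      The validation proposed above could look roughly like the sketch below. It is written as a client-side check in JavaScript purely for illustration. validateDataSizeOptions and isDocument are hypothetical names, and the rule that min/max require keyPattern follows the description in this ticket rather than any documented server behavior.

```javascript
// Returns true only for plain objects such as { _id: 1 }. Wrapper values
// like NumberLong(1) are objects with a different prototype, so they are
// not treated as documents. Objects without a prototype are rejected too.
function isDocument(value) {
  return value !== null &&
    typeof value === "object" &&
    !Array.isArray(value) &&
    Object.getPrototypeOf(value) === Object.prototype;
}

// Hypothetical check: returns a list of problems found in a dataSize
// command document; an empty list means the options look well formed.
function validateDataSizeOptions(cmd) {
  const problems = [];
  for (const name of ["keyPattern", "min", "max"]) {
    if (name in cmd && !isDocument(cmd[name])) {
      problems.push(name + " must be a document");
    }
  }
  // Assumption taken from this ticket: a range without keyPattern is
  // treated as a mistake instead of silently scanning everything.
  if (("min" in cmd || "max" in cmd) && !("keyPattern" in cmd)) {
    problems.push("min/max given without keyPattern");
  }
  return problems;
}
```

      With this check, the command as originally run (scalar min/max and no keyPattern) would be reported as malformed, while a command with none of the three options would still pass and scan the whole collection as before.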

      The steps to reproduce are listed above.

            Assignee:
            davis.haupt@mongodb.com Davis Haupt (Inactive)
            Reporter:
            ivan.grigolon@mongodb.com Ivan Grigolon
            Votes:
            3
            Watchers:
            9
