[SERVER-58508] The "dataSize" command does not calculate/include the records matching specified bounds Created: 14/Jul/21  Updated: 21/Jul/21  Resolved: 21/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Petr Novak Assignee: Edwin Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I tried to use the https://docs.mongodb.com/manual/reference/command/dataSize/#mongodb-dbcommand-dbcmd.dataSize command to obtain the exact size of the database records matching a specified key range, for example:

 

db.runCommand({ 
   dataSize: "mydb.mycollection", 
   keyPattern: { "_id": 1 }, 
   min: { "_id": new ObjectId("606ded0a249ef5c340ea248e") }, 
   max: { "_id": new ObjectId("606ded0a249ef5c340ea248e") }
})



 Comments   
Comment by Edwin Zhou [ 21/Jul/21 ]

Hi kek.forums@gmail.com,

Thanks for your report. Please note that the SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the MongoDB Developer Community Forums. If your investigation with our Community Forums leads you to believe you've hit a bug, let us know and we will be happy to reopen this ticket and further investigate!

Best,
Edwin

Comment by Petr Novak [ 14/Jul/21 ]

I apologize: I mistakenly submitted an unfinished description of the issue, and since the description cannot be edited, I am adding the rest in this comment.

 

The MongoDB build info:

{
    "version" : "4.4.1",
    "gitVersion" : "ad91a93a5a31e175f5cbf8c69561e788bbc55ce1",
    "targetMinOS" : "Windows 7/Windows Server 2008 R2",
    "modules" : [],
    "allocator" : "tcmalloc",
    "javascriptEngine" : "mozjs",
    "sysInfo" : "deprecated",
    "versionArray" : [ 
        4, 
        4, 
        1, 
        0
    ],
    "openssl" : {
        "running" : "Windows SChannel"
    },
    "buildEnvironment" : {
        "distmod" : "windows",
        "distarch" : "x86_64",
        "cc" : "cl: Microsoft (R) C/C++ Optimizing Compiler Version 19.26.28806 for x64",
        "ccflags" : "/nologo /EHsc /W3 /wd4068 /wd4244 /wd4267 /wd4290 /wd4351 /wd4355 /wd4373 /wd4800 /wd5041 /wd4291 /we4013 /we4099 /we4930 /WX /errorReport:none /MD /O2 /Oy- /bigobj /utf-8 /permissive- /Zc:__cplusplus /Zc:sizedDealloc /volatile:iso /diagnostics:caret /std:c++17 /Gw /Gy /Zc:inline",
        "cxx" : "cl: Microsoft (R) C/C++ Optimizing Compiler Version 19.26.28806 for x64",
        "cxxflags" : "/TP",
        "linkflags" : "/nologo /DEBUG /INCREMENTAL:NO /LARGEADDRESSAWARE /OPT:REF",
        "target_arch" : "x86_64",
        "target_os" : "windows",
        "cppdefines" : "SAFEINT_USE_INTRINSICS 0 PCRE_STATIC NDEBUG BOOST_ALL_NO_LIB _UNICODE UNICODE _SILENCE_CXX17_ALLOCATOR_VOID_DEPRECATION_WARNING _SILENCE_CXX17_OLD_ALLOCATOR_MEMBERS_DEPRECATION_WARNING _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING _CONSOLE _CRT_SECURE_NO_WARNINGS _SCL_SECURE_NO_WARNINGS _WIN32_WINNT 0x0A00 BOOST_USE_WINAPI_VERSION 0x0A00 NTDDI_VERSION 0x0A000000 BOOST_THREAD_VERSION 5 BOOST_THREAD_USES_DATETIME BOOST_SYSTEM_NO_DEPRECATED BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS BOOST_ENABLE_ASSERT_DEBUG_HANDLER BOOST_LOG_NO_SHORTHAND_NAMES BOOST_LOG_USE_NATIVE_SYSLOG BOOST_LOG_WITHOUT_THREAD_ATTR ABSL_FORCE_ALIGNED_ACCESS"
    },
    "bits" : 64,
    "debug" : false,
    "maxBsonObjectSize" : 16777216,
    "storageEngines" : [ 
        "biggie", 
        "devnull", 
        "ephemeralForTest", 
        "wiredTiger"
    ],
    "ok" : 1.0
}

 

So, from my point of view, the command example above should return the size of the one record matching the _id, but the result is:

{
    "estimate" : false,
    "size" : 0,
    "numObjects" : 0,
    "millis" : 0,
    "ok" : 1.0
}

I'm pretty sure that the record exists:

db.getCollection('mycollection').count({"_id": new ObjectId("606ded0a249ef5c340ea248e")})
 
Result:  1

So it seems that the range boundary values are excluded from the selection.

Yes, this example is pretty useless, but it serves as a simple demonstration of the bug.
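
The empty result for identical bounds would be consistent with the range being treated as half-open, that is, min inclusive and max exclusive. This is an assumption and is not confirmed anywhere in this ticket. A minimal plain-JavaScript sketch of that reading follows; the helper inHalfOpenRange is hypothetical and does not talk to the server:

```javascript
// Hypothetical helper: models a half-open key range [min, max).
// Assumption (not confirmed in this ticket): dataSize treats min as
// inclusive and max as exclusive.
function inHalfOpenRange(key, min, max) {
  return key >= min && key < max;
}

// Sample key values standing in for an indexed integer field.
const keys = [12344, 12345, 12345, 12346];

// Identical bounds describe an empty range, so nothing matches.
const sameBounds = keys.filter((k) => inHalfOpenRange(k, 12345, 12345));

// Moving max to the next key value selects every key equal to min.
const widened = keys.filter((k) => inHalfOpenRange(k, 12345, 12346));

console.log(sameBounds.length, widened.length);
```

Under this model, min == max can never match a record, which is the behaviour observed above.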

My real use case for the command is the following:

  • we use MongoDB as a multitenant data store, where the schema looks like

    {
      _id:ObjectId("606ded0a249ef5c340ea248e"),
      tenantId: 12345,
      data: ...
    }

  • we have an index on the "tenantId" attribute
  • we need to calculate statistics about the total record size per tenantId, so I tried to execute a command like

    db.runCommand({ 
      dataSize: "myDb.myCollection", 
      keyPattern: { "tenantId": 1 }, 
      min: { "tenantId": 12345 }, 
      max: { "tenantId": 12345 }
     })
    

  • the expected result should be like

    { 
    "estimate" : false, 
    "size" : 135678901, 
    "numObjects" : 1231, 
    "millis" : 24567, 
    "ok" : 1.0 
    }

  • but the result was 

    { 
    "estimate" : false, 
    "size" : 0, 
    "numObjects" : 0, 
    "millis" : 0, 
    "ok" : 1.0 
    }

 

So there is no way to calculate the size of the records matching exactly one specific value of the specified key field.
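
Two possible workarounds are sketched below, both under stated assumptions and neither verified against this deployment. The first assumes that the max bound is exclusive and that tenantId holds only integers, so that value + 1 closes the range right after the wanted value. The second uses the $bsonSize aggregation operator (available from MongoDB 4.4) to sum the sizes of the matching documents; its total may not be identical to what dataSize reports. The function names buildDataSizeCommand and buildSizePipeline are hypothetical; they only build the documents that would be passed to db.runCommand and aggregate in the shell.

```javascript
// Hypothetical builder: dataSize command covering one integer key value.
// Assumption: max is exclusive, so value + 1 ends the range after `value`.
function buildDataSizeCommand(ns, field, value) {
  return {
    dataSize: ns,
    keyPattern: { [field]: 1 },
    min: { [field]: value },
    max: { [field]: value + 1 },
  };
}

// Hypothetical builder: pipeline that sums the BSON size of every
// document whose `field` equals `value`, and counts those documents.
function buildSizePipeline(field, value) {
  return [
    { $match: { [field]: value } },
    {
      $group: {
        _id: null,
        size: { $sum: { $bsonSize: "$$ROOT" } },
        numObjects: { $sum: 1 },
      },
    },
  ];
}

const cmd = buildDataSizeCommand("myDb.myCollection", "tenantId", 12345);
const pipeline = buildSizePipeline("tenantId", 12345);

console.log(JSON.stringify(cmd));
console.log(JSON.stringify(pipeline));

// In the mongo shell (not executed here):
//   db.runCommand(cmd)
//   db.myCollection.aggregate(pipeline)
```

The widened-bound form does not carry over directly to ObjectId keys such as _id, since there is no simple "next value"; the aggregation form does not depend on the bound semantics at all.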

 

 
