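We use the WT stats to calculate storageSize, but the format of the "file size in bytes" value changes once the file passes a size threshold: it goes from a plain number ("4096") to a human-readable string with the raw byte count in parentheses ("2M (2330624)"). We need to either handle the format change in our calculation or change the format upstream in WT.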
Calculation is made here:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp#L226
The block manager stats change from this:
> db.f.stats().wiredtiger["block manager"]
{
"file allocation unit size" : "4096",
"blocks allocated" : "0",
"checkpoint size" : "0",
"allocations requiring file extension" : "0",
"blocks freed" : "0",
"file magic number" : "120897",
"file major version number" : "1",
"minor version number" : "0",
"file bytes available for reuse" : "0",
==> "file size in bytes" : "4096"
}
To this:
> db.f.stats().wiredtiger["block manager"]
{
"file allocation unit size" : "4096",
"blocks allocated" : "97",
"checkpoint size" : "2M (2322432)",
"allocations requiring file extension" : "97",
"blocks freed" : "0",
"file magic number" : "120897",
"file major version number" : "1",
"minor version number" : "0",
"file bytes available for reuse" : "0",
==> "file size in bytes" : "2M (2330624)"
}
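
One way to handle both formats on the reading side is to take the parenthesized byte count when it is present and fall back to the plain number otherwise. The following is a minimal Python sketch of that idea, not the server's actual C++ code. The function name parse_wt_size is hypothetical, and the sketch assumes the two formats shown above are the only ones WT emits.

```python
import re

# Plain form, e.g. "4096".
_PLAIN = re.compile(r"^\s*(\d+)\s*$")
# Human-readable form, e.g. "2M (2330624)"; the suffix letter is ignored
# and only the raw count inside the parentheses is captured.
_HUMAN = re.compile(r"^\s*\d+[A-Za-z]+\s*\((\d+)\)\s*$")


def parse_wt_size(value: str) -> int:
    """Return the byte count from a WT block manager statistic string.

    Hypothetical helper: assumes the value is either a bare integer or
    "<digits><suffix> (<raw bytes>)" as in the output shown above.
    """
    for pattern in (_PLAIN, _HUMAN):
        match = pattern.match(value)
        if match:
            return int(match.group(1))
    raise ValueError(f"unrecognized size format: {value!r}")
```

Raising on an unrecognized string, rather than returning 0, makes a further upstream format change visible instead of silently reporting a wrong storageSize.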
As a side note, the human-readable stats also look wrong in WT once the file size reaches the GB range: the value is printed with a "B" suffix instead of a gigabyte suffix. For example:
> db.bulk3.stats().wiredtiger["block manager"]
{
"file allocation unit size" : "4096",
"blocks allocated" : "0",
"checkpoint size" : "4B (4217180160)",
"allocations requiring file extension" : "0",
"blocks freed" : "0",
"file magic number" : "120897",
"file major version number" : "1",
"minor version number" : "0",
"file bytes available for reuse" : "20480",
==> "file size in bytes" : "4B (4217196544)"
}
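
To check the suffix against the raw numbers, the byte counts above can be truncated to a few candidate units. This is only a sketch for comparison; the helper name leading_units is hypothetical. With the values in this report, the leading digits ("2" and "4") match the counts of 10**6 and 10**9, so "M" and "B" may stand for million and billion rather than binary units. That is an inference from these numbers, not something confirmed from the WT source.

```python
def leading_units(num_bytes: int) -> dict:
    """Express a byte count in candidate units, truncated to integers.

    Hypothetical helper used only to compare against the prefix that
    appears in the human-readable stat strings.
    """
    return {
        "millions (10**6)": num_bytes // 10**6,
        "billions (10**9)": num_bytes // 10**9,
        "MiB (2**20)": num_bytes // 2**20,
        "GiB (2**30)": num_bytes // 2**30,
    }


if __name__ == "__main__":
    # Raw byte counts taken from the stats output in this report.
    for raw in (2330624, 4217196544):
        print(raw, leading_units(raw))
```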