[SERVER-27689] WiredTiger disk usage stats output does not seem correct Created: 14/Jan/17  Updated: 24/Jan/17  Resolved: 24/Jan/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: David Hows
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File stats.txt    
Issue Links:
Related
is related to DOCS-9814 WiredTiger does not compact or reclai... Closed
Operating System: ALL
Participants:

 Description   

I have a 3.2.3 server running WiredTiger. The stats output does not seem reasonable.

I would expect storageSize ~= dataSize + indexSize

But in this case it seems to be way off.

db.stats();

{ "db" : "xxx", "collections" : 9, "objects" : 202860, "avgObjSize" : 13789.263595583161, "dataSize" : 2797290013, "storageSize" : 10694090752, "numExtents" : 0, "indexes" : 17, "indexSize" : 18022400, "ok" : 1 }

 Comments   
Comment by David Hows [ 24/Jan/17 ]

Hi Dharshan,

Glad to hear this worked.

The repairDatabase command doesn't reclaim disk space with WiredTiger, but does with the MMAP storage engine.
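
For completeness, the way to return space to the operating system under WiredTiger is to run compact on each affected collection (a rough sketch; the collection names below are the placeholders from your stats, and keep in mind that compact blocks operations on the database while it runs):

db.runCommand({ compact: "xxx3" })
db.runCommand({ compact: "xxx8" })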

I'll be raising a documentation ticket to have this addressed.

Thanks,
David

Comment by Dharshan Rangegowda [ 23/Jan/17 ]

Hi David,

The compact command on the collection reclaimed the disk space. However, the --repair option on the whole instance did not reclaim the disk space. Is this a bug? It used to reclaim space in previous versions of mongo.

Comment by David Hows [ 23/Jan/17 ]

Hi Dharshan,

Do you have a follow up here?

Did the compact reclaim the disk space as expected?

Thanks,
David

Comment by Dharshan Rangegowda [ 18/Jan/17 ]

I ran an instance-wide mongod --repair ... and that doesn't seem to have done anything. Shouldn't that do a compact as well?

I will trigger the compact per collection and report results.

Comment by David Hows [ 17/Jan/17 ]

Hi Dharshan,

Looking through those stats, I can see two collections with a very high "file bytes available for reuse" value - both around 4GB.

This usually indicates either some form of fragmentation, or that the collection has recently shrunk considerably and the space has not yet been reclaimed. Here it is the db.xxx3 and db.xxx8 collections - has the amount of data in those collections shrunk significantly from its peak?

Can you arrange to have compact run on these collections and confirm whether this resolves the size issue?
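
If it helps, the value I'm referring to can be read straight out of the collection stats (a sketch; the collection name is a placeholder):

// Bytes in the collection's data file that WiredTiger can reuse but has not returned to the OS
db.xxx3.stats().wiredTiger["block-manager"]["file bytes available for reuse"]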

Comment by Dharshan Rangegowda [ 16/Jan/17 ]

Also note that snappy compression is enabled.

Comment by Dharshan Rangegowda [ 16/Jan/17 ]

Hi David,

The stats output for each of the 9 collections in the DB is attached to the ticket. The names have been xxx'ed out. Let me know if you need anything else from my end.

Comment by David Hows [ 16/Jan/17 ]

Hi Dharshan,

Your formula doesn't quite hold under WiredTiger: indexSize and storageSize represent the total bytes stored on disk, while dataSize represents the size of all the documents in the collection. Under normal circumstances we would expect storageSize to be within 0.5x-2x of dataSize, depending on compression ratios, checkpoint activity, etc.

So, with this in mind, the issue here is that the ~10GB of reported storageSize is roughly 4x the ~2.8GB of data stored.

It's possible this is a known bug, but without more data I cannot say for certain. The next piece of data I would be looking for is the db.collection.stats() output for each of the 9 collections in this database, to confirm whether the issue is isolated to one collection or affects many.

From there, unless this is a new bug, the likely remedial steps would be to upgrade to the latest 3.2 point release (3.2.11) and then run compact on the affected collections.
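
A quick way to gather those figures for every collection in the database (a rough sketch, to be run in the mongo shell against the affected database):

db.getCollectionNames().forEach(function(name) {
    var s = db.getCollection(name).stats();
    // collStats reports uncompressed data size as "size" and on-disk size as "storageSize"
    print(name + ": dataSize=" + s.size +
          " storageSize=" + s.storageSize +
          " reusable=" + s.wiredTiger["block-manager"]["file bytes available for reuse"]);
});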
