[SERVER-9303] Collection Size Growing Very Quickly Despite Document Removal
Created: 09/Apr/13  Updated: 11/Apr/13  Resolved: 11/Apr/13

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.2.3, 2.2.4
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Nimi Wariboko Jr.
Assignee: Unassigned
Resolution: Incomplete
Votes: 0
Labels: bson, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

5x Mongo shards, each with:

Ubuntu 10.04.4 LTS
64-bit
830 GB drives
Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
32 GB RAM


Participants:

 Description   

Steps to Reproduce Issue :

1.) Have a fairly large collection: ~10 million records, with an average document size of 1.4 KB.
2.) Frequently replace each document (all 10 million) with a remove() followed by an insert(). (This is faster than an update because the remove + insert pairs can be batched.)

After some time, according to stats(), the storage size becomes huge even though the total size stays relatively small. In my case the storage size is 17 times larger than the total size.

A single shard would have a total size of 5 GB but a storage size of 100 GB.
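The ratio described above can be checked per shard with a few lines of Python. This is only a sketch: it assumes the "total size" mentioned here corresponds to the `size` field of the collection stats output and the storage size to `storageSize`, and it uses the rounded per-shard figures quoted above rather than the real stats from the pastebin.

```python
def storage_overhead(stats):
    """Return storageSize / size for one collStats-style document."""
    size = stats["size"]
    if size == 0:
        raise ValueError("collection is empty; ratio is undefined")
    return stats["storageSize"] / size


GB = 1024 ** 3

# Rounded figures for a single shard, taken from the description above.
shard_stats = {"size": 5 * GB, "storageSize": 100 * GB}

print(storage_overhead(shard_stats))  # 20.0
```

A ratio this far above 1 suggests that most of the allocated space is held by deleted records that are not being reused.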

Here is the stats() output for the collection in question:

http://pastebin.com/qsbdJMQV

RS4 and RS5 are relatively empty because they are new shards.



 Comments   
Comment by Scott Hernandez (Inactive) [ 09/Apr/13 ]

Yes, dropping collections is a faster and more efficient way to reuse space (internally). Doing it at the db level is also fine and will additionally release space to the filesystem. Many people have a rotating system where they fill collections by time period (or some other criteria) and drop/create them periodically.
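As an illustration of the rotating scheme described above, here is a minimal Python sketch that picks a collection name per month and works out which older collections could be dropped. The `events` prefix, the monthly period, and both helper names are hypothetical and chosen only for the example; the actual drop would be issued separately against the database.

```python
from datetime import date


def collection_for(day, prefix="events"):
    """Name of the collection that holds documents for the month of `day`."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def collections_to_drop(existing, today, keep_months=3, prefix="events"):
    """Return prefixed names in `existing` outside the newest `keep_months` months."""
    keep = set()
    year, month = today.year, today.month
    # Walk back month by month, collecting the names that must be kept.
    for _ in range(keep_months):
        keep.add(collection_for(date(year, month, 1), prefix))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return sorted(
        name for name in existing
        if name.startswith(prefix + "_") and name not in keep
    )


existing = ["events_2013_01", "events_2013_02", "events_2013_03", "events_2013_04"]
print(collections_to_drop(existing, date(2013, 4, 9)))  # ['events_2013_01']
```

Writers would insert into `collection_for(today)`, so each period's data lives in its own collection and can be removed with a single drop instead of millions of deletes.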

Dropping a collection is much faster than deleting documents as well. With a sharded collection it is best not to reuse the collection names if you aren't on 2.4.x (mostly because of inefficiencies we fixed in newer versions).

Comment by Nimi Wariboko Jr. [ 09/Apr/13 ]

So is the issue that the new documents are unable to fit in the old space? I had assumed that since each document was a fixed size, fragmentation wouldn't be too much of an issue. Anyway, I have turned it on, and I'll monitor the size of the new shard.

This probably isn't the place for this, but could I simply drop the collection to reclaim the used space, or is a repair my only option?

Comment by Scott Hernandez (Inactive) [ 09/Apr/13 ]

You may be seeing poor reuse of space for the deleted documents.

Can you enable power of 2 allocation with collMod (http://docs.mongodb.org/manual/reference/command/collMod/)? It should make the space much more reusable after your deletes. If that works better, you will want to repair each shard to compact your database. I would suggest turning this on for your new shards now as a test.

The algorithm is also slightly better in 2.4.x, so upgrading and using that option may be best if that helps with your issue.
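To illustrate why power of 2 allocation tends to make freed space more reusable, here is a simplified Python model. It is not the server's actual allocator: the 32-byte minimum, the exact-fit behaviour used for comparison, and both function names are assumptions made for the sketch. The option itself is the `usePowerOf2Sizes` flag of the collMod command linked above.

```python
def power_of_2_size(nbytes, minimum=32):
    """Round a record size up to the next power of two (simplified model)."""
    size = minimum  # assumed smallest allocation, for illustration only
    while size < nbytes:
        size *= 2
    return size


def fits_freed_slot(old_doc, new_doc, power_of_2):
    """Can a record of new_doc bytes reuse the slot freed by old_doc bytes?"""
    if power_of_2:
        # Both records are rounded up, so similar sizes share a slot size.
        return power_of_2_size(new_doc) <= power_of_2_size(old_doc)
    # Exact-fit model: the slot is only as large as the deleted record.
    return new_doc <= old_doc


print(power_of_2_size(1400))               # 2048
print(fits_freed_slot(1400, 1500, True))   # True
print(fits_freed_slot(1400, 1500, False))  # False
```

In this model a 1400-byte record and a 1500-byte record both occupy a 2048-byte slot, so a replacement document that is slightly larger than the one it replaces can still land in the freed space instead of forcing new allocation.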
