The 'compact' command should be runnable at low priority on any system without degrading the performance of the existing workload.
Current 'compact' command restrictions:
- only runs on replica-set secondaries by default, unless forced;
- locks the entire collection, blocking all other activity until it finishes.
This request covers the simple case of routine maintenance: the compact command should be runnable at any time, or set up as a background process. It should scan through all chunks one at a time, starting with the least recently used chunk.
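A minimal sketch of the scan order, assuming each chunk carries a last-access timestamp. The `Chunk` record and its `last_access` field are hypothetical; in a real cluster this information would have to come from the storage engine or chunk metadata.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Chunk:
    # Hypothetical chunk record used only for illustration.
    chunk_id: str
    last_access: float  # seconds since epoch of the most recent read/write


def lru_scan_order(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield chunks one at a time, least recently used first."""
    # sorted() is stable, so chunks with equal timestamps keep their input order.
    for chunk in sorted(chunks, key=lambda c: c.last_access):
        yield chunk
```

For example, chunks last touched at t=300, 100 and 200 would be visited in the order 100, 200, 300. Yielding one chunk at a time lets the caller check throttling conditions between chunks.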
- The least recently used chunk should be mapped and paged into a special holding area, where the following actions are performed on it:
  - sort all documents in primary-key/shard-key order;
  - pack documents tightly against one another, or space them out, according to the padding factors given to the compact command;
  - OPTIONAL: as a further step, if any of a document's fields are indexed, verify the corresponding index entries and correct them if necessary.
- The identifiers of processed chunks should be stored so that the same chunk is not processed more than once in any n-hour period.
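The per-chunk steps above can be sketched as follows, under simplifying assumptions: a document is reduced to a key and a byte size, the padding factor is a multiplier on each document's size (1.0 meaning snug), and the n-hour window is tracked in memory. `Doc`, `lay_out_chunk` and `ProcessedLog` are hypothetical names, not part of MongoDB, and the optional index verification step is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    key: Tuple  # primary-key/shard-key value (hypothetical representation)
    size: int   # document size in bytes


def lay_out_chunk(docs: List[Doc], padding_factor: float = 1.0) -> List[Tuple[int, Doc]]:
    """Sort documents by key and assign contiguous byte offsets.

    padding_factor == 1.0 packs documents snugly; values > 1.0 reserve
    extra space after each document so it can grow in place.
    """
    if padding_factor < 1.0:
        raise ValueError("padding_factor must be >= 1.0")
    layout = []
    offset = 0
    for doc in sorted(docs, key=lambda d: d.key):
        layout.append((offset, doc))
        offset += math.ceil(doc.size * padding_factor)
    return layout


@dataclass
class ProcessedLog:
    """Remembers when each chunk was last compacted."""
    window_hours: float
    last_done: Dict[str, float] = field(default_factory=dict)

    def should_process(self, chunk_id: str, now: float) -> bool:
        # Skip chunks already compacted within the last window_hours.
        last = self.last_done.get(chunk_id)
        return last is None or now - last >= self.window_hours * 3600

    def mark_done(self, chunk_id: str, now: float) -> None:
        self.last_done[chunk_id] = now
```

With documents of 50, 80 and 100 bytes, a padding factor of 1.0 places them at offsets 0, 50 and 130; a factor of 1.5 places them at 0, 75 and 195. A persistent version of `ProcessedLog` would need to survive restarts, which this sketch does not attempt.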
To avoid affecting performance, the process could query the lock percentage on each shard once per minute and pause compaction I/O whenever it exceeds 80%. Alternatively, the device utilization percentage reported by iostat would indicate whether the disk holding the chunk can handle more I/O.
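Both signals can be derived from two samples of cumulative counters taken one interval apart. A minimal sketch, assuming the shard exposes cumulative lock time and total elapsed time (the `LockSample` fields are hypothetical), and using the Linux "time spent doing I/Os" counter from /proc/diskstats, which is what iostat's %util is computed from. Only the 80% lock threshold comes from the proposal; the function names are illustrative.

```python
from dataclasses import dataclass

LOCK_PAUSE_THRESHOLD = 80.0  # percent, as proposed above


@dataclass
class LockSample:
    # Hypothetical cumulative counters, in microseconds.
    lock_time_us: int   # time spent holding the lock so far
    total_time_us: int  # total elapsed time so far


def lock_percentage(prev: LockSample, curr: LockSample) -> float:
    """Share of the interval between two samples during which the lock was held."""
    elapsed = curr.total_time_us - prev.total_time_us
    if elapsed <= 0:
        raise ValueError("samples must be taken in increasing time order")
    return 100.0 * (curr.lock_time_us - prev.lock_time_us) / elapsed


def disk_util_percentage(prev_io_ticks_ms: int, curr_io_ticks_ms: int, interval_ms: int) -> float:
    """Device utilization as iostat reports it: fraction of wall time the device was busy."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return min(100.0, 100.0 * (curr_io_ticks_ms - prev_io_ticks_ms) / interval_ms)


def should_pause(lock_pct: float, threshold: float = LOCK_PAUSE_THRESHOLD) -> bool:
    """Pause compaction I/O while the lock percentage is above the threshold."""
    return lock_pct > threshold
```

A background loop would take a sample once per minute, compute the percentage against the previous sample, and skip processing the next chunk while `should_pause` returns True.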
An additional parameter could be a rate limit on the number of chunks processed per minute or per hour, or a maximum percentage of available I/O to use.
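A chunks-per-period limit could be enforced with a token bucket, sketched below. `ChunkRateLimiter` is a hypothetical name, and the current time is passed in explicitly to keep the sketch testable; a real process would pass something like `time.monotonic()`.

```python
class ChunkRateLimiter:
    """Token bucket: at most max_chunks per period_s seconds on average,
    with bursts of up to max_chunks."""

    def __init__(self, max_chunks: int, period_s: float, now: float = 0.0) -> None:
        if max_chunks <= 0 or period_s <= 0:
            raise ValueError("max_chunks and period_s must be positive")
        self.capacity = float(max_chunks)
        self.period_s = period_s
        self.tokens = self.capacity  # start with a full bucket
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Return True if one more chunk may be processed at time `now`."""
        # Refill in proportion to the time elapsed since the last call.
        refill = (now - self.last) * self.capacity / self.period_s
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With a limit of 2 chunks per 60 seconds, two chunks may start immediately, a third is refused, and one more becomes available after 30 seconds. The same structure could cap I/O instead by charging each chunk its size in bytes rather than one token.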