[DOCS-9559] Docs for SERVER-10443: Compact command with LOW priority Created: 05/Dec/16  Updated: 24/Jan/17  Resolved: 24/Jan/17

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Emily Hall Assignee: Allison Reinheimer Moore
Resolution: Won't Fix Votes: 0
Labels: compaction, indexing, performance, sharding, storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-10443 Compact command with LOW priority Closed
Participants:
Days since reply: 7 years, 3 weeks, 1 day ago

 Description   

Engineering Ticket Description:

The 'compact' command should be runnable at low priority on any system without affecting existing performance.

Current 'compact' command restrictions:

  • only runs on replicas by default, unless forced;
  • locks the entire collection, preventing any activity until finished.

This ticket covers the straightforward case of regular maintenance: the compact command should be runnable at any time, or set up as a background process. It should scan through all chunks, one at a time, addressing the least recently used chunk first.

Processing:

  • The least recently used chunk should be mapped and paged in to a special holding area, where the following actions are performed on it:
      • all documents are sorted in primary/shard-key order;
      • documents are snugged up against one another or spaced out, per the padding factors specified in the compact command;
      • OPTIONAL: as a further step, if any of a document's fields are indexed, those index entries are verified and, if necessary, corrected.
  • Processed chunk identifiers should be stored so that the same chunks are not processed repeatedly within any n-hour period.
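
The chunk ordering and the "do not reprocess within n hours" rule can be sketched as follows. This is an illustration of the proposal only, not server code: the `Chunk` record, its `last_used` timestamp, the `processed_at` map, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    last_used: float  # hypothetical: epoch seconds of the last access


def next_chunks(chunks, processed_at, now, window_hours):
    """Return eligible chunks, least recently used first.

    processed_at maps chunk_id -> epoch seconds of the last compaction
    (hypothetical bookkeeping). Chunks compacted within the last
    window_hours are skipped.
    """
    cutoff = now - window_hours * 3600
    eligible = [
        c for c in chunks
        if processed_at.get(c.chunk_id, float("-inf")) < cutoff
    ]
    return sorted(eligible, key=lambda c: c.last_used)


def sort_chunk_documents(docs, key):
    """Stand-in for the per-chunk step: order documents by the given key."""
    return sorted(docs, key=lambda d: d[key])
```

With three chunks, one of which was compacted within the last hour, `next_chunks` returns the remaining two with the older access time first.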

This could be done without affecting performance by querying the lock percentage on each individual shard once per minute and pausing compaction I/O whenever the lock percentage is higher than 80%. Alternatively, the I/O utilization percentage reported by iostat gives an idea of whether the chunk's location can handle more I/O.
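
A minimal sketch of the pause-on-contention loop described above. The ticket does not say how the lock percentage would be obtained, so `read_lock_pct` is a hypothetical callable supplied by the caller, as are `compact` and `sleep`. Only the 80% threshold and the once-per-minute polling interval come from the description.

```python
def run_throttled(chunk_ids, read_lock_pct, compact, sleep,
                  poll_seconds=60, threshold=80.0):
    """Compact chunks one at a time, pausing while the shard is busy.

    read_lock_pct: hypothetical callable returning the current lock
                   percentage (0-100) for the shard.
    compact:       hypothetical callable that compacts one chunk.
    sleep:         callable taking seconds, e.g. time.sleep.
    """
    for chunk_id in chunk_ids:
        # Poll until the lock percentage drops to the threshold or below.
        while read_lock_pct() > threshold:
            sleep(poll_seconds)
        compact(chunk_id)
```

Passing `sleep` in as a parameter keeps the loop testable without real delays; in practice `time.sleep` would be supplied. The same loop would work with an I/O utilization reading in place of the lock percentage.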

An additional parameter could be a rate limit on the number of chunks processed per minute/hour, or a max percentage of available IO to use.
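
The chunks-per-minute or chunks-per-hour limit could be expressed as a simple fixed-window counter. The class and method names below are assumptions made for illustration; the ticket only names the parameter, not its implementation.

```python
class ChunkRateLimiter:
    """Allow at most max_chunks compactions per window of per_seconds."""

    def __init__(self, max_chunks, per_seconds):
        self.max_chunks = max_chunks
        self.per_seconds = per_seconds
        self.window_start = None
        self.count = 0

    def try_acquire(self, now):
        """Return True if another chunk may be processed at time `now`."""
        # Start a fresh window on first use or once the current one expires.
        if self.window_start is None or now - self.window_start >= self.per_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.max_chunks:
            self.count += 1
            return True
        return False
```

For example, `ChunkRateLimiter(2, 60)` permits two chunks in a 60-second window and refuses a third until the window rolls over.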



 Comments   
Comment by Kay Kim (Inactive) [ 24/Jan/17 ]

Feel free to reopen if you think this needs documentation.

Generated at Thu Feb 08 07:58:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.