Type: Bug
Resolution: Won't Do
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: 6.0.19
Component/s: Sharding, TTL
Labels:

Operating System: ALL
Confidence Status: None
Work Order: 3
Size Category: TBD
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We run a sharded cluster under a very heavy write load that regularly spikes to 100% CPU. All written documents, once processed, are subject to removal by a TTL index.

But sometimes the primaries of some shards become so heavily loaded that the background TTL removal process lags behind, and the number of documents pending removal grows. Initially all documents are evenly distributed across shards, but once the TTL process starts running at different speeds on different shards, the balancer decides that it can help by rebalancing the larger shards.
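One way to observe the backlog described above is to count documents whose TTL field is already past its expiry cutoff. The following is only a minimal sketch under stated assumptions: `collection` is expected to be a PyMongo-style collection object exposing `count_documents`, and the field name `processedAt` and the one-hour expiry in the usage note are hypothetical placeholders for the real TTL index definition. Counting through mongos gives a cluster-wide figure; a per-shard figure would need a direct connection to each shard.

```python
from datetime import datetime, timedelta, timezone


def ttl_backlog_filter(ttl_field, expire_after_seconds, now=None):
    """Build a filter matching documents that are already past their TTL expiry."""
    now = now or datetime.now(timezone.utc)
    # A document is due for removal once ttl_field + expireAfterSeconds <= now,
    # which is the same as ttl_field <= now - expireAfterSeconds.
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return {ttl_field: {"$lte": cutoff}}


def ttl_backlog(collection, ttl_field, expire_after_seconds, now=None):
    """Count expired documents the TTL monitor has not removed yet.

    `collection` is assumed to behave like a PyMongo collection.
    """
    query = ttl_backlog_filter(ttl_field, expire_after_seconds, now)
    return collection.count_documents(query)
```

For example, `ttl_backlog(db["events"], "processedAt", 3600)` would return the number of documents that expired more than zero seconds ago but still exist. The count itself adds read load, so on an already saturated shard it is probably best run sparingly.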

In practice this makes things even worse. The balancer picks the shards holding the most documents and tries to move documents from them to shards holding fewer. But the imbalance is caused solely by degraded TTL removal performance, so the balancer's activity adds even more load and contention to shards that are already heavily loaded.

We worked around the problem by configuring a balancer activity window for the period when load is relatively low. But could the balancer pause its activity when it sees that the TTL removal backlog is considerably large?
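For readers who want to apply the same workaround, the balancer window is stored in the `balancer` document of the `settings` collection in the `config` database and is set through mongos. The sketch below assumes a PyMongo-style collection object; the helper names and the example times are hypothetical and are not taken from this ticket.

```python
import re

# Balancer window times are given as HH:MM strings (24-hour clock).
_HHMM = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def balancer_window_update(start, stop):
    """Build the update document that sets the balancer activeWindow."""
    for value in (start, stop):
        if not _HHMM.match(value):
            raise ValueError(f"expected HH:MM, got {value!r}")
    return {"$set": {"activeWindow": {"start": start, "stop": stop}}}


def set_balancer_window(settings, start, stop):
    """Apply the window to config.settings.

    `settings` is assumed to be the config.settings collection,
    reached through a mongos connection.
    """
    return settings.update_one(
        {"_id": "balancer"},
        balancer_window_update(start, stop),
        upsert=True,
    )
```

With PyMongo this might be called as `set_balancer_window(client["config"]["settings"], "01:00", "05:00")`, where `client` is connected to a mongos. The window times are, as far as the MongoDB documentation describes, interpreted in the time zone of the config server primary, so it is worth confirming that before picking the low-load period.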

Assignee: Kenan Ali
Reporter: Pavel Miasnov
Participants: Kenan Ali, Pavel Miasnov
Votes: 0
Watchers: 6

Created: May 07 2025 02:34:43 PM UTC
Updated: Jun 05 2025 02:09:30 PM UTC
Resolved: Jun 05 2025 02:09:30 PM UTC
