Resolving this ticket as working as designed. As per my initial comment, in my benchmarks with 1kB-sized documents the batched deleter shows 3x the throughput of the doc-by-doc deleter, and has a lower impact on the latency of concurrent w:majority writes than running 3 doc-by-doc mass deletions in parallel. There is no evidence of a performance bottleneck that is specific to a secondary member.
It turns out that the behaviour I was observing is not specific to secondaries; it is just more apparent on secondaries because of replication ordering rules. Explanation below.
On the primary, the concurrent inserts can interleave freely with the mass document removal, so if storage is bottlenecked on the read path (e.g. during a checkpoint, per my previous comment), the remove ops are bound by storage while the inserts progress with no impact.
On a secondary, however, the same storage-bound scenario does impact the inserts: writes must be replicated in the same order (across batches) as they were executed on the primary. So if the system is bound on storage reads, the inserts must wait for the outstanding deletes, which in turn slows the rate at which inserts are replicated. This is why the latency of w:majority inserts increases during checkpoints on secondaries.
This behaviour holds up with larger documents (16kB) and in the presence of secondary indexes. I haven't seen evidence of the IDHACK fetch stage being problematic.
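The ordering argument can be illustrated with a toy model. This is a hypothetical sketch, not how MongoDB's oplog applier is implemented (real secondaries apply each batch with several writer threads); the names `Op` and `workload`, the per-op costs, the one-op-per-millisecond arrival rate and the batch size are all made-up parameters chosen only to show the effect. On the primary, an insert's latency is modelled as its own cost; on the secondary, an insert is acknowledged only once its whole oplog batch, including the deletes ahead of it, has been applied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    kind: str          # "delete" (mass removal) or "insert" (w:majority write)
    arrival_ms: float  # time at which the op executes on the primary
    cost_ms: float     # hypothetical storage time needed to apply the op


def primary_insert_latencies(ops: List[Op]) -> List[float]:
    # On the primary, inserts run concurrently with the deleting thread, so an
    # insert only pays for its own work, however slow the deletes are.
    return [op.cost_ms for op in ops if op.kind == "insert"]


def secondary_insert_latencies(ops: List[Op], batch_size: int) -> List[float]:
    # On the secondary, oplog entries are applied in primary order, batch by
    # batch. A batch starts once all its entries have arrived and the previous
    # batch has finished; an insert is only acknowledged when its batch ends.
    # Simplification: the ops inside a batch are applied one after another.
    latencies = []
    prev_batch_end = 0.0
    for start in range(0, len(ops), batch_size):
        batch = ops[start:start + batch_size]
        batch_start = max(prev_batch_end, max(op.arrival_ms for op in batch))
        batch_end = batch_start + sum(op.cost_ms for op in batch)
        latencies += [batch_end - op.arrival_ms for op in batch if op.kind == "insert"]
        prev_batch_end = batch_end
    return latencies


def workload(delete_cost_ms: float, n_ops: int = 30) -> List[Op]:
    # One op per millisecond: two deletes followed by one insert, repeated.
    ops = []
    for i in range(n_ops):
        kind = "insert" if i % 3 == 2 else "delete"
        cost = 0.1 if kind == "insert" else delete_cost_ms
        ops.append(Op(kind, float(i), cost))
    return ops


if __name__ == "__main__":
    # Cheap deletes (no checkpoint) vs. read-bound deletes (during a checkpoint).
    for label, delete_cost in [("no checkpoint", 0.2), ("checkpoint", 3.0)]:
        ops = workload(delete_cost)
        primary = max(primary_insert_latencies(ops))
        secondary = max(secondary_insert_latencies(ops, batch_size=3))
        print(f"{label}: max insert latency primary={primary:.1f} ms, secondary={secondary:.1f} ms")
```

With cheap deletes the modelled secondary keeps up and insert latency stays flat. Once each delete costs more than the interval between incoming ops, a backlog builds and the secondary's insert latency grows with every batch, while the primary's insert latency is unchanged.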
With 1kB-sized documents, the batched deleter shows ~3x the throughput of the doc-by-doc deleter, and it is also faster than 3 doc-by-doc mass deletions running in parallel.
In terms of latencies of concurrent w:majority inserts:
| Concurrent w:majority inserts | p50 (ms) | p99 (ms) | max (ms) |
| During doc-by-doc mass deletion | 5.6 | 7.6 | 25 |
| During 3x doc-by-doc mass deletion | 15.7 | 42.2 | 2603 |
| During batched deletion | 12 | 33.7 | 2103 |
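The table's figures can be compared percentile by percentile with a few lines of arithmetic. The snippet only divides the numbers reported in the table; the dictionary layout and the function name `latency_ratios` are illustrative, not part of any benchmark tooling.

```python
# p50 / p99 / max latency (ms) of concurrent w:majority inserts, copied from the table.
LATENCIES_MS = {
    "doc-by-doc": {"p50": 5.6, "p99": 7.6, "max": 25.0},
    "3x doc-by-doc": {"p50": 15.7, "p99": 42.2, "max": 2603.0},
    "batched": {"p50": 12.0, "p99": 33.7, "max": 2103.0},
}


def latency_ratios(scenario: str, baseline: str) -> dict:
    """Ratio scenario/baseline for each percentile; < 1 means lower latency."""
    num, den = LATENCIES_MS[scenario], LATENCIES_MS[baseline]
    return {pct: num[pct] / den[pct] for pct in num}


if __name__ == "__main__":
    for baseline in ("doc-by-doc", "3x doc-by-doc"):
        for pct, ratio in latency_ratios("batched", baseline).items():
            print(f"batched vs {baseline}, {pct}: {ratio:.2f}x")
```

Against 3 parallel doc-by-doc deletions, the batched deleter's ratios come out at roughly 0.76 (p50), 0.80 (p99) and 0.81 (max), i.e. lower at every percentile; against a single doc-by-doc deletion its max latency is about 84x higher.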
The tail latency is due to secondary replication. This seems to be a consequence of the primary deleting documents at a higher rate than with the doc-by-doc deleter, and it is still lower than the tail latency of running 3 doc-by-doc deletions in parallel.
The chart below shows one such latency spike, between A and B. Off-CPU analysis (blue metrics in the "Thread profile" section) reveals that the latency is caused by a checkpoint's fdatasync on the secondary, which saturates the disk and slows down fetching documents for deletion.

It would be interesting to determine why this behaviour is only observed on secondaries, although I'm not sure that is in scope for this ticket. Relevant questions would be whether checkpoints on secondaries are less efficient than on the primary, and whether the document fetches performed when replicating writes (IDHACK) are less efficient than the ranged fetches that occur on the primary.