-
Type: Task
-
Resolution: Unresolved
-
Priority: Minor - P4
-
Affects Version/s: None
-
Component/s: None
-
Security Level: Public (Available to anyone on the web)
-
Storage Engines - Transactions
-
2,383.357
There was an incident with the PS in Helix dev over the weekend of 10/25/25 that resulted in stalled page materialization and stalled log upload. When page materialization is stalled, we expect flowControl from MongoD to kick in and stop new writes. Based on metrics from the Log Service, very little oplog (averaging 22 KB per minute, likely periodic no-ops?) was appended while page materialization was stalled, which looks like flowControl doing its job. However, the Log Service metrics (e.g. for log 1847063560661283) also show that MongoD still periodically tried to append up to 48 MB/min of phylog while page materialization was completely stalled, in an interesting sawtooth pattern with a cycle of about 6 hours.
From FTDC (attached), MongoD seems to be sending only 400 bytes of oplog per second but 400 KB/s of phylog, a phylog:oplog ratio of about 1000, which seems a bit off to me.
This ticket is to investigate the large amount of phylog entries for workloads like the above, as well as the unexpectedly high phylog:oplog ratio.
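As a sanity check on the figures quoted above, the ratios can be worked out with a short script. This is only a sketch: the helper name `phylog_to_oplog_ratio` is hypothetical, decimal units (1 KB = 1000 bytes) are assumed, and the Log Service ratio compares a peak phylog rate against an average oplog rate, so it is an upper-end estimate rather than a measured value.

```python
# Sanity check of the phylog:oplog ratios quoted in this ticket.
# Assumption: decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB).
KB = 1000
MB = 1000 * KB


def phylog_to_oplog_ratio(phylog_bytes_per_sec: float, oplog_bytes_per_sec: float) -> float:
    """Return phylog bytes written per oplog byte (hypothetical helper)."""
    if oplog_bytes_per_sec <= 0:
        raise ValueError("oplog rate must be positive to compute a ratio")
    return phylog_bytes_per_sec / oplog_bytes_per_sec


# FTDC figures: 400 KB/s of phylog against 400 bytes/s of oplog.
ftdc_ratio = phylog_to_oplog_ratio(400 * KB, 400)

# Log Service figures: up to 48 MB/min of phylog (peak) against an
# average of 22 KB/min of oplog, both converted to bytes per second.
log_service_peak_ratio = phylog_to_oplog_ratio(48 * MB / 60, 22 * KB / 60)

print(f"FTDC phylog:oplog ratio: {ftdc_ratio:.0f}")
print(f"Log Service peak phylog:oplog ratio: {log_service_peak_ratio:.0f}")
```

Under these assumptions the FTDC figures give the ratio of about 1000 mentioned above, and the Log Service peak is roughly twice that.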
This example from peter.vertenten@mongodb.com also seems relevant:
Seeing what looks like interesting write amplification for one of the synthetic workloads, for this sls-backup-agent cluster: https://share.zight.com/Apu2Kyqb Every hour the synthetic workload inserts data; these are the little 10 MB spikes like the one the arrow points to in the screengrab. On the monitoring logs, we see 20 MB of oplog per hour, pre-compression:
https://share.zight.com/eDuJWLpZ This seems to track well enough from the oplog perspective. The inserts are to a TTL collection, so even though the writes happen every hour, we will have continuous deletes from the collection over time. But the curious bit is that the sustained phylog bytes are so much higher than the inserted docs.
is related to
- SERVER-16247 Oplog declines in performance over time under WiredTiger (Closed)
- WT-16084 Design how re-reconciling in memory pages could use less I/O (Open)