Priority: Major - P3
Affects Version/s: None
Fix Version/s: None
We upgraded a secondary of a 3-node cluster to 3.2.9.
By default, when we upgrade, we use iptables to allow replication traffic but block client connections.
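The iptables setup looks roughly like this (a sketch only; the peer addresses are placeholders, not our real topology, and we assume the default mongod port 27017):

```shell
# Allow replication traffic from the other replica-set members (placeholder IPs),
# then drop all other inbound connections to mongod's port.
iptables -A INPUT -p tcp --dport 27017 -s 10.0.0.2 -j ACCEPT
iptables -A INPUT -p tcp --dport 27017 -s 10.0.0.3 -j ACCEPT
iptables -A INPUT -p tcp --dport 27017 -j DROP
```

Removing the final DROP rule is what "allowing clients" refers to below.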
Upon allowing clients, WiredTiger cache usage went up to ~96% of the configured size and failed to drop. Only 1 (of 16) cores appeared to be in use.
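The ~96% figure comes from the WiredTiger cache statistics in serverStatus. A minimal sketch of the calculation, using the actual statistic key names from db.serverStatus().wiredTiger.cache (the byte values below are illustrative, not from our server):

```python
import json

# Illustrative serverStatus fragment; field names match what mongod reports,
# the numbers are made up to show a ~96% fill.
sample = json.loads("""
{
  "wiredTiger": {
    "cache": {
      "bytes currently in the cache": 7730941132,
      "maximum bytes configured": 8053063680
    }
  }
}
""")

def cache_fill_pct(server_status):
    # Percentage of the configured WiredTiger cache currently in use.
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    return 100.0 * used / limit

print(round(cache_fill_pct(sample), 1))  # → 96.0
```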
Blocking clients, restarting, and allowing replication let the oplog catch up, but over time the cache still fills and performance hits rock bottom.
replication status (the command took 10-15 minutes to return):
Upon restart (which often takes a long time) replication catches up, but then the cache fills and the scenario repeats.
Note: the other nodes are still running 3.0.
I also experimented with changing WiredTiger parameters, with no joy.
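The parameter experiments were along these lines (a sketch; the eviction values shown are examples, not a recommendation, and I'm assuming a locally reachable mongod). WiredTiger eviction settings can be adjusted at runtime via the wiredTigerEngineRuntimeConfig server parameter:

```shell
# Adjust WiredTiger eviction behaviour at runtime (requires a running mongod;
# values are illustrative). eviction_trigger/eviction_target are percentages of
# cache usage at which eviction starts working aggressively / aims to settle.
mongo --eval 'db.adminCommand({
  setParameter: 1,
  wiredTigerEngineRuntimeConfig: "eviction=(threads_min=4,threads_max=8),eviction_trigger=90,eviction_target=80"
})'
```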
We will downgrade, but are leaving this node at 3.2.9 with low priority for now to allow for diagnostics and logs if required.
With 3.0 we still have cache-filling issues, but they occur only once or twice a month. With our workload, mmap was pretty much maintenance-free (very stable, minimal issues, except the disk usage); 3.0 WiredTiger causes some pain but is manageable; 3.2 WiredTiger is unusable.