We had an issue in production yesterday:
- at ~16:31 UTC, unidentified queries started to read cold data intensively. Reads from disk increased greatly, as did eviction from mongo's page cache. This unexpected load lasted until ~17:12.
The load by itself is not remarkable enough to report here, but it led to the following bad behaviour:
- at 16:42:44 mongod literally froze. It did not respond to anything, wrote nothing to the log, and sent no statistics until 16:43:06.
I believe this is a bug.
I'm attaching the diagnostic data file that covers that period, and monitoring screenshots for a long period (16:20-17:30) and a short period focused on the issue (16:40-16:45).
Environment: AWS i3.16xlarge; the mongod data path is placed on an LVM volume over two NVMe devices.