I have a 3-node replica set running version 3.4.10 on Ubuntu 16.04.
I ran a schema update that touched all 7 million documents in a collection with a $set and a $rename. Because one of the secondaries is about 30 ms away in Azure, I used majority write concern to slow down the update and make sure at least one of the secondaries would stay in sync.
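For reference, the shape of the update was roughly as follows. This is only a sketch: the actual query is private, so the collection name (`items`) and the field names (`schemaVersion`, `oldName`, `newName`) are made up for illustration. It builds a standard `update` command document with a `majority` write concern.

```python
def build_update_command(collection, set_fields, rename_fields):
    """Build an `update` command that applies $set and $rename to every
    document in `collection` and waits for majority acknowledgement."""
    return {
        # The command name must be the first key of the command document.
        "update": collection,
        "updates": [
            {
                "q": {},  # empty filter: match every document
                "u": {"$set": set_fields, "$rename": rename_fields},
                "multi": True,  # update all matches, not just the first
            }
        ],
        "writeConcern": {"w": "majority"},
    }


# Hypothetical names; the real collection and fields are not shown here.
command = build_update_command(
    "items",
    {"schemaVersion": 2},
    {"oldName": "newName"},
)
```

With a driver such as PyMongo, a document like this could be sent with `db.command(command)`, assuming `db` is a handle to the database on the primary.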
The query started at 14:19:29. At that point the Azure secondary was probably 3-5 minutes behind because of earlier schema migrations. But by 14:27:00, the main secondary was unable to get results for its oplog queries.
That's also the time the replica set stopped accepting connections from clients.
To get things running again I had to kill all three mongod processes (and then kill -9 them, because the shutdown tends to hang in this state).
After letting the nodes sync up, I was able to reproduce this again with the same query.
I can provide logs and the query privately if that would be useful.
Just guessing based on what I learned in SERVER-32398, maybe the primary froze up because it ran out of cache while waiting for the secondary to apply changes. But the update was running with majority write concern, so I would have thought the secondary couldn't have gotten far enough behind for that to occur.
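One way to check how far behind each secondary is during a reproduction is to compare `optimeDate` values from `replSetGetStatus`. The sketch below assumes the status has already been fetched as a dictionary (for example via a driver); the host names and timestamps in the sample are invented, not taken from my logs.

```python
from datetime import datetime


def secondary_lag_seconds(status):
    """Return {member name: seconds behind the primary} for each secondary,
    based on the `members` array of a replSetGetStatus result."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Invented sample data: hypothetical hosts, date, and optimes.
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2017, 1, 1, 14, 27, 0)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2017, 1, 1, 14, 26, 58)},
        {"name": "azure1:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2017, 1, 1, 14, 23, 0)},
    ]
}

lag = secondary_lag_seconds(sample_status)
```

If majority write concern were throttling the update as I expected, at least one secondary should stay within a few seconds of the primary in output like this.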