We've been having issues with replication lag.
We had to manually restart one of the two slaves and run db.adminCommand( ), let it sync up, and then do the same for the other slave. Only then were both able to catch up with the oplog.
Why does restarting mongod help a member start catching up when it is 1800+ seconds behind? Putting it into maintenance mode did not help.
What are the bottlenecks of replication?
Do all the "repl writer workers" have to wait for the one writer thread to get the oplog replication done?
The slaves did not seem to use much of the available resources while replication was stalled.
Will the maximum capacity to replicate one database depend on how much one CPU core can handle? If so, how do we monitor this limit?
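
For context on the "seconds behind" figure, here is a minimal sketch of one way it could be computed from the output of the replSetGetStatus command, by comparing each secondary's optimeDate with the primary's. The function name, host names, and sample data are hypothetical and only for illustration. This measures lag only; it does not show whether a single core or thread is the limit.

```javascript
// Compute per-secondary lag in seconds from a replSetGetStatus result.
// `status` is assumed to have the shape returned by
// db.adminCommand({ replSetGetStatus: 1 }): a `members` array whose entries
// carry `name`, `stateStr` and `optimeDate` (a Date).
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    return null; // no primary visible, so lag cannot be measured this way
  }
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      // Difference between the last applied operation times, in seconds.
      lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  }
  return lag;
}

// Hypothetical sample data (made-up hosts and timestamps), so the sketch
// can be run without a replica set.
const sampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-01-01T00:30:00Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-01-01T00:00:00Z") },
    { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-01-01T00:29:58Z") },
  ],
};

const sampleLag = replicationLagSeconds(sampleStatus);
// sampleLag is { "db2:27017": 1800, "db3:27017": 2 }
```

In the shell this would be called as replicationLagSeconds(db.adminCommand({ replSetGetStatus: 1 })). The built-in helper rs.printSlaveReplicationInfo() (rs.printSecondaryReplicationInfo() in newer versions) reports similar numbers.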