To support multithreaded replication, we apply operations in batches on secondary nodes. Batching lets us guarantee that read queries on secondaries never see operations applied out of order.
Each batch is divided among multiple writer threads, and while they are writing we block all readers, even if a writer thread yields. Writing with multiple threads allows concurrent use of the CPUs.
In addition, before the batched writing begins, we prefetch all the memory pages that contain records we are about to write, including the pages we must traverse in every index. This prefetch stage provides IO concurrency, and is probably responsible for the majority of the speedup users are seeing. It also lets us hold the write lock for a minimal amount of time, since the write phase should take no page faults.