[SERVER-21107] Improve protocol version 1 replication throughput Created: 23/Oct/15  Updated: 10/Mar/16  Resolved: 19/Nov/15

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Critical - P2
Reporter: Martin Bligh Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: benchRun, benchSet, md_exec, md_rs2, pymongo_init_rs (HTML files)
Issue Links:
Depends
depends on SERVER-21129 Allow pipelining of applier work befo... Closed
depends on SERVER-21155 Only parse oplog entries once during ... Closed
depends on SERVER-21229 Group replicated inserts/deletes duri... Closed
depends on SERVER-21250 Improve read throughput of oplog when... Closed
depends on SERVER-23046 Reimplement operation pipelining appl... Closed
Related
related to SERVER-21237 ReplSetTest.prototype.awaitReplicatio... Closed
related to SERVER-21154 Prepare Apply Batches Outside of Appl... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl B (10/30/15), Repl C (11/20/15), Repl D (12/11/15)
Participants:
Linked BF Score: 0

 Description   

For correctness, secondaries running protocol version 1 (PV1) must wait for replicated writes to be journaled before updating their oplog position. This requirement has reduced replication throughput relative to the older PV0 protocol, as shown below at the point where PV1 became the default for new replica sets.

Version, Primary inserts/s, Secondary inserts/s, Ratio

3.1.8        101790 / 259431 = 39.23%
git 21c6a57  100623 / 260348 = 38.64%
git c101632  111042 / 272315 = 40.77%
git bf22ad1  105979 / 268666 = 39.44%
[SERVER-20438: Stash pointer to _uncommittedRecordIds to get O(1) delete]
git 92e38e6  100365 / 306215 = 32.77%
git 0c001db   92363 / 324666 = 28.44%
git 3f273cf   98546 / 328241 = 30.02%
[Apply oplog and record in oplog concurrently]
git 3937e8   130399 / 326500 = 39.93%
git ca4481c  125779 / 333166 = 37.75%
git 1cd101f  129144 / 334833 = 38.56%
[New replica set configurations have protocolVersion=1 by default]
git d789bca   84857 / 338166 = 25.09%
git 4372e7b   81938 / 334853 = 24.47%
[REVERT: Apply oplog and record in oplog concurrently]
git f25e8ac   60469 / 335713 = 18.01%
git 32844e3   62481 / 324500 = 19.25%

3.1.9         61854 / 334666 = 18.48%

3.2.0-rc0     83754 / 326486 = 25.65%
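
The figures in the table above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the helper names `ratioPercent` and `relativeDropPercent` are hypothetical, and the inputs are simply the two throughput figures printed in each row, in the order they appear.

```cpp
#include <cassert>

// Quotient of the two throughput figures printed in a table row,
// expressed as a percentage (hypothetical helper, for illustration only).
double ratioPercent(double first, double second) {
    assert(second > 0.0);
    return 100.0 * first / second;
}

// Relative drop from `before` to `after`, as a percentage of `before`
// (hypothetical helper, for illustration only).
double relativeDropPercent(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For the rows on either side of the protocolVersion=1 commit (1cd101f at 129144 / 334833 and d789bca at 84857 / 338166), `relativeDropPercent(ratioPercent(129144, 334833), ratioPercent(84857, 338166))` comes out just under 35, and the drop in the first figure alone, `relativeDropPercent(129144, 84857)`, is a little over 34. Either may be the basis for the roughly 35% regression quoted in the comments, though the ticket does not say which.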



 Comments   
Comment by Scott Hernandez (Inactive) [ 19/Nov/15 ]

Work was done in the linked issues so closing this umbrella.

Comment by Martin Bligh [ 26/Oct/15 ]

Not suggesting it as a fix, just isolating where the issue is

Comment by Eric Milkie [ 26/Oct/15 ]

We're going to move the waiting to a different codepath to allow for better pipelining in SERVER-21129.
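
To illustrate why moving the wait can help, here is a toy cost model, offered only as a sketch under stated assumptions and not as a description of the server code. It assumes each batch has a fixed apply cost and a fixed journal-wait cost (both hypothetical parameters), and compares waiting after every batch with a two-stage pipeline in which the next batch is applied while the previous one is still waiting on the journal.

```cpp
#include <algorithm>
#include <cassert>

// Total time when every batch is applied and then waits for its journal
// flush before the next batch may start (hypothetical model).
double serialTime(int batches, double applyMs, double journalMs) {
    assert(batches >= 0);
    return batches * (applyMs + journalMs);
}

// Total time for an idealized two-stage pipeline: applying batch i+1
// overlaps with the journal wait of batch i, so after the first batch the
// slower of the two stages sets the pace (hypothetical model).
double pipelinedTime(int batches, double applyMs, double journalMs) {
    assert(batches >= 0);
    if (batches == 0) {
        return 0.0;
    }
    return applyMs + journalMs + (batches - 1) * std::max(applyMs, journalMs);
}
```

With made-up costs of 2 ms to apply and 3 ms to wait, ten batches take 50 ms serially and 32 ms pipelined in this model. The durability wait still happens for every batch; it simply no longer blocks the applier from starting the next one.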

Comment by Eric Milkie [ 26/Oct/15 ]

While that action does make things faster, it's not something we can viably do. It would be like commenting out the part of the code that does the write. It could be way faster if we didn't have to write things...

Comment by Martin Bligh [ 26/Oct/15 ]

Flipping this to false fixes the regression

        const bool mustWaitUntilDurable = shouldEnsureDurability() && replCoord->isV1ElectionProtocol();

Comment by Matt Dannenberg [ 26/Oct/15 ]

This indicates that there is a 35% regression on secondary perf in PV1 compared to PV0. That commit activated a slower path, but it did not create the slowness in the path.

Comment by Martin Bligh [ 23/Oct/15 ]

Repro - see attached scripts:

md_rs2 && benchSet

Comment by Martin Bligh [ 23/Oct/15 ]

matt.dannenberg, scotthernandez, schwerin: looks like the main culprit is:

d789bca4c9fe76cd4d5375e66e281ed5a349e8fd matt.dannenberg@10gen.com SERVER-18498 New replica set configurations have protocolVersion=1 by default

35% regression on secondary perf. With parallel oplog fixed it's actually 50% I think
