[SERVER-18908] Secondaries unable to keep up with primary under WiredTiger Created: 10/Jun/15 Updated: 31/Oct/16 Resolved: 28/Oct/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, WiredTiger |
| Affects Version/s: | 3.0.3, 3.0.4 |
| Fix Version/s: | 3.4.0-rc2 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 11 |
| Labels: | WTcap |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Quint Iteration 7, QuInt A (10/12/15), QuInt C (11/23/15), Repl 2016-08-29, Repl 2016-09-19, Repl 2016-10-10, Repl 2016-10-31 |
| Participants: | |
| Description |
Replica lag grows unbounded as secondaries process ops at roughly 50-80% of the rate of the primary. Some stats of note were captured for the primary and the secondary [stats tables not preserved in this export]. Will get stack traces.
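As a concrete illustration of how the lag discussed in this ticket can be observed, here is a minimal sketch using pymongo against a replica set; the URI and the helper name are assumptions for the example, not from the ticket:

```python
from pymongo import MongoClient

def replication_lag_seconds(client):
    """Compute each secondary's lag behind the primary, in seconds,
    from the optimeDate fields reported by replSetGetStatus."""
    status = client.admin.command("replSetGetStatus")
    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    return {m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
            for m in status["members"] if m["stateStr"] == "SECONDARY"}

if __name__ == "__main__":
    client = MongoClient("mongodb://localhost:27017")  # assumed replica-set member
    print(replication_lag_seconds(client))
```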
| Comments |
| Comment by Mathias Stearn [ 28/Oct/16 ] |
Significant work on secondary performance was done under [linked ticket].
| Comment by Jinwen Zou [ 29/Dec/15 ] |
It would be much appreciated if you could share how 3.2.0 improved pipelining and parallelism (some implementation details). To be honest, these would make very good showcases or new features for marketing. Besides, serious applications/shops will always run 1-2 major releases behind the bleeding-edge release, so backporting these improvements to mongo 3.0.x is in high demand in many shops.
| Comment by Martin Bligh [ 15/Dec/15 ] |
The code has been changed significantly since that comment was written.
| Comment by Jinwen Zou [ 15/Dec/15 ] |
This is a very inefficient implementation of replication. How about splitting the replication work across multiple threads, with minimal sequential operations? For example: [list not preserved in this export]
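The comments below note that the secondary already applies ops on a fixed pool of 16 worker threads; a minimal sketch of that bucketing idea, assuming ops are routed by a hash of (namespace, _id) so per-document order is preserved (apply_op is a hypothetical stand-in for the real apply logic):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 16  # the count that the comments below note is hard-coded in 3.0

def bucket_for(op, num_workers=NUM_WORKERS):
    # Route all ops touching the same document to the same worker so that
    # per-document ordering is preserved while unrelated ops run in parallel.
    key = "{}:{}".format(op["ns"], op["_id"]).encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_workers

def apply_batch(batch, apply_op):
    # apply_op is a hypothetical stand-in for the server's oplog-apply logic.
    buckets = [[] for _ in range(NUM_WORKERS)]
    for op in batch:
        buckets[bucket_for(op)].append(op)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        futures = [pool.submit(lambda ops=ops: [apply_op(o) for o in ops])
                   for ops in buckets if ops]
        for f in futures:
            f.result()  # barrier: finish this batch before starting the next
```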
| Comment by Githook User [ 14/Sep/15 ] |
Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>
Message: [commit message not preserved in this export]
| Comment by Bruce Lucas (Inactive) [ 15/Jun/15 ] |
I've opened [linked ticket].
| Comment by Bruce Lucas (Inactive) [ 15/Jun/15 ] |
The number of replication worker threads on the secondary is hard-coded at 16. This can become a bottleneck that creates replication lag if the primary has more concurrency than that. For example, with 12 application threads, the following replication lags were measured at the end of a 110-second run for varying numbers of replication worker threads: [table not preserved in this export]

To address this issue we can implement [linked ticket].
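A sketch of how such a run could be reproduced, assuming a local test replica set; it reuses the replication_lag_seconds helper sketched under the description above, and all names here are illustrative:

```python
import threading
import time
from pymongo import MongoClient

URI = "mongodb://localhost:27017/?replicaSet=rs0"  # assumed test deployment
APP_THREADS = 12    # application-thread count from the experiment above
RUN_SECONDS = 110

def insert_loop(stop, i):
    # Each application thread inserts continuously into its own collection.
    coll = MongoClient(URI).test["c{}".format(i)]
    while not stop.is_set():
        coll.insert_one({"thread": i, "payload": "x" * 100})

stop = threading.Event()
threads = [threading.Thread(target=insert_loop, args=(stop, i))
           for i in range(APP_THREADS)]
for t in threads:
    t.start()
for _ in range(RUN_SECONDS):
    time.sleep(1)
    print(replication_lag_seconds(MongoClient(URI)))  # helper sketched above
stop.set()
for t in threads:
    t.join()
```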
| Comment by Bruce Lucas (Inactive) [ 15/Jun/15 ] |
Did some more careful measurements evaluating two changes: [list not preserved in this export]

In an insert workload taking about 90 seconds: [results not preserved in this export]

So it appears that a combination of both patches, if fully implemented, would address this issue. (Note that these results were obtained with 8 application threads. Since the number of replication worker threads is hard-coded at 16, the time spent inserting documents into the collection may actually be less on the secondary than it was on the primary. This means that although the two patches described eliminated growing replication lag in this case, the secondary's better insert performance may be hiding other inefficiencies on the secondary.)
| Comment by Bruce Lucas (Inactive) [ 12/Jun/15 ] |
On the primary, each connection thread writes its own ops to the oplog, and those writes proceed in parallel under WT with document-level concurrency. On the secondary, writing to the oplog is done sequentially by the sync thread, making it a serial bottleneck that doesn't exist on the primary. (Under mmapv1 with collection-level locking, writing to the oplog is a serial bottleneck on both the primary and the secondary.) In the following 24 stack samples [not preserved in this export], runSyncThread spent 67% of its time in applyOps waiting for the n worker threads to finish applying the ops, and 29% of its time writing those ops to the oplog sequentially. This is about the right ratio to account for the performance difference we see between primary and secondary.
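A minimal sketch contrasting the two batch shapes described in this comment; append_to_oplog and apply_op are hypothetical stand-ins for server internals, and oplog ordering constraints are ignored for brevity:

```python
from concurrent.futures import ThreadPoolExecutor

def apply_batch_serial_oplog(batch, apply_op, append_to_oplog, pool):
    # 3.0-era secondary: the sync thread appends every entry to the oplog
    # by itself (the serial bottleneck measured above)...
    for op in batch:
        append_to_oplog(op)
    # ...and only the actual application of ops is spread over the workers.
    list(pool.map(apply_op, batch))

def apply_batch_parallel_oplog(batch, apply_op, append_to_oplog, pool):
    # Primary-like shape: each worker both writes its op to the oplog and
    # applies it, so oplog inserts can proceed in parallel under
    # WiredTiger's document-level concurrency.
    def write_and_apply(op):
        append_to_oplog(op)
        apply_op(op)
    list(pool.map(write_and_apply, batch))

# Example wiring with no-op stand-ins:
if __name__ == "__main__":
    pool = ThreadPoolExecutor(max_workers=16)
    ops = [{"op": "i", "_id": n} for n in range(1000)]
    apply_batch_serial_oplog(ops, lambda op: None, lambda op: None, pool)
    apply_batch_parallel_oplog(ops, lambda op: None, lambda op: None, pool)
```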
| Comment by Eric Milkie [ 10/Jun/15 ] |
One idea we had is that instead of doing upserts for inserts, we could simply do real inserts and ignore duplicate key errors. The idempotency logic should still work.
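In driver terms (the real code path is the server's C++ oplog application, so this is only an analogy, with an assumed test collection), the two strategies look like:

```python
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

coll = MongoClient().test.c  # assumed test collection

def apply_insert_as_upsert(doc):
    # 3.0 behavior as described above: every replayed insert is an upsert,
    # which forces an _id lookup before the write.
    coll.replace_one({"_id": doc["_id"]}, doc, upsert=True)

def apply_insert_ignore_dup(doc):
    # Proposed alternative: do a real insert and treat a duplicate-key error
    # as "already applied", which keeps replay idempotent.
    try:
        coll.insert_one(doc)
    except DuplicateKeyError:
        pass
```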
| Comment by Bruce Lucas (Inactive) [ 10/Jun/15 ] |
Adapted the test to mmapv1: 8 threads inserting into 8 collections. [stats not preserved in this export]

The mmapv1 secondary did appear slightly slower, 5% at most from eyeballing mongostat.
| Comment by Bruce Lucas (Inactive) [ 10/Jun/15 ] |
The search-near appears to be because on the secondary we do inserts as upserts to make oplog replay idempotent, and that requires a lookup of the _id. If this is the cause (TBD whether it is), it could be related to [linked ticket].
| Comment by Eric Milkie [ 10/Jun/15 ] |
What does the lag look like for the same parameters except using mmapv1?