|
Attachments:
|
5dc08364e3c3317a73ff6fa6,enterprise-rhel-62-64-bit,be045feaf3ba4af8037c5baceda2b15cd2498a24,initiateWithNodeZeroAsPrimary.png
5dc08364e3c3317a73ff6fa6,enterprise-rhel-62-64-bit,be045feaf3ba4af8037c5baceda2b15cd2498a24.png
5de952fee3c33113884c24cc,enterprise-rhel-62-64-bit,3232fb0d0ada81eaee63c02887f6639284da1905,initiateWithNodeZeroAsPrimary.png
5de952fee3c33113884c24cc,enterprise-rhel-62-64-bit,3232fb0d0ada81eaee63c02887f6639284da1905.png
5de98193e3c33113884ffb22,enterprise-rhel-62-64-bit,f5a2d477761f5d954ea63a8c8a6cfa02d124e4a7.png
5de9d266e3c3311388563bce,enterprise-rhel-62-64-bit,f5a2d477761f5d954ea63a8c8a6cfa02d124e4a7.png
5dea7681e3c3312ae9e13d13,enterprise-rhel-62-64-bit,13944bb3fedc8d91c02c56bb66bb5c76a0a558d0.png
5debc0f9e3c3312ae9e8ff18,enterprise-rhel-62-64-bit,13944bb3fedc8d91c02c56bb66bb5c76a0a558d0,initiateWithNodeZeroAsPrimary.png
5debc0f9e3c3312ae9e8ff18,enterprise-rhel-62-64-bit,13944bb3fedc8d91c02c56bb66bb5c76a0a558d0.png
5deea767e3c331527a90dddb,enterprise-rhel-62-64-bit,4f00d6e3d7e54a2d131b8216d4b4ef9bfd7a7c6b,startupAndInitiate.png
5deec43ae3c331527a946440,enterprise-rhel-62-64-bit,e79e2a5b15d17fe4b1838c0087dc722e3a6e454c,startupAndInitiate.png
5deff7e2e3c331637cd5fcf1,enterprise-rhel-62-64-bit,4f00d6e3d7e54a2d131b8216d4b4ef9bfd7a7c6b,initiateWithNodeZeroAsPrimary.png
|
|
Issue Links:
|
Problem/Incident
causes SERVER-45765: Race in ReplSetTest.initiateWithAnyNo... (Closed)
Related
is related to SERVER-45151: Skip call to awaitNodesAgreeOnApplied... (Closed)
is related to SERVER-47691: ReplSetTest should be resilient to un... (Closed)
is related to SERVER-43774: ShardingTest should initiate all of i... (Closed)
|
|
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-12-02, Repl 2019-12-16
Participants:
Linked BF Score: 48
|
1. Is num_nodes the number per shard, or the total number?
'num_nodes' is the total number of shard nodes, as reported here. It's not an entirely precise metric, since it doesn't include config server nodes and it ignores how the shard nodes are distributed across replica sets. For example, 2 shards of 3 nodes each would report 'num_nodes' as 6, as would 3 shards of 2 nodes each.
2. How many shards is this split over?
I think my answer to (1) should address this, if I understand the question correctly. The metrics I currently collect do not include the number of shards in the test. I previously tracked this but switched to tracking just the total number of nodes.
3. Why do the legends contain different numbers of nodes?
Good question. Something I forgot to point out is that 'num_nodes' actually has a different meaning in each chart. In the "before" chart 'num_nodes' represents the number of shards in the test, whereas in the "after" chart it represents the total number of shard nodes, as explained in my answer to (1). Sorry, I know this is confusing, but it's a side effect of the fact that the metrics collection strategy changed a bit over the course of the project. If you ignore the coloring of bars entirely, then the charts can be directly compared.
In hindsight, it might have been better to just stick with measuring the number of shards instead of the total number of shard nodes, since we now expect that the startup and initiation time of a multi-node replica set shouldn't be much worse than that of a single-node replica set, so we would probably care more about correlating shard count with execution times.
|
|
william.schultz, in the above chart:
- Is num_nodes the number per shard, or the total number?
- How many shards is this split over?
- Why do the legends contain different numbers of nodes?
|
|
We can also view the overall improvement in ShardingTest startup and initiation performance in the 'sharding' suite after these changes were committed. The comparison below shows the time it takes to start up and initiate a sharded cluster in ShardingTest before any performance improvements from PM-1360 were committed, and the performance after this ticket was completed.
Before
After
We see that the 95th percentile has dropped from 17328ms to 3966ms, a reduction of 13,362ms.
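For reference, here is a minimal sketch of how a percentile like the ones quoted above can be computed from the collected duration samples. The helper and the variable holding the durations are illustrative, not part of the test framework; this uses the simple nearest-rank definition.
// Simple nearest-rank percentile over an array of millisecond durations.
function percentile(durationsMS, p) {
    const sorted = durationsMS.slice().sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(0, rank - 1)];
}
// e.g. percentile(shardingStartupAndInitiateDurationsMS, 95)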
|
|
The overall improvement in ReplSetTest performance in the 'replica_sets' suite after the final commit is summarized below.
Before
After
We see a drop in the 95th percentile from 6336ms to 824ms.
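For context, the durations behind these charts come from timing the startup and initiate steps of each test fixture. Below is a minimal sketch of that kind of measurement for ReplSetTest; the node count and the print format are illustrative, written in the spirit of the "ReplSetTest ... took Xms for N nodes" log lines quoted elsewhere in this ticket.
// Time startSet() and initiate() for a 3-node set and print the durations.
const rst = new ReplSetTest({nodes: 3});

let start = Date.now();
rst.startSet();
print("ReplSetTest startSet took " + (Date.now() - start) + "ms for 3 nodes.");

start = Date.now();
rst.initiate();
print("ReplSetTest initiate took " + (Date.now() - start) + "ms for 3 nodes.");

rst.stopSet();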
|
|
Author:
{'email': 'william.schultz@mongodb.com', 'name': 'William Schultz', 'username': 'will62794'}
Message: SERVER-43766 Additional optimizations for ReplSetTest initiate speed
Branch: master
https://github.com/mongodb/mongo/commit/440ccdca4de3730c9eac22ce68edb055638cd352
|
|
After adding further optimizations which include reducing this bgsync sleep and waiting for replication inside initiateWithAnyNodeAsPrimary before we turn up the heartbeat interval again, here are the initial improvements over the original commit:
Before:
After:
The 95th percentile shows a significant improvement.
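As a rough illustration of the "wait for replication before turning the heartbeat interval back up" idea mentioned above, here is a sketch. This is not the actual ReplSetTest change; it assumes the heartbeat interval was temporarily lowered in the initial config and is restored with a reconfig, and the 2000ms value is simply the server default.
// Illustrative only: make sure all nodes have replicated and agree on the applied
// optime while heartbeats are still frequent, then restore the normal heartbeat
// interval via replSetReconfig.
rst.awaitReplication();
rst.awaitNodesAgreeOnAppliedOpTime();

const primary = rst.getPrimary();
const config = primary.adminCommand({replSetGetConfig: 1}).config;
config.version += 1;
config.settings = config.settings || {};
config.settings.heartbeatIntervalMillis = 2000;  // assumed "normal" value
assert.commandWorked(primary.adminCommand({replSetReconfig: config}));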
|
|
This is an additional Evergreen run with 1 resmoke job and a reduced bgsync sleep:
And here is another run where we reduce the bgsync sleep and also await replication and wait for the nodes to agree on their view of the replica set before turning the heartbeat interval back up:
|
|
Upon further inspection, it appears that tests that disable chaining (e.g. step_down_during_draining2.js) may be responsible for the slower group of 3-node tests that lie a bit above the 1-second duration mark. If we reduce this sleep in bgsync.cpp and run a new patch, we get the following profile:
This appears to bring those 3-node tests into the lower, sub-1-second cluster of tests. There is still a small cluster of 2- and 3-node tests above the 2.5-second mark, and it would be worth investigating further whether that is just noise or some other issue is making those tests slower.
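For context, tests like step_down_during_draining2.js disable chaining through the replica set config's settings. A minimal sketch of such a config is below; the member list is illustrative.
// With chainingAllowed: false, every secondary must sync directly from the primary
// instead of from another secondary, which makes sync source selection delays more
// visible.
const config = {
    _id: "rst",
    members: [
        {_id: 0, host: "localhost:20000"},
        {_id: 1, host: "localhost:20001"},
        {_id: 2, host: "localhost:20002"},
    ],
    settings: {chainingAllowed: false},
};
// e.g. passed to rst.initiate(config) in a jstest.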
|
|
As judah.schvimer pointed out, the standard deviation even after the changes is still somewhat high (2281ms). This may be skewed by the outliers in the data. If we take the mean and standard deviation of the lowest 95%, 85%, and 75% of the data, we see the following stats, which give a somewhat more accurate picture of the variance in the data:
Lowest 95%
mean: 832ms
stddev: 704ms
Lowest 85%
mean: 638ms
stddev: 413ms
Lowest 75%
mean: 514ms
stddev: 246ms
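For reference, a small sketch of how these trimmed statistics can be computed from the raw duration samples. The helper is illustrative and uses the population standard deviation.
// Mean and standard deviation of the lowest `fraction` of the samples.
function trimmedStats(durationsMS, fraction) {
    const sorted = durationsMS.slice().sort((a, b) => a - b);
    const kept = sorted.slice(0, Math.floor(sorted.length * fraction));
    const mean = kept.reduce((sum, x) => sum + x, 0) / kept.length;
    const variance = kept.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / kept.length;
    return {mean: mean, stddev: Math.sqrt(variance)};
}
// e.g. trimmedStats(initiateDurationsMS, 0.95), trimmedStats(initiateDurationsMS, 0.75)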
As discussed offline, there is still an apparent correlation between initiate time and replica set size. My thought is that this may have to do with the slowness of the stepUp function inside initiateWithNodeZeroAsPrimary. I may try to speed this up in a follow-up patch.
Also, here is a histogram from a sequential Evergreen run (i.e. using only 1 resmoke job). This data appears to have less outlier noise:
|
|
Here are initial measurements of the performance impact of these changes. These histograms show initiateWithNodeZeroAsPrimary durations before the change:
and these are the associated summary stats for initiateWithNodeZeroAsPrimary duration:
median=3477ms
mean=3955ms
stddev=2338ms
95th percentile: 6335ms
75th percentile: 5281ms
50th percentile: 3477ms
Here is the duration histogram after the changes:
along with the summary stats:
median=641ms
mean=1196ms
stddev=2281ms
95th percentile: 3685ms
75th percentile: 1288ms
50th percentile: 641ms
To more accurately measure the improvements here, I would still like to do a full sequential test run in order to measure single-threaded test execution times.
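For reference, the duration samples behind these histograms can be extracted from the raw test logs with a small helper like the sketch below. The helper and the example file name are illustrative; the regex matches the "ReplSetTest initiateWithNodeZeroAsPrimary took Xms for N nodes" log format quoted elsewhere in this ticket.
// Pull initiateWithNodeZeroAsPrimary durations (in ms) out of raw log text.
function extractDurationsMS(logText) {
    const re = /ReplSetTest initiateWithNodeZeroAsPrimary took (\d+)ms for \d+ nodes/g;
    const durations = [];
    let match;
    while ((match = re.exec(logText)) !== null) {
        durations.push(parseInt(match[1], 10));
    }
    return durations;
}
// e.g. extractDurationsMS(cat("replica_sets_0.log"))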
|
|
Author:
{'name': 'William Schultz', 'username': 'will62794', 'email': 'william.schultz@mongodb.com'}
Message: SERVER-43766 Speed up initial syncs and awaitLastStableRecoveryTimestamp in ReplSetTest initiate
Branch: master
https://github.com/mongodb/mongo/commit/f5a2d477761f5d954ea63a8c8a6cfa02d124e4a7
|
|
Author:
{'email': 'william.schultz@mongodb.com', 'name': 'William Schultz', 'username': 'will62794'}
Message: SERVER-43766 Prevent use of failpoints in ReplSetTest when using random binary versions for nodes
Branch: master
https://github.com/mongodb/mongo/commit/a0831c75a97b935f30c2987a67a3ee6c92fab49c
|
|
Author:
{'name': 'William Schultz', 'username': 'will62794', 'email': 'william.schultz@mongodb.com'}
Message: SERVER-43766 Speed up primary drain mode in ReplSetTest initiate
Branch: master
https://github.com/mongodb/mongo/commit/2e948c4e94b17089ab56a5437447f9988c31103d
|
initiate awaitLastStableRecoveryTimestamp (1149ms for 5 nodes)
If we look at what one of the nodes is doing between the completion of awaitSecondaryNodes and the completion of awaitLastStableRecoveryTimestamp, it again appears that sync source selection is slowing things down. We start waiting for a stable recovery timestamp at 25:32.607, and node d21523, a secondary, does not pick a sync source until 25:33.535, almost a full second later. Shortly after that we finish waiting for the recovery timestamp, and we see that the 'appendOplogNote' command completed with a duration of 920ms:
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.579+0000 ReplSetTest initiate awaitSecondaryNodes took 1237ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.607+0000 AwaitLastStableRecoveryTimestamp: Beginning for [
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.634+0000 AwaitLastStableRecoveryTimestamp: ensuring the commit point advances for [
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:33.535+0000 d21523| 2019-11-20T17:25:33.534+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-75-91:21520
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:33.536+0000 d21523| 2019-11-20T17:25:33.535+0000 I REPL [BackgroundSync] Changed sync source from empty to ip-10-122-75-91:21520
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:33.539+0000 d21520| 2019-11-20T17:25:33.539+0000 I COMMAND [conn1] command admin.$cmd appName: "MongoDB Shell" command: appendOplogNote { appendOplogNote: 1.0, data: { awaitLastStableRecoveryTimestamp: 1.0 }, writeConcern: { w: "majority", wtimeout: 600000.0 }, lsid: { id: UUID("5a39fc0b-d164-49ca-9fe6-f62eda611178") }, $clusterTime: { clusterTime: Timestamp(1574270731, 6), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" } numYields:0 reslen:163 locks:{ ParallelBatchWriterMode: { acquireCount: { r: 1 } }, ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { w: 1 } } } flowControl:{ acquireCount: 1 } storage:{} protocol:op_msg 920ms
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:33.547+0000 AwaitLastStableRecoveryTimestamp: Waiting for stable recovery timestamps for [
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:33.547+0000 AwaitLastStableRecoveryTimestamp: ip-10-122-75-91:21521 does not have a stable recovery timestamp yet.
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:33.751+0000 AwaitLastStableRecoveryTimestamp: A stable recovery timestamp has successfully established on [
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:33.751+0000 ReplSetTest initiate awaitLastStableRecoveryTimestamp took 1149ms for 5 nodes.
|
This seems to implicate one of the 1 second sleeps scattered through bgsync.cpp. If we reduce all of these sleeps to 100 milliseconds, we can reduce the duration of this section somewhat, as seen in this patch:
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.071+0000 ReplSetTest initiate awaitSecondaryNodes took 1236ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.534+0000 d21771| 2019-11-20T18:21:45.533+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-75-130:21770
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.535+0000 d21771| 2019-11-20T18:21:45.534+0000 I REPL [BackgroundSync] Changed sync source from empty to ip-10-122-75-130:21770
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.540+0000 d21773| 2019-11-20T18:21:45.539+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-75-130:21771
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.541+0000 d21773| 2019-11-20T18:21:45.541+0000 I REPL [BackgroundSync] Changed sync source from empty to ip-10-122-75-130:21771
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.547+0000 d21774| 2019-11-20T18:21:45.546+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-75-130:21773
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.548+0000 d21774| 2019-11-20T18:21:45.548+0000 I REPL [BackgroundSync] Changed sync source from empty to ip-10-122-75-130:21773
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.642+0000 d21772| 2019-11-20T18:21:45.642+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-75-130:21771
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.644+0000 d21772| 2019-11-20T18:21:45.643+0000 I REPL [BackgroundSync] Changed sync source from empty to ip-10-122-75-130:21771
|
[js_test:replsettest_control_1_node] 2019-11-20T18:21:45.758+0000 ReplSetTest initiate awaitLastStableRecoveryTimestamp took 667ms for 5 nodes
|
This is a 482ms improvement over the baseline. Initial sync source selection still seems to take several hundred milliseconds, though, even with the reduced bgsync.cpp sleeps; consider, for example, the time difference between the "ReplSetTest initiate awaitSecondaryNodes took 1236ms" message and the first "sync source candidate: ip-10-122-75-130:21770" message from node d21771. This seemed to suggest that slow heartbeats may prevent the nodes from receiving the information they need to pick a sync source sooner. If we run a new patch build with both reduced bgsync sleeps and a heartbeat interval of 500ms, we see greater improvements:
[js_test:replsettest_control_1_node] 2019-11-20T20:48:32.911+0000 ReplSetTest initiate awaitSecondaryNodes took 1442ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:32.938+0000 AwaitLastStableRecoveryTimestamp: Beginning for [
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:32.950+0000 AwaitLastStableRecoveryTimestamp: ensuring the commit point advances for [
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.087+0000 d20523| 2019-11-20T20:48:33.087+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-70-126:20520
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.089+0000 d20521| 2019-11-20T20:48:33.089+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-70-126:20520
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.090+0000 d20524| 2019-11-20T20:48:33.090+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-70-126:20520
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.093+0000 d20520| 2019-11-20T20:48:33.093+0000 I COMMAND [conn1] command admin.$cmd appName: "MongoDB Shell" command: appendOplogNote { appendOplogNote: 1.0, data: { awaitLastStableRecoveryTimestamp: 1.0 }, writeConcern: { w: "majority", wtimeout: 600000.0 }, lsid: { id: UUID("bfbac1a0-5327-4542-9f2c-9d56b7ae7104") }, $clusterTime: { clusterTime: Timestamp(1574282911, 6), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" } numYields:0 reslen:163 locks:{ ParallelBatchWriterMode: { acquireCount: { r: 1 } }, ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { w: 1 } } } flowControl:{ acquireCount: 1 } storage:{} protocol:op_msg 142ms
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.093+0000 AwaitLastStableRecoveryTimestamp: Waiting for stable recovery timestamps for [
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.094+0000 d20522| 2019-11-20T20:48:33.094+0000 I REPL [BackgroundSync] sync source candidate: ip-10-122-70-126:20520
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.104+0000 AwaitLastStableRecoveryTimestamp: A stable recovery timestamp has successfully established on [
|
[js_test:replsettest_control_1_node] 2019-11-20T20:48:33.104+0000 ReplSetTest initiate awaitLastStableRecoveryTimestamp took 168ms for 5 nodes.
|
This reduces the execution time to 168ms, a 981ms improvement over the baseline.
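For reference, the sync-source-selection gaps discussed above can be computed directly from the log timestamps. A small illustrative helper for same-day timestamps:
// Convert the "T17:25:32.607" portion of a log timestamp to milliseconds since
// midnight, so gaps between two same-day log lines can be computed.
function logTimeToMillis(line) {
    const m = line.match(/T(\d{2}):(\d{2}):(\d{2})\.(\d{3})/);
    return ((+m[1] * 60 + +m[2]) * 60 + +m[3]) * 1000 + +m[4];
}
// e.g. logTimeToMillis("2019-11-20T17:25:33.535+0000 ...") -
//      logTimeToMillis("2019-11-20T17:25:32.607+0000 ...")  == 928 (ms)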
|
initiate awaitSecondaryNodes (1237ms for 5 nodes)
If we look at what nodes are doing between the completion of the 'initiate command' and the completion of 'awaitSecondaryNodes', we can see that initial sync of each node becomes a candidate for what is slowing things down:
[js_test:replsettest_control_1_node] 2019-11-20T17:25:31.333+0000 ReplSetTest initiate command took 1066ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:31.398+0000 d21522| 2019-11-20T17:25:31.397+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 1000ms at 2019-11-20T17:25:32.397+0000. Attempt 1 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:31.403+0000 d21521| 2019-11-20T17:25:31.403+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 1000ms at 2019-11-20T17:25:32.403+0000. Attempt 1 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:31.403+0000 d21523| 2019-11-20T17:25:31.403+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 1000ms at 2019-11-20T17:25:32.403+0000. Attempt 1 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:31.403+0000 d21524| 2019-11-20T17:25:31.403+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 1000ms at 2019-11-20T17:25:32.403+0000. Attempt 1 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.397+0000 d21522| 2019-11-20T17:25:32.397+0000 I REPL [ReplCoordExtern-1] sync source candidate: ip-10-122-75-91:21520
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.403+0000 d21524| 2019-11-20T17:25:32.403+0000 I REPL [ReplCoordExtern-1] sync source candidate: ip-10-122-75-91:21520
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.403+0000 d21523| 2019-11-20T17:25:32.403+0000 I REPL [ReplCoordExtern-1] sync source candidate: ip-10-122-75-91:21520
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.403+0000 d21521| 2019-11-20T17:25:32.403+0000 I REPL [ReplCoordExtern-1] sync source candidate: ip-10-122-75-91:21520
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.521+0000 d21522| 2019-11-20T17:25:32.521+0000 I INITSYNC [ReplCoordExtern-0] initial sync done; took 1s.
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.532+0000 d21524| 2019-11-20T17:25:32.532+0000 I INITSYNC [ReplCoordExtern-0] initial sync done; took 1s.
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.532+0000 d21521| 2019-11-20T17:25:32.532+0000 I INITSYNC [ReplCoordExtern-2] initial sync done; took 1s.
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.533+0000 d21523| 2019-11-20T17:25:32.533+0000 I INITSYNC [ReplCoordExtern-2] initial sync done; took 1s.
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:32.579+0000 ReplSetTest initiate awaitSecondaryNodes took 1237ms for 5 nodes.
|
After we see the "Error getting sync source" message from each node, each node takes 1 second until it logs a "sync source candidate" message. This seems to implicate this hard-coded initial sync timeout, which is set at 1 second. If we run another patch build with this interval reduced to 50 milliseconds, we slightly improve the duration here, but not drastically:
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.344+0000 ReplSetTest initiate command took 1083ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.408+0000 d22021| 2019-11-20T18:17:24.408+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.458+0000. Attempt 1 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.458+0000 d22021| 2019-11-20T18:17:24.458+0000 D1 INITSYNC [ReplCoordExtern-1] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.508+0000. Attempt 2 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.508+0000 d22021| 2019-11-20T18:17:24.508+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.558+0000. Attempt 3 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.558+0000 d22021| 2019-11-20T18:17:24.558+0000 D1 INITSYNC [ReplCoordExtern-1] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.608+0000. Attempt 4 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.608+0000 d22021| 2019-11-20T18:17:24.608+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.658+0000. Attempt 5 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.658+0000 d22021| 2019-11-20T18:17:24.658+0000 D1 INITSYNC [ReplCoordExtern-1] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.708+0000. Attempt 6 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.708+0000 d22021| 2019-11-20T18:17:24.708+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.758+0000. Attempt 7 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.758+0000 d22021| 2019-11-20T18:17:24.758+0000 D1 INITSYNC [ReplCoordExtern-1] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.808+0000. Attempt 8 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.808+0000 d22021| 2019-11-20T18:17:24.808+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.858+0000. Attempt 9 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.858+0000 d22021| 2019-11-20T18:17:24.858+0000 D1 INITSYNC [ReplCoordExtern-1] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T18:17:24.908+0000. Attempt 10 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:24.908+0000 d22021| 2019-11-20T18:17:24.908+0000 I REPL [ReplCoordExtern-0] sync source candidate: ip-10-122-76-160:22020
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:25.079+0000 d22021| 2019-11-20T18:17:25.078+0000 I INITSYNC [ReplCoordExtern-0] initial sync done; took 0s.
|
[js_test:replsettest_control_1_node] 2019-11-20T18:17:25.182+0000 ReplSetTest initiate awaitSecondaryNodes took 830ms for 5 nodes.
|
We see that even with a reduced retry interval, a node (d22021) is still stuck waiting to find a sync source for a while (~500ms). This implies that the node may not be able to pick a valid sync source because it has not updated its view of other nodes in the replica set yet. If we reduce the heartbeat interval to 500ms in addition to reducing the initial sync retry intervals to 50ms in another patch, we see that a node can find a sync source more quickly, and the overall time of this section is reduced further:
[js_test:replsettest_control_1_node] 2019-11-20T20:47:42.220+0000 ReplSetTest initiate command took 1080ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-20T20:47:42.292+0000 d22271| 2019-11-20T20:47:42.291+0000 D1 INITSYNC [ReplCoordExtern-0] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T20:47:42.341+0000. Attempt 1 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T20:47:42.341+0000 d22271| 2019-11-20T20:47:42.341+0000 D1 INITSYNC [ReplCoordExtern-1] Error getting sync source: 'InvalidSyncSource: No valid sync source available. Our last fetched optime: { ts: Timestamp(0, 0), t: -1 }', trying again in 50ms at 2019-11-20T20:47:42.391+0000. Attempt 2 of 60
|
[js_test:replsettest_control_1_node] 2019-11-20T20:47:42.391+0000 d22271| 2019-11-20T20:47:42.391+0000 I REPL [ReplCoordExtern-0] sync source candidate: ip-10-122-73-89:22270
|
[js_test:replsettest_control_1_node] 2019-11-20T20:47:42.540+0000 d22271| 2019-11-20T20:47:42.540+0000 I INITSYNC [ReplCoordExtern-0] initial sync done; took 0s.
|
[js_test:replsettest_control_1_node] 2019-11-20T20:47:42.647+0000 ReplSetTest initiate awaitSecondaryNodes took 418ms for 5 nodes.
|
The execution time here of 418ms is an 819ms reduction from the baseline. Initial sync itself may be the limiting factor in reducing this duration further, but it might be worth a bit more investigation. We should also run these tests a few times to make sure we're not measuring noise.
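For reference, a heartbeat interval like the 500ms used in this experiment can be specified through the replica set config passed to initiate. A minimal sketch is below; the surrounding setup is illustrative, and the experiment itself also changed hard-coded server-side constants that cannot be set this way.
// Start a 5-node set and initiate it with a 500ms heartbeat interval via the
// standard getReplSetConfig()/initiate() flow.
const rst = new ReplSetTest({nodes: 5});
rst.startSet();

const config = rst.getReplSetConfig();
config.settings = Object.assign({}, config.settings, {heartbeatIntervalMillis: 500});
rst.initiate(config);

rst.awaitSecondaryNodes();
rst.stopSet();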
|
initiate command (1066ms for 5 nodes)
In this case it's pretty easy to locate the offending log messages:
[js_test:replsettest_control_1_node] 2019-11-20T17:25:30.316+0000 d21520| 2019-11-20T17:25:30.316+0000 I REPL [ReplCoord-1] Stopping replication producer
|
[js_test:replsettest_control_1_node] 2019-11-20T17:25:31.313+0000 d21520| 2019-11-20T17:25:31.313+0000 I REPL [ReplBatcher] Oplog buffer has been drained in term 1
|
This seems to be a clear indicator that this 1-second sleep is slowing the initiate command down. We can test this theory by running a new patch build with that sleep reduced from 1 second to 50 milliseconds. In that patch, we see the initiate command duration significantly reduced:
[js_test:replsettest_control_1_node] 2019-11-20T18:12:36.601+0000 ReplSetTest initiate command took 279ms for 5 nodes.
|
and it seems clear that the primary drain speed had a large impact on this:
[js_test:replsettest_control_1_node] 2019-11-20T18:12:36.398+0000 d22770| 2019-11-20T18:12:36.396+0000 I REPL [ReplCoord-0] Stopping replication producer
|
[js_test:replsettest_control_1_node] 2019-11-20T18:12:36.398+0000 d22770| 2019-11-20T18:12:36.396+0000 I REPL [ReplBatcher] Oplog buffer has been drained in term 1
|
This is a speed improvement of 787ms over the baseline. If we run another patch build with additional logging, we can see what is contributing to the remaining execution time:
[js_test:replsettest_control_1_node] 2019-11-25T04:45:58.425+0000 ReplSetTest initiate getPrimary took 210ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-25T04:45:58.425+0000 ReplSetTest initiate command took 279ms for 5 nodes.
|
This getPrimary call takes over 200ms, presumably because it fails on its first try, and then waits 200ms until it tries again. If we reduce this getPrimary retry interval to 25ms (patch build), we bring the initiate command time down to 103ms, a 963ms improvement over the baseline:
[js_test:replsettest_control_1_node] 2019-11-25T05:10:12.793+0000 ReplSetTest initiate getPrimary took 34ms for 5 nodes.
|
[js_test:replsettest_control_1_node] 2019-11-25T05:10:12.794+0000 ReplSetTest initiate command took 103ms for 5 nodes.
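For reference, the retry-interval effect described above comes from a simple polling pattern: if the first check fails, the caller pays the full retry interval before checking again, so a 200ms interval adds roughly 200ms while a 25ms interval adds only about 25ms. A minimal illustrative sketch (not ReplSetTest's actual getPrimary implementation, which is what the patch changed):
// Poll `predicate` every `intervalMS` until it returns true or `timeoutMS` elapses.
function pollUntil(predicate, intervalMS, timeoutMS) {
    const start = Date.now();
    while (Date.now() - start < timeoutMS) {
        if (predicate()) {
            return true;
        }
        sleep(intervalMS);  // mongo shell built-in sleep
    }
    return false;
}
// Hypothetical usage: wait for some node to report itself primary.
// pollUntil(() => rst.nodes.some((n) => n.adminCommand({ismaster: 1}).ismaster), 25, 60 * 1000);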
|
|
|
To investigate the slowest sections of ReplSetTest.initiate and how they could be improved, I started from the baseline metrics shown in the above comment. The execution of ReplSetTest.initiate can be divided into the time spent in initiateWithAnyNodeAsPrimary and the time spent inside the stepUp call inside initiateWithNodeZeroAsPrimary. Here are the slowest sections of initiateWithAnyNodeAsPrimary, extracted from the Evergreen baseline above:
2019-11-20T17:25:31.333+0000 ReplSetTest initiate command took 1066ms for 5 nodes.
|
2019-11-20T17:25:32.579+0000 ReplSetTest initiate awaitSecondaryNodes took 1237ms for 5 nodes.
|
2019-11-20T17:25:33.751+0000 ReplSetTest initiate awaitLastStableRecoveryTimestamp took 1149ms for 5 nodes.
|
and this is the slowest section of stepUp:
2019-11-20T17:25:35.642+0000 ReplSetTest stepUp awaitNodesAgreeOnAppliedOpTime took 1865ms for 5 nodes.
|
Together, these three sections account for 3452ms of the 3490ms spent in initiateWithAnyNodeAsPrimary. We look at each of them below to determine why it is slow.
|
|
Here are baseline performance metrics of ReplSetTest.initiate from an Evergreen RHEL 6.2 patch build:
2019-11-20T17:25:30.261+0000 ReplSetTest startSet took 826ms for 5 nodes.
|
2019-11-20T17:25:31.333+0000 ReplSetTest initiate command took 1066ms for 5 nodes.
|
2019-11-20T17:25:31.344+0000 ReplSetTest reconfiguration to add nodes took 9ms for 5 nodes.
|
2019-11-20T17:25:32.579+0000 ReplSetTest initiate awaitSecondaryNodes took 1237ms for 5 nodes.
|
2019-11-20T17:25:32.607+0000 ReplSetTest initiate waitForKeys took 5ms for 5 nodes.
|
2019-11-20T17:25:33.751+0000 ReplSetTest initiate awaitLastStableRecoveryTimestamp took 1149ms for 5 nodes.
|
2019-11-20T17:25:33.751+0000 ReplSetTest initiateWithAnyNodeAsPrimary took 3490ms for 5 nodes.
|
2019-11-20T17:25:33.776+0000 ReplSetTest stepUp awaitReplication took 25ms for 5 nodes.
|
2019-11-20T17:25:35.642+0000 ReplSetTest stepUp awaitNodesAgreeOnAppliedOpTime took 1865ms for 5 nodes.
|
2019-11-20T17:25:35.648+0000 ReplSetTest stepUp awaitNodesAgreeOnPrimary took 7ms for 5 nodes.
|
2019-11-20T17:25:35.650+0000 ReplSetTest initiate stepUp took 1899ms for 5 nodes.
|
2019-11-20T17:25:35.650+0000 ReplSetTest initiateWithNodeZeroAsPrimary took 5390ms for 5 nodes.
|
|
|
Some areas of the code to investigate that may slow down the ReplSetTest.initiate process:
|