  Core Server / SERVER-44066

Investigate and optimize the slowest portions of ReplSetTest.start when not waiting for a connection

    • Type: Task
    • Resolution: Won't Fix
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: Replication
    • Replication
    • Repl 2019-11-04

      After parallelizing node startup in ReplSetTest.startSet, early tests show that ReplSetTest.start can take several hundred milliseconds even when we don't wait for a connection to the mongod node. Measured on one Linux workstation, it often takes more than 300 milliseconds. When starting several nodes, this sequential bottleneck limits how well ReplSetTest.startSet scales, since we must wait those several hundred milliseconds before starting the next mongod node. This affects the performance goal of the Faster Local Testing project, which is for startSet on N nodes to take no more than 1.5x as long as startSet on 1 node. The impact of ReplSetTest.start is visible in this simple benchmark, a test that does nothing but repeatedly start up and shut down replica sets of increasing size:

      [js_test:replsettest_control_scale] 2019-10-17T12:35:37.221-0400 ReplSetTest start took 304ms for node 0
      [js_test:replsettest_control_scale] 2019-10-17T12:35:37.827-0400 ReplSetTest startSet took 919ms for 1 nodes.
      [js_test:replsettest_control_scale] 2019-10-17T12:35:39.625-0400 ReplSetTest start took 263ms for node 0
      [js_test:replsettest_control_scale] 2019-10-17T12:35:39.938-0400 ReplSetTest start took 308ms for node 1
      [js_test:replsettest_control_scale] 2019-10-17T12:35:40.543-0400 ReplSetTest startSet took 1189ms for 2 nodes.
      [js_test:replsettest_control_scale] 2019-10-17T12:35:41.646-0400 ReplSetTest start took 266ms for node 0
      [js_test:replsettest_control_scale] 2019-10-17T12:35:41.966-0400 ReplSetTest start took 314ms for node 1
      [js_test:replsettest_control_scale] 2019-10-17T12:35:42.332-0400 ReplSetTest start took 365ms for node 2
      [js_test:replsettest_control_scale] 2019-10-17T12:35:42.939-0400 ReplSetTest startSet took 1568ms for 3 nodes.
      [js_test:replsettest_control_scale] 2019-10-17T12:35:44.557-0400 ReplSetTest start took 263ms for node 0
      [js_test:replsettest_control_scale] 2019-10-17T12:35:44.870-0400 ReplSetTest start took 309ms for node 1
      [js_test:replsettest_control_scale] 2019-10-17T12:35:45.226-0400 ReplSetTest start took 355ms for node 2
      [js_test:replsettest_control_scale] 2019-10-17T12:35:45.608-0400 ReplSetTest start took 373ms for node 3
      [js_test:replsettest_control_scale] 2019-10-17T12:35:46.005-0400 ReplSetTest startSet took 1720ms for 4 nodes.
      [js_test:replsettest_control_scale] 2019-10-17T12:35:47.769-0400 ReplSetTest start took 240ms for node 0
      [js_test:replsettest_control_scale] 2019-10-17T12:35:48.060-0400 ReplSetTest start took 285ms for node 1
      [js_test:replsettest_control_scale] 2019-10-17T12:35:48.388-0400 ReplSetTest start took 328ms for node 2
      [js_test:replsettest_control_scale] 2019-10-17T12:35:48.765-0400 ReplSetTest start took 372ms for node 3
      [js_test:replsettest_control_scale] 2019-10-17T12:35:49.188-0400 ReplSetTest start took 415ms for node 4
      [js_test:replsettest_control_scale] 2019-10-17T12:35:49.586-0400 ReplSetTest startSet took 2064ms for 5 nodes.
      

      Even with only 5 nodes, the startSet time (2064ms) is already about 2.2x the time for 1 node (919ms), which misses the performance goal of 1.5x. We should investigate the slowest parts of ReplSetTest.start and see which portions can be optimized.
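
      To make the comparison against the 1.5x goal concrete, the numbers from the log above can be tabulated with a short script. This is only an illustrative sketch in plain JavaScript: the names scalingReport and GOAL_RATIO are hypothetical and are not part of ReplSetTest, and the timings are copied by hand from the benchmark output rather than collected by the script.

```javascript
// startSet wall-clock times (ms) copied from the benchmark log, keyed by node count.
const startSetMs = {1: 919, 2: 1189, 3: 1568, 4: 1720, 5: 2064};

// Per-node ReplSetTest.start times (ms) from the same log, keyed by node count.
const startMs = {
    1: [304],
    2: [263, 308],
    3: [266, 314, 365],
    4: [263, 309, 355, 373],
    5: [240, 285, 328, 372, 415],
};

// Goal from the Faster Local Testing project: N nodes take at most 1.5x as long as 1 node.
const GOAL_RATIO = 1.5;

// Hypothetical helper: for each set size, report the total time, the time spent in
// sequential start calls, and the ratio against the 1-node baseline.
function scalingReport(startSetMs, startMs, goalRatio) {
    const baseline = startSetMs[1];
    return Object.keys(startSetMs)
        .map(Number)
        .sort((a, b) => a - b)
        .map((n) => {
            const sequentialStartMs = startMs[n].reduce((sum, ms) => sum + ms, 0);
            const ratio = startSetMs[n] / baseline;
            return {
                nodes: n,
                totalMs: startSetMs[n],
                sequentialStartMs,
                ratio,
                meetsGoal: ratio <= goalRatio,
            };
        });
}

const report = scalingReport(startSetMs, startMs, GOAL_RATIO);
for (const row of report) {
    console.log(
        `${row.nodes} nodes: ${row.totalMs}ms total, ` +
        `${row.sequentialStartMs}ms in start calls, ` +
        `${row.ratio.toFixed(2)}x of 1 node, goal ${row.meetsGoal ? "met" : "missed"}`);
}
```

      With these inputs, the 2-node run stays within the goal, while every run from 3 nodes upward exceeds it. The summed start times grow with each added node, which is consistent with the sequential start calls being the part that scales poorly.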

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            william.schultz@mongodb.com Will Schultz
            Votes:
            0
            Watchers:
            2
