Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-55606

Majority writes with j: false perform worse than j: true

    • Type: Icon: Bug Bug
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 4.2.0, 4.4.2
    • Component/s: None
    • Labels:
    • Replication
    • ALL
    • Repl 2021-05-31, Repl 2021-06-14, Repl 2021-06-28, Repl 2021-07-12, Repl 2021-07-26, Repl 2021-08-09

      Parallel insert performance test shows that writes with { w: majority, j: false } perform worse than { w: majority } which enabled journaling by default.

              {
                "end": 1606114697.135247,
                "name": "ParallelInsert-1-Insert_WMajority_JFalse",
                "results": {
                  "1": {
                    "ops_per_sec": 315.3323923270361,
                    "ops_per_sec_values": [
                      315.3323923270361
                    ]
                  }
                },
                "start": 1606112991.5566537,
                "workload": "genny"
              },
      ...
              {
                "end": 1606114697.135247,
                "name": "ParallelInsert-1-Insert_WMajority",
                "results": {
                  "1": {
                    "ops_per_sec": 373.07617520622085,
                    "ops_per_sec_values": [
                      373.07617520622085
                    ]
                  }
                },
                "start": 1606112991.5566537,
                "workload": "genny"
              },
      

      I found this issue using an update workload in my paper experiments against 4.4.2 in a 5-node setting. It's weird that the latency is about 100ms for some operations even when there's just one client. This seems related to the default  storage.journal.commitIntervalMs used by the journal flusher

        throughput latency_50_in_ms latency_90_in_ms latency_99_in_ms thread_num iteration
      0 33.166667 60.167260 99.918017 100.078770 2 0
      5 266.000000 60.109203 99.889717 100.047515 16 0
      3 1010.766667 63.347630 101.193805 102.650609 64 0
      2 3390.566667 75.609942 106.122061 110.796925 256 0
      4 8377.533333 122.405804 130.348062 140.317615 1024 0
      1 13738.645161 144.346179 179.020890 230.821927 2048 0

      After I changed writeConcernMajorityJournalDefault to false, { w: majority } starts to perform as expected.

        throughput latency_50_in_ms latency_90_in_ms latency_99_in_ms thread_num iteration
      0 1231.566667 1.586717 2.106353 2.762118 2 0
      3 11776.800000 5.379346 6.699356 11.847912 64 0
      4 14283.500000 8.906318 10.432745 13.263393 128 0
      1 14588.533333 17.495040 22.730308 29.815806 256 0
      2 15812.966667 32.327074 36.412116 52.309199 512 0

      As a comparison, the default setting with { writeConcernMajorityJournalDefault: true } and { w: majority } gives a lower but comparable throughput as expected.

        throughput latency_50_in_ms latency_90_in_ms latency_99_in_ms thread_num iteration
      0 1166.633333 1.673785 2.223570 2.948424 2 0
      5 4957.833333 3.171073 4.634047 7.260699 16 0
      3 9701.066667 6.537874 7.292936 49.282507 64 0
      2 12332.733333 20.704192 25.998145 64.597367 256 0
      4 13330.633333 76.797501 115.536679 176.026491 1024 0
      1 15191.266667 135.013036 169.713326 221.682833 2048 0

      My theory is that the code missed signaling writers when updating the positions because it is conditional on writeConcernMajorityJournalDefault.

        1. commandInputs.png
          60 kB
          Daniel Gottlieb
        2. wcDefault.png
          31 kB
          Daniel Gottlieb

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            siyuan.zhou@mongodb.com Siyuan Zhou
            Votes:
            2 Vote for this issue
            Watchers:
            39 Start watching this issue

              Created:
              Updated: