  1. Core Server
  2. SERVER-53036

[SBE] Investigate why several sys-perf benchmarks fail

    • Query Execution
    • Query Execution 2021-05-31

      In December 2020 / January 2021, I wrote a patch that enables SBE (the slot-based execution engine) by default and then ran all of the sys-perf benchmarks against it. Diff:

      diff --git a/src/mongo/db/query/query_knobs.idl b/src/mongo/db/query/query_knobs.idl
      index 52651caa6f..f6653beb03 100644
      --- a/src/mongo/db/query/query_knobs.idl
      +++ b/src/mongo/db/query/query_knobs.idl
      @@ -410,7 +410,7 @@ server_parameters:
           set_at: [ startup, runtime ]
           cpp_varname: "internalQueryEnableSlotBasedExecutionEngine"
           cpp_vartype: AtomicWord<bool>
      - default: false
      + default: true
       
         internalQueryDefaultDOP:
           description: "Default degree of parallelism. This an internal experimental parameter and should not be changed on live systems."
      

      sys-perf Evergreen run: https://spruce.mongodb.com/version/5fb8666657e85a0819d36b59/tasks

      Under SBE mode, for the "industry_benchmarks", "industry_benchmarks_wmajority", "linkbench", and "ycsb_60GB" tasks, the loading phase succeeds ("ycsb_load" or "linkbench_load"), but then the subsequent phase fails ("ycsb_100read", "ycsb_95read5update_w_majority", or "linkbench_request").

      In the main Evergreen log for each of these tasks, I see the following error:

      [2020/11/21 02:13:34.496] 02:13:34Z>  about to fork child process, waiting until server is ready for connections.
      [2020/11/21 02:13:35.721] 02:13:34Z>  forked process: 35542
      [2020/11/21 02:13:35.721] 02:13:35Z>  ERROR: child process failed, exited with 4
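When triaging several task logs at once, it may help to pull the child exit codes out mechanically. The following is a minimal sketch that assumes the log lines have the same shape as the excerpt above; the function name `child_exit_codes` is hypothetical and not part of any Evergreen tooling.

```python
import re

# Matches the failure line printed when the forked mongod exits early.
CHILD_EXIT = re.compile(r"ERROR: child process failed, exited with (\d+)")


def child_exit_codes(lines):
    """Return the exit code from every 'child process failed' line."""
    codes = []
    for line in lines:
        match = CHILD_EXIT.search(line)
        if match:
            codes.append(int(match.group(1)))
    return codes


# Sample input copied from the Evergreen log excerpt in this ticket.
evergreen_sample = [
    "[2020/11/21 02:13:34.496] 02:13:34Z>  about to fork child process, waiting until server is ready for connections.",
    "[2020/11/21 02:13:35.721] 02:13:34Z>  forked process: 35542",
    "[2020/11/21 02:13:35.721] 02:13:35Z>  ERROR: child process failed, exited with 4",
]

print(child_exit_codes(evergreen_sample))
```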

      When I looked at the "mongod.log" file for the phase that failed ("ycsb_100read", "ycsb_95read5update_w_majority", or "linkbench_request"), it appeared that mongod encountered an error during startup and then gave up and shut down. Here is a snippet from the "mongod.log" file:

      {"t":{"$date":"2020-11-21T02:51:19.012+00:00"},"s":"E", "c":"CONTROL", "id":20539, "ctx":"initandlisten","msg":"Failed to verify auth schema version","attr":{"minSchemaVersion":3,"error":{"code":13436,"codeName":"NotPrimaryOrSecondary","errmsg":"not master or secondary; cannot currently read from this replSet member"}}}
      {"t":{"$date":"2020-11-21T02:51:19.012+00:00"},"s":"I", "c":"CONTROL", "id":20540, "ctx":"initandlisten","msg":"To manually repair the 'authSchema' document in the admin.system.version collection, start up with --setParameter startupAuthSchemaValidation=false to disable validation"}
      {"t":{"$date":"2020-11-21T02:51:19.012+00:00"},"s":"I", "c":"REPL", "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":15000}}
      ..
      {"t":{"$date":"2020-11-21T02:51:19.140+00:00"},"s":"I", "c":"CONTROL", "id":23138, "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":4}}
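Since "mongod.log" is written as one JSON document per line, the relevant entries can be filtered out with a short script. This is a sketch only: the helper name `startup_failures` is hypothetical, and it assumes that entries of interest are those with severity "E" or "F", or those carrying an "exitCode" attribute, as in the snippet above.

```python
import json


def startup_failures(log_lines):
    """Return parsed log entries that are errors or that report an exit code."""
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and elisions such as ".."
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or otherwise malformed lines
        attr = entry.get("attr", {})
        if entry.get("s") in ("E", "F") or "exitCode" in attr:
            hits.append(entry)
    return hits


# Two lines copied verbatim from the mongod.log snippet in this ticket.
mongod_sample = [
    '{"t":{"$date":"2020-11-21T02:51:19.012+00:00"},"s":"E", "c":"CONTROL", "id":20539, "ctx":"initandlisten","msg":"Failed to verify auth schema version","attr":{"minSchemaVersion":3,"error":{"code":13436,"codeName":"NotPrimaryOrSecondary","errmsg":"not master or secondary; cannot currently read from this replSet member"}}}',
    "..",
    '{"t":{"$date":"2020-11-21T02:51:19.140+00:00"},"s":"I", "c":"CONTROL", "id":23138, "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":4}}',
]

for entry in startup_failures(mongod_sample):
    print(entry["t"]["$date"], entry["id"], entry["msg"], entry.get("attr"))
```

To scan a real file, the same function can be passed an open file handle, for example `startup_failures(open("mongod.log"))`.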

      The goal of this task is to investigate and understand precisely why mongod is hitting the "not master or secondary; cannot currently read from this replSet member" error during startup for these benchmarks, to develop a repro that can be done on a developer's local machine, and to open a new task (or update this task) with these findings.
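As a possible starting point for a local repro, the parameter from the diff can be set on the command line rather than by patching the IDL default, since it is declared as settable at startup. The sketch below only builds the command line. The data directory, replica set name, and port are placeholder assumptions, the helper name `mongod_repro_command` is hypothetical, and nothing here is confirmed to reproduce the failure. The sys-perf tasks above ran against a 1-node replica set, and the error occurred when mongod was started for the phase after loading, so restarting against an already-loaded data directory may be necessary.

```python
import shlex


def mongod_repro_command(dbpath, sbe_enabled=True, repl_set="rs0", port=27017):
    """Build a mongod argv for a one-node replica set with SBE toggled.

    dbpath, repl_set, and port are placeholders, not values from sys-perf.
    """
    sbe_value = "true" if sbe_enabled else "false"
    return [
        "mongod",
        "--dbpath", dbpath,
        "--port", str(port),
        "--replSet", repl_set,
        # Same server parameter whose default the diff in this ticket flips.
        "--setParameter",
        f"internalQueryEnableSlotBasedExecutionEngine={sbe_value}",
    ]


repro_cmd = mongod_repro_command("/tmp/sbe-repro")
print(shlex.join(repro_cmd))
```

Running the printed command once with `sbe_enabled=True` and once with `sbe_enabled=False` against the same data directory would help confirm whether the startup error depends on the parameter.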

      Below are links to the Evergreen runs for each of the failing benchmarks:

      industry_benchmarks: https://spruce.mongodb.com/task/sys_perf_linux_1_node_replSet_industry_benchmarks_patch_73d1a6f368b04161dce7c0afbcea23efb52e2070_5fb8666657e85a0819d36b59_20_11_21_01_00_30/

      industry_benchmarks_wmajority: https://spruce.mongodb.com/task/sys_perf_linux_1_node_replSet_industry_benchmarks_wmajority_patch_73d1a6f368b04161dce7c0afbcea23efb52e2070_5fb8666657e85a0819d36b59_20_11_21_01_00_30/

      linkbench: https://spruce.mongodb.com/task/sys_perf_linux_1_node_replSet_linkbench_patch_73d1a6f368b04161dce7c0afbcea23efb52e2070_5fb8666657e85a0819d36b59_20_11_21_01_00_30/

      ycsb_60GB: https://spruce.mongodb.com/task/sys_perf_linux_1_node_replSet_ycsb_60GB_patch_73d1a6f368b04161dce7c0afbcea23efb52e2070_5fb8666657e85a0819d36b59_20_11_21_01_00_30/

       

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            andrew.paroski@mongodb.com Drew Paroski
