WiredTiger / WT-7417

scan_with_long_lived perf job needs a longer no-output timeout

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Backlog
    • Affects Version/s: None
    • Component/s: None

      We are hitting the default 90-minute no-output timeout. The test is configured to run for 2 hours after the load phase and produces no output during that time. Here is a local run, followed by its timing output:

      [info ] [genny.curator       ] Moved existing metrics (presumably from a prior run). cwd=/mnt/data0/sulabh/work/genny existing=build/CedarMetrics moved_to=build/CedarMetrics-2021-04-19T030342Z-0f94120c timestamp=2021-04-19T03:03:42Z
      [info ] [genny.curator       ] Starting poplar grpc in the background. command=['/mnt/data0/sulabh/work/genny/build/curator/curator', 'poplar', 'grpc'] cwd=/mnt/data0/sulabh/work/genny timestamp=2021-04-19T03:03:42Z
      [curator] 2021/04/19 13:03:42 [p=info]: starting poplar gRPC service at 'localhost:2288'
      [2021-04-19 13:03:43.177893] [0x00007f8ea4988e80] [info]    Constructing pool with MongoURI 'mongodb://localhost:27017/?appName=Genny&maxPoolSize=5000'
      [2021-04-19 13:06:56.739428] [0x00007f8bb8c10700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:06:58.860785] [0x00007f8bb640b700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:00.019287] [0x00007f8bb840f700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:00.082431] [0x00007f8bb740d700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:00.086791] [0x00007f8bb5c0a700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:01.033338] [0x00007f8bb6c0c700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:01.034543] [0x00007f8bba413700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:01.035044] [0x00007f8bb9411700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:01.558331] [0x00007f8bb7c0e700] [info]    Done with load phase. All documents loaded
      [2021-04-19 13:07:07.537337] [0x00007f8bb9c12700] [info]    Done with load phase. All documents loaded
      [curator] 2021/04/19 15:30:59 [p=info]: poplar rpc service terminated
       
      real    147m17.612s
      user    384m15.664s
      sys     134m51.575s
      

      The load phase ran from 13:03:42 to 13:07:07, i.e. about 3.5 minutes (205 seconds). For the remaining 147 - 3.5 = 143.5 minutes or so, the test ran without producing any output. The run time looks correct for a configured 2 hours, plus some extra time for the last scanner to finish.
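
      As a rough check of the arithmetic above, the gaps can be recomputed from the timestamps in the log. This is only a sketch: the helper name minutes_between and the constant NO_OUTPUT_TIMEOUT_MINUTES are made up for illustration, and the 90-minute value is the default quoted in this ticket.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def minutes_between(start: str, end: str) -> float:
    """Return the elapsed time between two log timestamps, in minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60

# Timestamps copied from the local run in this ticket (local time).
load_start = "2021-04-19 13:03:42"      # poplar gRPC service started
last_load_line = "2021-04-19 13:07:07"  # last "Done with load phase" line
run_end = "2021-04-19 15:30:59"         # poplar rpc service terminated

# Hypothetical name; 90 minutes is the default no-output timeout quoted above.
NO_OUTPUT_TIMEOUT_MINUTES = 90

load_minutes = minutes_between(load_start, last_load_line)
silent_minutes = minutes_between(last_load_line, run_end)

print(f"load phase:   {load_minutes:.1f} min")
print(f"silent phase: {silent_minutes:.1f} min")
print("exceeds timeout:", silent_minutes > NO_OUTPUT_TIMEOUT_MINUTES)
```

      The silent stretch comes out a little under 144 minutes, well over the 90-minute limit.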

      Either we should make the test verbose so that it outputs some messages during the run, or raise the no-output timeout to, say, 3 hours to be safe.
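
      For the first option, one possible shape is a small wrapper that prints a heartbeat line while the workload runs. This is only a sketch and not how the perf job is actually driven: run_with_heartbeat and the 10-minute default interval are hypothetical, and the command to wrap is left to the caller.

```python
import subprocess
import sys
import time

def run_with_heartbeat(cmd, interval_secs=600.0, out=sys.stdout):
    """Run cmd and print a heartbeat line every interval_secs until it exits.

    Returns (returncode, number_of_heartbeats_printed).
    """
    start = time.monotonic()
    beats = 0
    proc = subprocess.Popen(cmd)
    while True:
        try:
            # wait() returns the exit code once the process has finished.
            returncode = proc.wait(timeout=interval_secs)
            return returncode, beats
        except subprocess.TimeoutExpired:
            # Still running: emit output so a no-output timer is reset.
            beats += 1
            elapsed_min = (time.monotonic() - start) / 60
            print(f"[heartbeat] still running after {elapsed_min:.1f} min",
                  file=out, flush=True)
```

      With the default interval, a 2-hour run would print a line roughly every 10 minutes, which stays well inside a 90-minute no-output limit. Raising the timeout instead needs no code change in the test itself.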

      cc. alex.cameron

            Assignee:
            jim.oleary@mongodb.com James O'Leary
            Reporter:
            sulabh.mahajan@mongodb.com Sulabh Mahajan