Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-6216

shutdown hangs with latest WT import

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 4.4.0-rc6
    • Component/s: None
    • Labels:
      None
    • Storage - Ra 2020-05-18

      This isn't 4.4.0-rc6 but it has diffs that were merged after rc5 and should be in rc6. I used a build with this diff and the binaries from here

      The good news is that it might have fixed WT-6164 as I have yet to reproduce an OOM on shutdown. The bad news is that I get a new problem – an intermittent hang on shutdown.

      From mongod.log, the start of shutdown

      {"t":{"$date":"2020-05-14T18:04:07.921+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn187","msg":"connection ended","attr":{"remote":"10.2.0.200:53820","connectionCount":31}}
      {"t":{"$date":"2020-05-14T18:04:07.921+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn174","msg":"connection ended","attr":{"remote":"10.2.0.200:53792","connectionCount":30}}
      {"t":{"$date":"2020-05-14T18:04:07.921+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn171","msg":"connection ended","attr":{"remote":"10.2.0.200:53788","connectionCount":29}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn168","msg":"connection ended","attr":{"remote":"10.2.0.200:53782","connectionCount":27}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn167","msg":"connection ended","attr":{"remote":"10.2.0.200:53780","connectionCount":25}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn166","msg":"connection ended","attr":{"remote":"10.2.0.200:53778","connectionCount":24}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn165","msg":"connection ended","attr":{"remote":"10.2.0.200:53776","connectionCount":23}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn163","msg":"connection ended","attr":{"remote":"10.2.0.200:53772","connectionCount":21}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn161","msg":"connection ended","attr":{"remote":"10.2.0.200:53768","connectionCount":20}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn160","msg":"connection ended","attr":{"remote":"10.2.0.200:53766","connectionCount":19}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn158","msg":"connection ended","attr":{"remote":"10.2.0.200:53762","connectionCount":18}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn159","msg":"connection ended","attr":{"remote":"10.2.0.200:53764","connectionCount":17}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn157","msg":"connection ended","attr":{"remote":"10.2.0.200:53760","connectionCount":16}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn172","msg":"connection ended","attr":{"remote":"10.2.0.200:53790","connectionCount":15}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn162","msg":"connection ended","attr":{"remote":"10.2.0.200:53770","connectionCount":14}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn170","msg":"connection ended","attr":{"remote":"10.2.0.200:53786","connectionCount":28}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn169","msg":"connection ended","attr":{"remote":"10.2.0.200:53784","connectionCount":26}}
      {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn164","msg":"connection ended","attr":{"remote":"10.2.0.200:53774","connectionCount":22}}
      {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I",  "c":"CONTROL",  "id":23377,   "ctx":"SignalHandler","msg":"got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}}
      {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I",  "c":"CONTROL",  "id":23378,   "ctx":"SignalHandler","msg":"kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":103572,"si_si_uid":1000}}
      {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I",  "c":"CONTROL",  "id":23381,   "ctx":"SignalHandler","msg":"will terminate after current cmd ends"}
      {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I",  "c":"REPL",     "id":40441,   "ctx":"SignalHandler","msg":"Stopping TopologyVersionObserver"}
      {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I",  "c":"REPL",     "id":40447,   "ctx":"TopologyVersionObserver","msg":"Stopped TopologyVersionObserver"}
      {"t":{"$date":"2020-05-14T18:04:17.956+00:00"},"s":"I",  "c":"NETWORK",  "id":20562,   "ctx":"SignalHandler","msg":"Shutdown: going to close listening sockets"}
      {"t":{"$date":"2020-05-14T18:04:17.957+00:00"},"s":"I",  "c":"NETWORK",  "id":23017,   "ctx":"listener","msg":"removing socket file: {path}","attr":{"path":"/tmp/mongodb-27017.sock"}}
      {"t":{"$date":"2020-05-14T18:04:17.957+00:00"},"s":"I",  "c":"-",        "id":20520,   "ctx":"SignalHandler","msg":"Stopping further Flow Control ticket acquisitions."}
      {"t":{"$date":"2020-05-14T18:04:17.957+00:00"},"s":"I",  "c":"ASIO",     "id":22582,   "ctx":"ReplNodeDbWorkerNetwork","msg":"Killing all outstanding egress activity."}
      {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I",  "c":"REPL",     "id":21328,   "ctx":"SignalHandler","msg":"Shutting down replication subsystems"}
      {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I",  "c":"REPL",     "id":21302,   "ctx":"SignalHandler","msg":"Stopping replication reporter thread"}
      {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I",  "c":"REPL",     "id":21303,   "ctx":"SignalHandler","msg":"Stopping replication fetcher thread"}
      {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I",  "c":"REPL",     "id":21304,   "ctx":"SignalHandler","msg":"Stopping replication applier thread"}
      {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I",  "c":"REPL",     "id":21107,   "ctx":"BackgroundSync","msg":"Stopping replication producer"}
      {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I",  "c":"REPL",     "id":21225,   "ctx":"OplogApplier-0","msg":"Finished oplog application"}
      {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I",  "c":"REPL",     "id":21307,   "ctx":"SignalHandler","msg":"Stopping replication storage threads"}
      {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I",  "c":"ASIO",     "id":22582,   "ctx":"OplogApplierNetwork","msg":"Killing all outstanding egress activity."}
      {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I",  "c":"ASIO",     "id":22582,   "ctx":"ReplCoordExternNetwork","msg":"Killing all outstanding egress activity."}
      {"t":{"$date":"2020-05-14T18:04:18.760+00:00"},"s":"I",  "c":"ASIO",     "id":22582,   "ctx":"ReplNetwork","msg":"Killing all outstanding egress activity."}
      {"t":{"$date":"2020-05-14T18:04:18.760+00:00"},"s":"I",  "c":"-",        "id":4695300, "ctx":"SignalHandler","msg":"Interrupted all currently running operations","attr":{"opsKilled":19}}
      {"t":{"$date":"2020-05-14T18:04:19.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:04:20.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      

      Then lots of this

      {"t":{"$date":"2020-05-14T18:05:26.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:27.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:28.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:29.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:30.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      

      Then this, once

      {"t":{"$date":"2020-05-14T18:05:31.000+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"WTCheckpointThread","msg":"WiredTiger message","attr":{"message":"evict generation drain waited 1 minutes"}}
      

      Then this forever

      {"t":{"$date":"2020-05-14T18:05:31.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:32.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:33.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:34.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:35.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      {"t":{"$date":"2020-05-14T18:05:36.444+00:00"},"s":"I",  "c":"STORAGE",  "id":22263,   "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
      

      And these occur, but less frequently

      {"t":{"$date":"2020-05-14T18:15:31.000+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"WTCheckpointThread","msg":"WiredTiger message","attr":{"message":"evict generation drain waited 11 minutes"}}
      

            Assignee:
            sulabh.mahajan@mongodb.com Sulabh Mahajan
            Reporter:
            mark.callaghan@mongodb.com Mark Callaghan (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: