-
Type: Bug
-
Resolution: Cannot Reproduce
-
Priority: Major - P3
-
None
-
Affects Version/s: 4.4.0-rc6
-
Component/s: None
-
None
-
Storage - Ra 2020-05-18
This isn't 4.4.0-rc6 but it has diffs that were merged after rc5 and should be in rc6. I used a build with this diff and the binaries from here
The good news is that it might have fixed WT-6164 as I have yet to reproduce an OOM on shutdown. The bad news is that I get a new problem – an intermittent hang on shutdown.
From mongod.log, the start of shutdown
{"t":{"$date":"2020-05-14T18:04:07.921+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn187","msg":"connection ended","attr":{"remote":"10.2.0.200:53820","connectionCount":31}} {"t":{"$date":"2020-05-14T18:04:07.921+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn174","msg":"connection ended","attr":{"remote":"10.2.0.200:53792","connectionCount":30}} {"t":{"$date":"2020-05-14T18:04:07.921+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn171","msg":"connection ended","attr":{"remote":"10.2.0.200:53788","connectionCount":29}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn168","msg":"connection ended","attr":{"remote":"10.2.0.200:53782","connectionCount":27}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn167","msg":"connection ended","attr":{"remote":"10.2.0.200:53780","connectionCount":25}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn166","msg":"connection ended","attr":{"remote":"10.2.0.200:53778","connectionCount":24}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn165","msg":"connection ended","attr":{"remote":"10.2.0.200:53776","connectionCount":23}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn163","msg":"connection ended","attr":{"remote":"10.2.0.200:53772","connectionCount":21}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn161","msg":"connection ended","attr":{"remote":"10.2.0.200:53768","connectionCount":20}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn160","msg":"connection ended","attr":{"remote":"10.2.0.200:53766","connectionCount":19}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn158","msg":"connection ended","attr":{"remote":"10.2.0.200:53762","connectionCount":18}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn159","msg":"connection ended","attr":{"remote":"10.2.0.200:53764","connectionCount":17}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn157","msg":"connection ended","attr":{"remote":"10.2.0.200:53760","connectionCount":16}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn172","msg":"connection ended","attr":{"remote":"10.2.0.200:53790","connectionCount":15}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn162","msg":"connection ended","attr":{"remote":"10.2.0.200:53770","connectionCount":14}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn170","msg":"connection ended","attr":{"remote":"10.2.0.200:53786","connectionCount":28}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn169","msg":"connection ended","attr":{"remote":"10.2.0.200:53784","connectionCount":26}} {"t":{"$date":"2020-05-14T18:04:07.922+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn164","msg":"connection ended","attr":{"remote":"10.2.0.200:53774","connectionCount":22}} {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I", "c":"CONTROL", "id":23377, "ctx":"SignalHandler","msg":"got signal {sig} ({strsignal_sig})","attr":{"sig":15,"strsignal_sig":"Terminated"}} {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I", "c":"CONTROL", "id":23378, "ctx":"SignalHandler","msg":"kill from pid:{si_si_pid} uid:{si_si_uid}","attr":{"si_si_pid":103572,"si_si_uid":1000}} {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I", "c":"CONTROL", "id":23381, "ctx":"SignalHandler","msg":"will terminate after current cmd ends"} {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I", "c":"REPL", "id":40441, "ctx":"SignalHandler","msg":"Stopping TopologyVersionObserver"} {"t":{"$date":"2020-05-14T18:04:17.955+00:00"},"s":"I", "c":"REPL", "id":40447, "ctx":"TopologyVersionObserver","msg":"Stopped TopologyVersionObserver"} {"t":{"$date":"2020-05-14T18:04:17.956+00:00"},"s":"I", "c":"NETWORK", "id":20562, "ctx":"SignalHandler","msg":"Shutdown: going to close listening sockets"} {"t":{"$date":"2020-05-14T18:04:17.957+00:00"},"s":"I", "c":"NETWORK", "id":23017, "ctx":"listener","msg":"removing socket file: {path}","attr":{"path":"/tmp/mongodb-27017.sock"}} {"t":{"$date":"2020-05-14T18:04:17.957+00:00"},"s":"I", "c":"-", "id":20520, "ctx":"SignalHandler","msg":"Stopping further Flow Control ticket acquisitions."} {"t":{"$date":"2020-05-14T18:04:17.957+00:00"},"s":"I", "c":"ASIO", "id":22582, "ctx":"ReplNodeDbWorkerNetwork","msg":"Killing all outstanding egress activity."} {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I", "c":"REPL", "id":21328, "ctx":"SignalHandler","msg":"Shutting down replication subsystems"} {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I", "c":"REPL", "id":21302, "ctx":"SignalHandler","msg":"Stopping replication reporter thread"} {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I", "c":"REPL", "id":21303, "ctx":"SignalHandler","msg":"Stopping replication fetcher thread"} {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I", "c":"REPL", "id":21304, "ctx":"SignalHandler","msg":"Stopping replication applier thread"} {"t":{"$date":"2020-05-14T18:04:17.958+00:00"},"s":"I", "c":"REPL", "id":21107, "ctx":"BackgroundSync","msg":"Stopping replication producer"} {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I", "c":"REPL", "id":21225, "ctx":"OplogApplier-0","msg":"Finished oplog application"} {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I", "c":"REPL", "id":21307, "ctx":"SignalHandler","msg":"Stopping replication storage threads"} {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I", "c":"ASIO", "id":22582, "ctx":"OplogApplierNetwork","msg":"Killing all outstanding egress activity."} {"t":{"$date":"2020-05-14T18:04:18.759+00:00"},"s":"I", "c":"ASIO", "id":22582, "ctx":"ReplCoordExternNetwork","msg":"Killing all outstanding egress activity."} {"t":{"$date":"2020-05-14T18:04:18.760+00:00"},"s":"I", "c":"ASIO", "id":22582, "ctx":"ReplNetwork","msg":"Killing all outstanding egress activity."} {"t":{"$date":"2020-05-14T18:04:18.760+00:00"},"s":"I", "c":"-", "id":4695300, "ctx":"SignalHandler","msg":"Interrupted all currently running operations","attr":{"opsKilled":19}} {"t":{"$date":"2020-05-14T18:04:19.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:04:20.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
Then lots of this
{"t":{"$date":"2020-05-14T18:05:26.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:27.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:28.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:29.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:30.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
Then this, once
{"t":{"$date":"2020-05-14T18:05:31.000+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"WTCheckpointThread","msg":"WiredTiger message","attr":{"message":"evict generation drain waited 1 minutes"}}
Then this forever
{"t":{"$date":"2020-05-14T18:05:31.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:32.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:33.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:34.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:35.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}} {"t":{"$date":"2020-05-14T18:05:36.444+00:00"},"s":"I", "c":"STORAGE", "id":22263, "ctx":"TimestampMonitor","msg":"Timestamp monitor is stopping. {reason}","attr":{"Timestamp monitor is stoppingreason":"interrupted at shutdown"}}
And these occur, but less frequently
{"t":{"$date":"2020-05-14T18:15:31.000+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"WTCheckpointThread","msg":"WiredTiger message","attr":{"message":"evict generation drain waited 11 minutes"}}