[SERVER-55451] Create (or reuse) the named FIFO proxy to simulate mongod process entering uninterruptible sleep state Created: 23/Mar/21  Updated: 06/Dec/22  Resolved: 24/Mar/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Backlog - Storage Engines Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Storage Engines
Participants:

 Description   

Background

In the HELP ticket, I suspect a scenario in which a faulty disk blocked all disk I/O indefinitely, which caused the process to enter the uninterruptible sleep state. The main problem with this state is that the process is not killed when SIGKILL is issued, because it is blocked in a syscall. The user sent `kill -9` to the primary mongod, but it did not die. Thirteen minutes after the SIGKILL, the user had to shut down the Amazon EC2 instance to break the hung sessions from multiple mongos proxies to the faulty primary.

More background on why `kill -9` will never kill the process in the uninterruptible sleep state:
https://askubuntu.com/questions/59811/kill-pid-not-really-killing-the-process-why

Various tricks people use to simulate the uninterruptible sleep state:
https://unix.stackexchange.com/questions/134888/simulate-an-unkillable-process-in-d-state

More background on why the kernel prevents killing a process in this state:
https://stackoverflow.com/questions/223644/what-is-an-uninterruptible-process

and an LWN article: https://lwn.net/Articles/288056/
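
To check whether a process is actually in the uninterruptible sleep state described above, one option on Linux is to read the state field of `/proc/<pid>/stat`, where `D` denotes uninterruptible sleep. The following is a minimal sketch; the helper names `parse_state` and `process_state` are made up for illustration and are not part of any MongoDB tooling.

```python
def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the last ')' instead of whitespace.
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def process_state(pid):
    """Read the state of a live process (Linux only); 'D' means uninterruptible sleep."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())
```

A test could poll `process_state(mongod_pid)` after injecting the fault and only then send SIGKILL, to confirm that the scenario being reproduced is really the `D` state and not an ordinary blocked read.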

Proposal

The trick we can use relies on the idea that reading from a named pipe (FIFO) is also a blocking syscall, which should leave the process uninterruptible for as long as no data arrives. Two changes are needed:

1. We need to either write or reuse a proxy server that sits between the FIFOs and the hard disk. It will present itself as multiple named pipes and redirect each pipe to a file on disk.

2. Modify mongod / WT to detect a special file format and ping the FIFO proxy with a special command instead of creating a new file itself. The proxy should listen for those requests and create a new FIFO when asked. mongod can then open the FIFO like a regular file; the rest of the code is unchanged.

To simulate the outage, the FIFO proxy is instructed to stop replying to read/write requests.
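
The stall mechanics of the proposal can be sketched in a few lines of Python. This is only an illustration under several assumptions: the `FifoProxy` class, its `stall`/`resume` methods, and the `demo` helper are hypothetical names, the proxy serves a single read-only pipe, and the control channel by which mongod would request new FIFOs is omitted. It shows that a reader blocks while the proxy withholds data; it makes no claim that the blocked reader becomes unkillable. It requires a POSIX system, since it uses `os.mkfifo`.

```python
import os
import tempfile
import threading


class FifoProxy:
    """Hypothetical proxy that serves one backing file through a named pipe."""

    def __init__(self, fifo_path, backing_path, chunk_size=4096):
        self.fifo_path = fifo_path
        self.backing_path = backing_path
        self.chunk_size = chunk_size
        self._healthy = threading.Event()
        self._healthy.set()
        os.mkfifo(fifo_path)  # create the named pipe the reader will open

    def stall(self):
        """Simulate the outage: stop feeding data into the pipe."""
        self._healthy.clear()

    def resume(self):
        """End the simulated outage."""
        self._healthy.set()

    def serve_once(self):
        """Copy the backing file into the pipe, pausing while stalled."""
        # Opening the FIFO for writing blocks until a reader opens it.
        with open(self.backing_path, "rb") as src, open(self.fifo_path, "wb") as fifo:
            while True:
                self._healthy.wait()
                chunk = src.read(self.chunk_size)
                if not chunk:
                    break
                fifo.write(chunk)
                fifo.flush()


def demo(payload=b"wiredtiger-data", stall_seconds=0.2):
    """Return (reader_was_blocked, bytes_read) for one stalled read."""
    with tempfile.TemporaryDirectory() as tmp:
        backing = os.path.join(tmp, "backing.dat")
        fifo = os.path.join(tmp, "proxy.fifo")
        with open(backing, "wb") as f:
            f.write(payload)

        proxy = FifoProxy(fifo, backing)
        proxy.stall()  # start in the "outage" state
        server = threading.Thread(target=proxy.serve_once, daemon=True)
        server.start()

        result = {}

        def reader():
            # Stands in for mongod opening the FIFO like a regular file.
            with open(fifo, "rb") as f:
                result["data"] = f.read()

        client = threading.Thread(target=reader, daemon=True)
        client.start()
        client.join(stall_seconds)
        blocked = client.is_alive()  # still waiting because the proxy is stalled
        proxy.resume()
        client.join()
        server.join()
        return blocked, result["data"]


if __name__ == "__main__":
    print(demo())
```

In a real test harness the proxy would run as a separate process and the reader would be mongod itself, so the stall would be triggered by an external command and not by a method call.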

Not the same as network proxy

Please note that we already have mongobridge to simulate network errors; however, this is not the same thing. mongobridge cannot cause an outage inside mongod. It can only make the client think that mongod has an outage, which is very different from the scenario in the HELP ticket.



 Comments   
Comment by Andrew Shuvalov (Inactive) [ 24/Mar/21 ]

Preempted by a better idea in SERVER-55486

Comment by Andrew Shuvalov (Inactive) [ 23/Mar/21 ]

aaron.redalen suggested using dm-delay (https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/delay.html) as an alternative to my approach. Please consider it as a much simpler way to simulate disk failure, and let me know whether you think it is compatible with the way we use storage.

Comment by Bruce Lucas (Inactive) [ 23/Mar/21 ]

andrew.shuvalov if the goal is just to make mongod uninterruptible by kill -9 and if some thread waiting on a fifo with no data is sufficient to do that, why does that need to be in the storage engine?

Generated at Thu Feb 08 05:36:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.