[SERVER-53014] Config server runs out of open files when dropping collection and inserting in a loop Created: 21/Nov/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.0, 4.2.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Oleg Pudeyev (Inactive) | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Assigned Teams: |
Storage Execution
|
| Operating System: | ALL |
| Participants: |
| Description |
|
I made the following test program which repeatedly drops a collection and inserts a document into it:
I ran this against 4.2 and 4.4 sharded clusters launched as follows:
In both cases, after some time, the config server runs out of open files. Program output:
Server logs attached for 4.2.10 and 4.4.0. Since I am operating on the same collection, I expect the server to be able to handle this workload without unbounded growth in open file count. This condition, once encountered, appears to be persistent. If I now restart the crashed config server, I get one of two behaviors:
The second attached config server log file for 4.4 covers three attempts: the original run shown in the first file, plus two restarts. |
| Comments |
| Comment by Haley Connelly [ 07/Dec/20 ] | |
|
This sounds like something the execution team could consider in order to limit outstanding resources on the system. I'm reassigning this ticket to them for comment. | |
| Comment by Oleg Pudeyev (Inactive) [ 04/Dec/20 ] | |
|
I expected the server to limit how many background drops it allows to be outstanding, such that overall resource consumption is bounded. | |
| Comment by Haley Connelly [ 04/Dec/20 ] | |
|
oleg.pudeyev, I believe this works as expected. In the Ruby driver, if the database isn't specified, it defaults to the admin database. Since the admin database lives on the config server, and file descriptors are cleaned up lazily after a collection is dropped (here is more background on two-phase drop), it's not unexpected that the config server runs out of open files when the admin.db collection is created and dropped in a tight loop. The logs confirm that the collection being implicitly created and then dropped in a loop is in the admin database.
In jsTests, we have waitForAllCollectionDropsToComplete(conn) to account for this. |
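
The interaction discussed in this thread (drops whose files are released lazily, versus a loop that keeps creating new ones) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's actual two-phase drop logic: the `simulate` function, its `reap_every` and `max_pending` parameters, and the one-file-per-collection accounting are all hypothetical and chosen purely for illustration.

```python
from collections import deque


def simulate(iterations, reap_every, max_pending=None):
    """Toy model of a drop-and-recreate loop with lazy file cleanup.

    Hypothetical model, not MongoDB code. Each iteration drops the
    collection (its file stays open until it is reaped) and recreates it
    (one live file is always open). The background reaper closes one
    pending file every `reap_every` iterations. If `max_pending` is set,
    the loop waits for cleanup whenever the backlog exceeds that limit.
    Returns the peak number of simultaneously open files.
    """
    pending = deque()  # files of dropped collections awaiting cleanup
    peak = 0
    for i in range(1, iterations + 1):
        # Drop: the old file is queued for lazy cleanup, not closed yet.
        pending.append(i)

        # Optional bound: block until the backlog is within the limit.
        if max_pending is not None:
            while len(pending) > max_pending:
                pending.popleft()

        # Lazy background cleanup: closes one file now and then.
        if i % reap_every == 0 and pending:
            pending.popleft()

        # One live collection file plus every file still awaiting cleanup.
        peak = max(peak, 1 + len(pending))
    return peak


if __name__ == "__main__":
    print("unbounded backlog:", simulate(1000, reap_every=10))
    print("bounded backlog:  ", simulate(1000, reap_every=10, max_pending=8))
```

In this model the unbounded case grows roughly linearly with the number of iterations, which mirrors the reported symptom. Setting `max_pending` caps the peak regardless of loop length, which corresponds either to the server-side limit the reporter expected or to a test that waits for outstanding drops to finish before continuing.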