[SERVER-58819] Release of files after close_idle_time should not block the system Created: 26/Jul/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | John Moser | Assignee: | Backlog - Storage Engines Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | refinement | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Storage Engines
|
| Participants: | |
| Story Points: | 8 |
| Description |
|
Setup: 200,000 collection and index files on an HDD. After close_idle_time elapses, it seems that all open files are closed at once, so the liveness and readiness probes fail. Request: could the closing be done in a "softer" way, so that other processes on the OS that need to access the disk are not blocked?
|
| Comments |
| Comment by Dmitry Agranat [ 29/Jul/21 ] | |
|
Thank you jamoser42@gmail.com for opening a new feature request. We're assigning this ticket to the appropriate team to be evaluated against our currently planned work. Updates will be posted on this ticket as they happen. | |
| Comment by John Moser [ 26/Jul/21 ] | |
|
This ticket is closely related to https://jira.mongodb.org/browse/SERVER-58818. At startup all the files are read, and then after some idle time 200k - 250 files are released more or less at once. This puts huge stress on the system's I/O, so that, for example, the liveness and readiness probes could fail. In the worst case, when using Kubernetes, the mongod pods get restarted. IMHO, manual adjustments like the above would not be acceptable in a production environment. If there were just a sleep(10ms) or so between each close, this could help keep the system more responsive during such an event. Or, even better, do not read all the files at startup ... | |
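The idea of pausing between closes could look roughly like the sketch below. It is only an illustration in Python, not how mongod or WiredTiger actually closes handles; the function name `close_gradually` and the `pause_s` parameter are hypothetical, and the 10 ms default simply mirrors the sleep(10ms) mentioned above.

```python
import time
from typing import IO, Iterable


def close_gradually(handles: Iterable[IO], pause_s: float = 0.01) -> int:
    """Close each handle in turn, sleeping pause_s seconds after each close.

    Hypothetical helper: it spreads the close work over time so that other
    disk users are not starved by one large burst. Returns the number closed.
    """
    closed = 0
    for handle in handles:
        handle.close()
        closed += 1
        time.sleep(pause_s)  # give other processes a chance to use the disk
    return closed
```

With a 10 ms pause, releasing 200,000 handles this way would take at least 2,000 seconds (about 33 minutes), so the pause length is a trade-off between responsiveness and how long idle handles stay open.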
| Comment by Dmitry Agranat [ 26/Jul/21 ] | |
|
If all (or most of) the 200k tables are being closed, it means that your application has a large number of tables (a table is either a collection or an index) but regularly accesses only a relatively small proportion of them. Instead of waiting for the default 100000 seconds while accumulating hundreds of thousands of open tables, you can try closing them more gradually with the following parameter:
Please note that the default is file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250) Regards, |
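For reference, the file_manager setting is a WiredTiger configuration string. The sketch below assembles such a string in Python from the three defaults quoted above. The helper name `file_manager_config` is hypothetical, and any override passed to it would be an illustrative value only, not the value recommended in this ticket.

```python
def file_manager_config(close_idle_time: int = 100000,
                        close_scan_interval: int = 10,
                        close_handle_minimum: int = 250) -> str:
    """Build a WiredTiger file_manager config string.

    The default arguments are the defaults quoted in this ticket.
    """
    return (
        "file_manager=("
        f"close_idle_time={close_idle_time},"
        f"close_scan_interval={close_scan_interval},"
        f"close_handle_minimum={close_handle_minimum})"
    )


# Prints the default string exactly as quoted in the comment above.
print(file_manager_config())
```

A string of this form is typically passed to mongod through the `--wiredTigerEngineConfigString` option or the `storage.wiredTiger.engineConfig.configString` setting; the exact mechanism should be checked against the documentation for the server version in use.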