[SERVER-17230] Replica set Primary should step down if Out of file descriptors
Created: 09/Feb/15  Updated: 06/Dec/22

Status: In Progress
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.7
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: SRR Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-9552 when replica set member has full disk... Backlog
Assigned Teams:
Replication
Participants:

 Description   

When a replica set Primary runs out of file descriptors (errno 24, "Too many open files"), it should step down and let a secondary take over instead of locking up the database, as happened here:

2015-02-08T22:07:14.675-0500 [journal] In File::open(), ::open for '/mongodb/mongod/data/journal/lsn' failed with errno:24 Too many open files
2015-02-08T22:07:14.675-0500 [journal] warning: open of lsn file failed
2015-02-08T22:07:15.592-0500 [conn171] Assertion: 13538:couldn't open [/proc/7329/stat] errno:24 Too many open files
2015-02-08T22:07:15.631-0500 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2015-02-08T22:07:15.631-0500 [initandlisten] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
2015-02-08T22:07:16.631-0500 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2015-02-08T22:07:16.631-0500 [initandlisten] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
2015-02-08T22:07:17.631-0500 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2015-02-08T22:07:17.632-0500 [initandlisten] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
2015-02-08T22:07:18.603-0500 [conn183] Assertion: 13538:couldn't open [/proc/7329/stat] errno:24 Too many open files
2015-02-08T22:07:18.632-0500 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2015-02-08T22:07:18.632-0500 [initandlisten] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
2015-02-08T22:07:19.632-0500 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2015-02-08T22:07:19.632-0500 [initandlisten] ERROR: Out of file descriptors. Waiting one second before trying to accept more connections.
2015-02-08T22:07:20.632-0500 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
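A minimal sketch of how errno 24 arises, assuming a POSIX system and Python's standard `resource` module (`exhaust_descriptors` is a hypothetical helper, not MongoDB code). Once a process reaches its `RLIMIT_NOFILE` soft limit, every call that needs a new descriptor fails with `EMFILE`. In the log above that includes opening the journal's `lsn` file, reading `/proc/<pid>/stat`, and `accept()` on the listener. The sketch opens `/dev/null` rather than sockets, but the failure mode is the same.

```python
import errno
import os
import resource


def exhaust_descriptors(limit: int = 64) -> int:
    """Hypothetical helper: temporarily lower this process's RLIMIT_NOFILE
    soft limit, open /dev/null until open() fails, and return the errno of
    that failure (EMFILE, "Too many open files")."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Lower only the soft limit so it can be raised back afterwards.
    new_soft = limit if soft == resource.RLIM_INFINITY else min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    fds = []
    try:
        while True:
            try:
                fds.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                return exc.errno
    finally:
        # Release every descriptor we took and restore the original limit.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

The limit is per process, so a mongod that hits it cannot open anything else, including the files and sockets it would need to recover, until some descriptors are closed.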



 Comments   
Comment by Eric Milkie [ 09/Feb/15 ]

This will have the same solution as SERVER-9552, which will probably be to crash the server when resource exhaustion occurs, because it is difficult to detect when it is safe to resume normal operations once the exhaustion has been resolved.
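The fail-fast behaviour described in this comment can be sketched as a small policy object. This is purely illustrative: `FdExhaustionPolicy`, `record`, `max_consecutive`, and `on_fatal` are hypothetical names, not MongoDB code. The log shows the listener waiting one second and retrying indefinitely. Instead, this policy counts consecutive `EMFILE`/`ENFILE` failures and, past a threshold, calls a fatal handler. In a real server that handler might abort the process or trigger a stepdown so a secondary can be elected.

```python
import errno
from typing import Callable, Optional


class FdExhaustionPolicy:
    """Hypothetical fail-fast policy for descriptor exhaustion: tolerate up
    to max_consecutive - 1 consecutive EMFILE/ENFILE errors from accept(),
    then invoke on_fatal exactly once instead of retrying forever."""

    def __init__(self, max_consecutive: int, on_fatal: Callable[[], None]) -> None:
        self.max_consecutive = max_consecutive
        self.on_fatal = on_fatal
        self.consecutive = 0
        self.fired = False

    def record(self, err: Optional[int]) -> bool:
        """Record one accept() outcome: None on success, else its errno.
        Returns True once the fatal handler has been invoked."""
        if self.fired:
            # Latched: never try to decide when it is safe to resume.
            return True
        if err in (errno.EMFILE, errno.ENFILE):
            # Per-process (EMFILE) or system-wide (ENFILE) descriptor exhaustion.
            self.consecutive += 1
            if self.consecutive >= self.max_consecutive:
                self.fired = True
                self.on_fatal()
        else:
            # A success (or an unrelated error) resets the streak.
            self.consecutive = 0
        return self.fired
```

With `max_consecutive=1` this matches the suggestion of crashing as soon as exhaustion occurs; a larger value tolerates brief spikes. Because the policy latches once it fires, it sidesteps the hard problem the comment points out: deciding when it is safe to resume.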

Generated at Thu Feb 08 03:43:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.