[SERVER-3194] Too many open files using ulimit 10000 Created: 05/Jun/11  Updated: 29/Aug/11  Resolved: 06/Jun/11

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: ofer samocha
Assignee: Unassigned
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: Linux
Participants:

 Description   

While migrating servers to RAID 5, I ran rs.stepDown() on amdbm023 and put amdbm024 into full recovery mode.
After a few seconds, amdbm023 shut itself down with a "Too many open files" error. ulimit is set to 10000, and there are 30 mongos instances with 100 connections on each.
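
The numbers above can be turned into a rough descriptor estimate. The sketch below is only an illustration under stated assumptions: it assumes each incoming connection may also hold one outgoing socket to each of the three config servers named in the setShardVersion command in the log (amdbm001, amdbm003, amdbm005), which the ticket does not confirm. The function name and the `baseline` parameter (for data files and other descriptors) are hypothetical.

```python
ULIMIT = 10000          # open-file limit reported in the ticket
MONGOS_COUNT = 30       # mongos instances reported in the ticket
CONNS_PER_MONGOS = 100  # connections per mongos reported in the ticket
CONFIG_SERVERS = 3      # config hosts listed in the setShardVersion command


def estimated_descriptors(mongos_count, conns_per_mongos, config_servers, baseline=0):
    """Rough upper bound on descriptors held by one mongod.

    Assumption (not confirmed by the ticket): every incoming connection
    may also open one outgoing socket to each config server.
    """
    incoming = mongos_count * conns_per_mongos
    outgoing = incoming * config_servers
    return incoming + outgoing + baseline


if __name__ == "__main__":
    need = estimated_descriptors(MONGOS_COUNT, CONNS_PER_MONGOS, CONFIG_SERVERS)
    print(f"estimated descriptors: {need}, limit: {ULIMIT}")
    if need > ULIMIT:
        print("worst-case estimate exceeds the limit")
```

Under that assumption, 3000 incoming connections alone stay well below the limit, but the worst case including config-server sockets does not.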

Sun Jun 5 02:41:38 [initandlisten] connection accepted from 10.102.33.76:54499 #32445
Sun Jun 5 02:41:38 [conn32443] SyncClusterConnection connecting to [amdbm001:10001]
Sun Jun 5 02:41:38 [conn30639] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn32443] SyncClusterConnection connecting to [amdbm003:10001]
Sun Jun 5 02:41:38 [conn32443] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn30713] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn31312] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn30717] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn31582] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn31568] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn32445] SyncClusterConnection connecting to [amdbm001:10001]
Sun Jun 5 02:41:38 [conn32445] SyncClusterConnection connecting to [amdbm003:10001]
Sun Jun 5 02:41:38 [conn32445] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn30927] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn31282] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [initandlisten] connection accepted from 10.117.23.28:55828 #32446
Sun Jun 5 02:41:38 [conn30475] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [initandlisten] connection accepted from 10.117.14.181:45708 #32447
Sun Jun 5 02:41:38 [conn31377] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn32444] SyncClusterConnection connecting to [amdbm001:10001]
Sun Jun 5 02:41:38 [initandlisten] connection accepted from 10.218.23.219:46445 #32448
Sun Jun 5 02:41:38 [conn31583] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn31564] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn32444] SyncClusterConnection connecting to [amdbm003:10001]
Sun Jun 5 02:41:38 [conn31564] ERROR: connect invalid socket errno:24 Too many open files
Sun Jun 5 02:41:38 [conn31564] SyncClusterConnection connect fail to: amdbm005:10001 errmsg: couldn't connect to server amdbm005:10001
Sun Jun 5 02:41:38 [conn32444] SyncClusterConnection connecting to [amdbm005:10001]
Sun Jun 5 02:41:38 [conn32446] SyncClusterConnection connecting to [amdbm001:10001]
Sun Jun 5 02:41:38 [conn31564] setShardVersion - relocking slow: 3000
Sun Jun 5 02:41:38 [conn31564] query admin.$cmd ntoreturn:1 command: { setShardVersion: "viber.text", configdb: "amdbm001:10001,amdbm003:10001,amdbm005:10001", version: Timestamp 5000|0, serverID: ObjectId('4de7245e8fa7a726717fd7d0'), authoritative: true, shard: "set12", shardHost: "set12/amdbm024:10000,amdbm023:10000" } reslen:73 3000ms
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Sun Jun 5 02:41:38 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
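
To see how the log above breaks down, the lines can be tallied by kind. This is a minimal sketch, assuming the line layout shown in this excerpt (timestamp, bracketed thread name, message); the function name `tally` and the category labels are made up for illustration.

```python
import re
from collections import Counter

# Matches the first bracketed thread name and captures the rest as the message.
LINE_RE = re.compile(r"\[(?P<thread>[^\]]+)\]\s+(?P<msg>.*)$")


def tally(lines):
    """Count accepted connections, config-server connects, and errno:24 errors."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip blank lines or lines without a thread tag
        msg = match.group("msg")
        if "errno:24" in msg:
            counts["errno24"] += 1
        elif msg.startswith("SyncClusterConnection connecting to"):
            counts["config_connect"] += 1
        elif msg.startswith("connection accepted"):
            counts["accepted"] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python tally.py < mongod.log
    for kind, count in tally(sys.stdin).items():
        print(kind, count)
```

Comparing the number of config-server connects with the number of accepted connections in the seconds before the first errno:24 line may help show which side is consuming descriptors.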



 Comments   
Comment by ofer samocha [ 05/Jun/11 ]

After a restart (and deleting the lock file), the server started OK.

Generated at Thu Feb 08 03:02:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.