[SERVER-8515] Unable to start DB with > 1024 files after upgrading from 2.2.x to 2.4.0-rc0
Created: 11/Feb/13 Updated: 11/Jul/16 Resolved: 12/Feb/13
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | 2.4.0-rc0 |
| Fix Version/s: | 2.4.0-rc1 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Alvin Richards (Inactive) | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | OS X |
| Operating System: | ALL |
| Description |
Problem:
Reproduce:
Create just enough databases that startup still succeeds
Startup will be OK
Add one more database and shut down
Startup will now fail
Note:
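The reproduction depends on how descriptor numbers are handed out: POSIX requires functions that create descriptors, including open() and socket(), to return the lowest-numbered unused descriptor. If the server opens every database's files before creating its listening socket (as Eric Milkie's comments indicate), each extra database pushes the socket's number higher. A minimal C sketch of that effect, assuming read-only opens of /dev/null as stand-ins for database files; the helper name socket_fd_after_files is hypothetical and is not mongod code:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_STAND_IN_FILES 2048

/* Hypothetical helper (not mongod code). Opens `count` read-only
 * descriptors on /dev/null as stand-ins for database files, then creates
 * a TCP socket as a stand-in for the listening socket. Because each call
 * returns the lowest unused descriptor, the socket's number lands above
 * every file opened before it.
 * Returns the socket's descriptor number, or -1 if any call failed (for
 * example EMFILE once `count` exceeds the RLIMIT_NOFILE soft limit).
 * On success, *last_file_fd receives the descriptor of the last file.
 * All descriptors are closed before returning; only the numbers are
 * reported. */
static int socket_fd_after_files(int count, int *last_file_fd) {
    int fds[MAX_STAND_IN_FILES];
    int opened = 0;
    int sock = -1;

    if (count < 1 || count > MAX_STAND_IN_FILES) return -1;

    while (opened < count) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) break; /* typically EMFILE: soft limit reached */
        fds[opened++] = fd;
    }

    if (opened == count) {
        *last_file_fd = fds[count - 1];
        sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock >= 0) close(sock); /* keep the number, release the fd */
    }

    for (int i = 0; i < opened; i++) close(fds[i]);
    return sock;
}
```

With a soft open-file limit comfortably above 1030, calling socket_fd_after_files(1030, &last) should report a socket descriptor above 1024, which is the position the listening socket ends up in. Under a lower limit the open() calls fail with EMFILE and the helper returns -1 instead.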
| Comments |
| Comment by Alvin Richards (Inactive) [ 16/Feb/13 ] |
Tested successfully on OS X with git version 1bd8b84c64214356f482fa3164d88e664f585243.
| Comment by auto [ 13/Feb/13 ] |
Author: Eric Milkie <milkie@10gen.com> (2013-02-13T14:15:45Z)
Message:
| Comment by Eric Milkie [ 12/Feb/13 ] |
See
| Comment by auto [ 12/Feb/13 ] |
Author: Eric Milkie <milkie@10gen.com> (2013-02-12T19:02:38Z)
Message:
| Comment by Eric Milkie [ 12/Feb/13 ] |
We will fix this by not opening all databases at startup; see
We should also prevent calling select() with file descriptors greater than or equal to FD_SETSIZE; that work will be done in this ticket.
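A minimal sketch of that kind of guard in C: check each descriptor against FD_SETSIZE before FD_SET() touches the fd_set, and report an error instead of writing past the end of the set. POSIX leaves FD_SET() and FD_ISSET() undefined for descriptors below 0 or at or above FD_SETSIZE, so the bound is fd < FD_SETSIZE. The helper names and the choice of EINVAL are hypothetical; this is not the patch that went into the server.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/select.h>

/* True when `fd` can legally be stored in an fd_set. FD_SET/FD_ISSET on a
 * descriptor < 0 or >= FD_SETSIZE is undefined behavior per POSIX. */
static bool fd_fits_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: adds each listening descriptor to `set` and returns
 * the nfds argument for select() (highest descriptor plus one).
 * Returns -1 with errno = EINVAL, without touching the set further, if
 * any descriptor is out of range for an fd_set. */
static int build_listen_set(const int *fds, int n, fd_set *set) {
    int maxfd = -1;
    FD_ZERO(set);
    for (int i = 0; i < n; i++) {
        if (!fd_fits_fd_set(fds[i])) {
            errno = EINVAL; /* refuse rather than overrun the bitmap */
            return -1;
        }
        FD_SET(fds[i], set);
        if (fds[i] > maxfd) maxfd = fds[i];
    }
    return maxfd + 1;
}
```

A different approach is poll(), which takes an array of struct pollfd instead of a fixed-size bitmap and therefore has no FD_SETSIZE ceiling.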
| Comment by Eric Milkie [ 12/Feb/13 ] |
I ran this in the debugger. We are passing 1026 (maxfd+1) as the first parameter to select(). The Darwin man page says:
Since our listening socket gets assigned a number higher than 1024, I don't think we can use select() with a 1024-bit fd_set (32 int32s) to listen on it.
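The arithmetic behind that conclusion, as a small C model: with nfds = 1026 the highest descriptor is 1025, and a 1024-bit set stored as 32 words of 32 bits only has word indices 0 through 31, so bit 1025 would fall in word 32, one past the end. The constants below are assumptions chosen to match the comment, not values read from the Darwin headers, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the fd_set described in the comment: 1024 bits held in
 * 32-bit words. These constants are assumptions for illustration; the
 * real fd_set layout and FD_SETSIZE are platform-defined. */
enum {
    MODEL_SET_BITS = 1024,
    MODEL_WORD_BITS = 32,
    MODEL_SET_WORDS = MODEL_SET_BITS / MODEL_WORD_BITS /* = 32 */
};

/* Which 32-bit word of the set holds descriptor `fd`. */
static int model_word_index(int fd) { return fd / MODEL_WORD_BITS; }

/* Which bit within that word represents `fd`. */
static int model_bit_index(int fd) { return fd % MODEL_WORD_BITS; }

/* True if setting `fd`'s bit stays inside the 32-word array. */
static bool model_fd_in_bounds(int fd) {
    return fd >= 0 && model_word_index(fd) < MODEL_SET_WORDS;
}
```

Under this model, descriptor 1024 is already out of range (word 32, bit 0), which is why the safe check is fd < 1024 rather than fd <= 1024.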
| Comment by Eric Milkie [ 12/Feb/13 ] |
I tried this on OS X and can reproduce the same behavior as Alvin. Possibly an OS X select() issue?
| Comment by Alvin Richards (Inactive) [ 12/Feb/13 ] |
With -vvvvv, I get the following on 2.4.0-rc0:
| Comment by Alvin Richards (Inactive) [ 12/Feb/13 ] |
No stack trace, just these errors in the log above. Do you need me to try with more verbose logging?
| Comment by Eliot Horowitz (Inactive) [ 12/Feb/13 ] |
Just tried this and didn't get a crash.
| Comment by Alvin Richards (Inactive) [ 11/Feb/13 ] |
Forgot to post the ulimits
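The ulimit values matter because the soft RLIMIT_NOFILE limit is one greater than the highest descriptor the process may open. Under a limit of 1024 or less, open() fails with EMFILE before any descriptor can reach 1024, and a listening socket numbered 1025 requires a limit of at least 1026. A minimal C sketch for checking this, assuming POSIX getrlimit(); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: reads the soft RLIMIT_NOFILE value into *soft.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int current_nofile_soft_limit(rlim_t *soft) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    *soft = rl.rlim_cur;
    return 0;
}

/* The soft limit is one more than the highest descriptor the process may
 * open, so descriptors >= FD_SETSIZE become possible only when the limit
 * is strictly greater than FD_SETSIZE (or unlimited). */
static bool limit_allows_fds_beyond_fd_set(rlim_t soft) {
    return soft == RLIM_INFINITY || soft > (rlim_t)FD_SETSIZE;
}
```

A process for which limit_allows_fds_beyond_fd_set() returns true can be handed descriptors that select() cannot watch, so code that still uses select() needs a guard like the FD_SETSIZE check discussed in this ticket.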
| Comment by Alvin Richards (Inactive) [ 11/Feb/13 ] |
Looks like this was introduced between 2.3.0 (OK) and 2.3.1 (fails).