[SERVER-25922] Replication fails due to too many open files or out of memory Created: 01/Sep/16 Updated: 04/Oct/16 Resolved: 04/Oct/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, WiredTiger |
| Affects Version/s: | 3.2.4, 3.2.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kral Markus [X] | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
see attached additional-info |
||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
We have 2 servers running in replication and are adding a 3rd server. The data seems to get synced without any problems, but the index fails to get built with either "too many open files" (although the limit here is set to 512000; 64000 is recommended) or an out-of-memory error. Attached you can find the log outputs of both. I tested with versions 3.2.4 and 3.2.9 on the affected instance. |
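A common pitfall with symptoms like this is that a limit raised in a login shell does not necessarily apply to the mongod daemon itself (e.g. when it is started by an init system). A minimal sketch for checking the limit the running daemon actually has, assuming a Linux host and a process named `mongod`:

```shell
# Read the limits of the running mongod, not of the current shell.
pid=$(pgrep -x mongod | head -n 1)
if [ -n "$pid" ]; then
    grep 'Max open files' "/proc/$pid/limits"
else
    # Fall back to this shell's soft limit if mongod is not running.
    echo "mongod not running; this shell's limit: $(ulimit -Sn)"
fi
```

If the value printed from `/proc/<pid>/limits` is 1024 rather than 512000, the raised limit never reached the daemon.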
| Comments |
| Comment by Kelsey Schubert [ 04/Oct/16 ] | ||||||
|
Hi KralMar, Since we haven't heard back from you, I assume that the ulimit settings explained the issue. Regarding the OOM kills, you have been affected by Thank you, | ||||||
| Comment by Kelsey Schubert [ 12/Sep/16 ] | ||||||
|
Hi KralMar, The index build is failing to open /srv/mongodb/data/_tmp/extsort.976 and I see WiredTiger has 45 currently open files, which brings the total number suspiciously close to the default ulimit setting of 1024. Would you please double check that the user running mongod has the correct ulimit settings?
Thank you, | ||||||
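One way to sanity-check the situation described above is to compare the daemon's current open-descriptor count against its own effective limit. A sketch, assuming Linux and a single `mongod` process:

```shell
# Compare mongod's open file descriptors against its effective soft limit.
pid=$(pgrep -x mongod | head -n 1)
pid=${pid:-$$}   # fall back to this shell if mongod is not running
open_fds=$(ls "/proc/$pid/fd" | wc -l)
limit=$(awk '/Max open files/ {print $4}' "/proc/$pid/limits")
echo "open descriptors: $open_fds / soft limit: $limit"
```

During an external-sort index build, temporary `extsort.*` files plus WiredTiger's own files can push the count toward a low default limit such as 1024.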
| Comment by Kral Markus [X] [ 12/Sep/16 ] | ||||||
|
Hi, what is the current status here? Thanks | ||||||
| Comment by Kral Markus [X] [ 02/Sep/16 ] | ||||||
|
Hi Ramón, thanks for your quick response. As the environment is not that critical and we do have a running replica set with 2 members and a backup, | ||||||
| Comment by Ramon Fernandez Marina [ 01/Sep/16 ] | ||||||
|
I understand your concern KralMar. You can see the data that it collects here, which is essentially the data produced by the following mongo shell commands:
This data is gathered at periodic intervals by the server, compressed, and stored inside the diagnostic.data directory. It contains no collection data. | ||||||
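As a sketch of where to find this capture on disk, the path below assumes the dbPath visible in this ticket's logs (`/srv/mongodb/data`); substitute your own:

```shell
# List the periodic diagnostic capture files under the dbPath.
# DBPATH is an assumption taken from this ticket's logs; override as needed.
DBPATH=${DBPATH:-/srv/mongodb/data}
if [ -d "$DBPATH/diagnostic.data" ]; then
    ls -lh "$DBPATH/diagnostic.data"
else
    echo "no diagnostic.data directory under $DBPATH"
fi
```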
| Comment by Kral Markus [X] [ 01/Sep/16 ] | ||||||
|
Hi Ramón, we are more than happy to provide you with everything you need, Kind Regards | ||||||
| Comment by Ramon Fernandez Marina [ 01/Sep/16 ] | ||||||
|
Sorry to hear you're having trouble adding a third node KralMar. I don't see a "smoking gun" in the information you already sent, so I'd like to ask you for the following:
I've created a secure upload portal so you can share this information privately with us. If this issue is critical for you, I would recommend you consider one or both of these workarounds:
I can't guarantee that these workarounds will address the issue since the data may indicate there's a bug somewhere, but trying won't hurt either. If you decide to try, please make sure you keep the existing contents of the diagnostic.data directory somewhere (uploading them to us is sufficient) – if the initial sync succeeds with the workaround we'll need to compare this data before and after to understand what the issue is. Thanks, |
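Preserving the existing diagnostic.data before retrying, as requested above, can be sketched as follows (the dbPath is again an assumption from this ticket's logs):

```shell
# Snapshot diagnostic.data so before/after initial-sync runs can be compared.
DBPATH=${DBPATH:-/srv/mongodb/data}
if [ -d "$DBPATH/diagnostic.data" ]; then
    tar czf "diagnostic.data-$(date +%Y%m%d-%H%M%S).tar.gz" -C "$DBPATH" diagnostic.data
    echo "archived diagnostic.data"
else
    echo "nothing to archive under $DBPATH"
fi
```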