[SERVER-39712] MongoDB Too many open files, WT_PANIC: WiredTiger library panic and data corruption ***aborting after fassert() failure
Created: 21/Feb/19  Updated: 06/May/19  Resolved: 06/May/19

Status: Closed
Project: Core Server
Component/s: Performance, Replication, Sharding, WiredTiger
Affects Version/s: 4.0.6
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Salamuddin Pranayan
Assignee: Danny Hatcher (Inactive)
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: last-error-mongo (HTML file), last-error-mongo #2 (HTML file), limits.txt, metrics.2019-03-08T00-33-53Z-00000, metrics.2019-03-08T01-32-23Z-00000, metrics.2019-03-08T02-04-55Z-00000, metrics.2019-03-08T02-27-10Z-00000, metrics.2019-03-08T02-35-23Z-00000, metrics.interim, mongod.log, mongod.log.2019-03-08T18-23-02, sysctl.txt
Participants:

 Description   

Hello, our database already uses sharding and replication, and we often see errors of this kind; the details are attached. Errors such as `MongoDB Too many open files, WT_PANIC: WiredTiger library panic and data corruption ***aborting after fassert() failure` show up often. What should we do to solve this?

Thanks in advance



 Comments   
Comment by Danny Hatcher (Inactive) [ 25/Apr/19 ]

salamflamo, are you still experiencing this issue?

Comment by Danny Hatcher (Inactive) [ 13/Mar/19 ]

Hello,

I'm sorry; we're still trying to identify what could be causing you to hit your open file limit. Could you run the following against your mongod process, substituting in your mongod pid?

ls -l /proc/<mongod pid>/fd/ | wc -l

Thank you,

Danny
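
A small sketch of the check requested above, assuming a Linux host with /proc mounted. It lists the descriptor directory with `ls -1` because `ls -l` also prints a `total` line, which makes the count one too high. The helper name `count_open_fds` is hypothetical, and the example inspects the current shell; substitute the mongod pid and run it as root or as the user that owns mongod.

```shell
#!/bin/sh
# Count the file descriptors a process currently holds open (Linux only).
# count_open_fds is an illustrative name, not a MongoDB or system tool.
count_open_fds() {
    # Every entry under /proc/<pid>/fd is one open descriptor.
    ls -1 "/proc/$1/fd" 2>/dev/null | wc -l
}

# Demonstration on the current shell; for mongod, pass its pid instead.
echo "open descriptors: $(count_open_fds "$$")"
```

Comparing this number with the `Max open files` row in /proc/<mongod pid>/limits shows how close the process is to its limit.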

Comment by Salamuddin Pranayan [ 09/Mar/19 ]

Hello, I have uploaded a new file named "mongod.log.2019-03-08T18-23-02" above. Please check it; you will find an error message like the text below. I applied all the settings recommended on the MongoDB website and also repaired the database, but I still get this error:

2019-03-09T01:22:50.649+0700 E STORAGE [conn439] WiredTiger error (24) [1552069370:648313][10992:0x7fd71e260700], file:index-30471-1406128765714834102.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /var/lib/mongo/index-30471-1406128765714834102.wt: handle-open: open: Too many open files Raw: [1552069370:648313][10992:0x7fd71e260700], file:index-30471-1406128765714834102.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /var/lib/mongo/index-30471-1406128765714834102.wt: handle-open: open: Too many open files
2019-03-09T01:22:50.649+0700 E STORAGE [conn439] Failed to open a WiredTiger cursor: table:index-30471-1406128765714834102
2019-03-09T01:22:50.649+0700 E STORAGE [conn439] This may be due to data corruption. Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair
2019-03-09T01:22:50.653+0700 F - [conn439] Fatal Assertion 50882 at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 143
2019-03-09T01:22:50.653+0700 F - [conn439]

***aborting after fassert() failure

I don't know what to do next. Could you still help me fix this?

Thank you for your help.

Thank you.
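
To gauge how often this happens, the matching log lines can be counted. WiredTiger reports the operating system error number in parentheses, and on Linux error 24 is EMFILE, "Too many open files". This is only a sketch: the helper name `count_emfile_errors` is hypothetical, and the log path in the comment is an example, not taken from this ticket.

```shell
#!/bin/sh
# Count log lines where WiredTiger failed with errno 24 (EMFILE on Linux).
# count_emfile_errors is an illustrative name; $1 is a mongod log file.
count_emfile_errors() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'WiredTiger error (24)' "$1"
}

# Example (path is an assumption): count_emfile_errors /var/log/mongodb/mongod.log
```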

Comment by Danny Hatcher (Inactive) [ 08/Mar/19 ]

Hello,

Are you talking about the following lines?

2019-03-08T09:31:08.734+0700 E STORAGE  [initandlisten] WiredTiger error (-31803) [1552012268:734221][9051:0x7f577ecf0b80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-198-7499816531420150789" not found: WT_NOTFOUND: item not found Raw: [1552012268:734221][9051:0x7f577ecf0b80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-198-7499816531420150789" not found: WT_NOTFOUND: item not found

When WiredTiger finds a partial metadata set, it prints that informational message, skips that table, and keeps going, so the cursor continues its walk and completes. Consequently, these log lines do not indicate an issue that would prevent MongoDB from starting. In fact, we see that MongoDB successfully started and began accepting connections. Unfortunately, the node is unable to connect to shard0013b:27017, which causes the other errors in the logs.

From what I can see in the most recent logs you uploaded, your server should be up and functioning correctly. If you have other reasons for thinking there is corruption, please let me know.

Thank you,

Danny

Comment by Salamuddin Pranayan [ 08/Mar/19 ]

Hi Danny,

I have uploaded the files you requested.

Please be patient with me. Is there a way to fix a corrupt collection? Some index-xxxx-xxxxxxx.wt files in my database are corrupted; I already ran a repair, but they are still corrupt. Please help me.

Thank you very much.

Comment by Danny Hatcher (Inactive) [ 06/Mar/19 ]

Hello,

It looks like you are still experiencing "too many open files":

2019-02-28T17:09:37.097+0700 E STORAGE [conn364] WiredTiger error (24) [1551348577:96211][16102:0x7f79ed449700], file:index-16407--4057599699049023237.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /var/lib/mongo/index-16407--4057599699049023237.wt: handle-open: open: Too many open files Raw: [1551348577:96211][16102:0x7f79ed449700], file:index-16407--4057599699049023237.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /var/lib/mongo/index-16407--4057599699049023237.wt: handle-open: open: Too many open files

Please provide the following to our Secure Upload Portal:
1. Updated mongod logs
2. Updated "diagnostic.data"
3. Output of cat /proc/<mongod pid>/limits
4. Output of sysctl -a

Thank you,

Danny
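
Items 3 and 4 can be captured into files ready for upload with something like the following. This is a sketch assuming a Linux host; `collect_diagnostics` and the output directory are hypothetical, and the file names simply mirror the limits.txt and sysctl.txt attachments on this ticket.

```shell
#!/bin/sh
# Save a process's limits and the kernel settings into a directory.
# collect_diagnostics is an illustrative name: $1 = pid, $2 = output directory.
collect_diagnostics() {
    mkdir -p "$2" || return 1
    # Limits as seen by the running process itself.
    cat "/proc/$1/limits" > "$2/limits.txt" || return 1
    # sysctl -a may warn about unreadable keys when not run as root;
    # keep whatever it prints and ignore the warnings.
    if command -v sysctl >/dev/null 2>&1; then
        sysctl -a > "$2/sysctl.txt" 2>/dev/null
    fi
    echo "wrote $2"
}

# Demonstration on the current shell; for mongod, pass its pid instead.
collect_diagnostics "$$" "${TMPDIR:-/tmp}/mongod-diag"
```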

Comment by Salamuddin Pranayan [ 28/Feb/19 ]

Hello,

Thank you for helping me, but I still have the same problem. I think this is because our database is corrupt. Sometimes I get an error like the one below:

2019-02-28T17:09:37.007+0700 E - [ftdc] Assertion: Location13538: couldn't open [/proc/16102/stat] Unknown error src/mongo/util/processinfo_linux.cpp 81
2019-02-28T17:09:37.097+0700 E STORAGE [conn364] WiredTiger error (24) [1551348577:96211][16102:0x7f79ed449700], file:index-16407--4057599699049023237.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /var/lib/mongo/index-16407--4057599699049023237.wt: handle-open: open: Too many open files Raw: [1551348577:96211][16102:0x7f79ed449700], file:index-16407--4057599699049023237.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /var/lib/mongo/index-16407--4057599699049023237.wt: handle-open: open: Too many open files
2019-02-28T17:09:37.097+0700 E STORAGE [conn364] Failed to open a WiredTiger cursor: table:index-16407--4057599699049023237
2019-02-28T17:09:37.097+0700 E STORAGE [conn364] This may be due to data corruption. Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair
2019-02-28T17:09:37.097+0700 F - [conn364] Fatal Assertion 50882 at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 143
2019-02-28T17:09:37.098+0700 F - [conn364]

***aborting after fassert() failure

 

Sometimes I also get an error caused by a WiredTiger panic.

I then tried to fix the problem with "mongod --dbpath /var/lib/mongo --repair", but it still ends as shown below:

2019-02-28T17:09:37.097+0700 E STORAGE [conn364] This may be due to data corruption. Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair
2019-02-28T17:09:37.097+0700 F - [conn364] Fatal Assertion 50882 at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 143
2019-02-28T17:09:37.098+0700 F - [conn364]

***aborting after fassert() failure

 

So for now I still have that problem. What should I do?

Please be patient with me.

Thank you.

Comment by Danny Hatcher (Inactive) [ 28/Feb/19 ]

Hello,

Yes, please use the recommendations in our docs. Per the man page for sysctl, you can use sysctl -w variable=value to change the settings. When you change them, do you still see your issue?

Thank you,

Danny
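
As a sketch of that step, the helper below prints the `sysctl -w` command for a setting and only runs it when `APPLY=1` is set, since changing kernel parameters needs root. The helper name `apply_sysctl` and the `APPLY` variable are hypothetical; the two values shown are Operations Checklist recommendations quoted elsewhere in this ticket.

```shell
#!/bin/sh
# Print, or with APPLY=1 actually run, a sysctl -w command.
# apply_sysctl and APPLY are illustrative names: $1 = key, $2 = value.
apply_sysctl() {
    if [ "${APPLY:-0}" = "1" ]; then
        # Requires root; changes the running kernel immediately.
        sysctl -w "$1=$2"
    else
        # Dry run: show what would be executed.
        echo "sysctl -w $1=$2"
    fi
}

apply_sysctl kernel.pid_max 64000
apply_sysctl vm.max_map_count 128000
```

Values set with `sysctl -w` last only until the next reboot; to keep them, they also need to be written to /etc/sysctl.conf or a file under /etc/sysctl.d/.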

Comment by Salamuddin Pranayan [ 28/Feb/19 ]

Hello,

Our current (default) settings are listed below:

  1. fs.file-max = 3245496
  2. kernel.pid_max = 32768
  3. kernel.threads-max = 255745
  4. vm.max_map_count = 65530

So I need to apply the settings from https://docs.mongodb.com/manual/administration/production-checklist-operations/ ?
Thank you very much, I will try it.

 

Comment by Danny Hatcher (Inactive) [ 27/Feb/19 ]

Hello,

I apologize, I should have asked this when I initially asked about ulimits. What is your kernel.pid_max value? You can find it via sysctl -a. Please run that command and if the values do not match the recommendations below from our Operations Checklist, please change them.

  1. fs.file-max value of 98000
  2. kernel.pid_max value of 64000
  3. kernel.threads-max value of 64000
  4. vm.max_map_count value of 128000

Thank you,

Danny
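
The comparison can be sketched as a small filter over `sysctl -a` style output. It treats the four values above as minimums, which is an assumption (the reporter's fs.file-max is far above 98000, for instance), and `check_sysctl` is a hypothetical helper name. The sample input is the set of values the reporter posted on 28/Feb/19.

```shell
#!/bin/sh
# Flag kernel settings that are below the values quoted from the
# Operations Checklist. Reads "key = value" lines on stdin.
# check_sysctl is an illustrative name; "minimum" is an assumption.
check_sysctl() {
    awk -F' *= *' '
        BEGIN {
            want["fs.file-max"]        = 98000
            want["kernel.pid_max"]     = 64000
            want["kernel.threads-max"] = 64000
            want["vm.max_map_count"]   = 128000
        }
        ($1 in want) {
            # Force a numeric comparison with "+ 0".
            status = ($2 + 0 >= want[$1]) ? "ok" : "LOW"
            printf "%s %s (current %s, recommended %s)\n", status, $1, $2, want[$1]
        }
    '
}

# Values from the reporter's 28/Feb/19 comment; on a live host use:
#   sysctl -a 2>/dev/null | check_sysctl
check_sysctl <<'EOF'
fs.file-max = 3245496
kernel.pid_max = 32768
kernel.threads-max = 255745
vm.max_map_count = 65530
EOF
```

With the reporter's values, kernel.pid_max and vm.max_map_count are flagged as LOW, while fs.file-max and kernel.threads-max pass.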

Comment by Salamuddin Pranayan [ 27/Feb/19 ]

Hello,

Thank you. I have just uploaded 10 files, from the diagnostic data and the logs. The log file I uploaded is from 26 February.

Thank you

Comment by Danny Hatcher (Inactive) [ 26/Feb/19 ]

Hello,

You can try increasing that limit but it would be rare for that to be necessary. In order for us to investigate further, please upload your mongod logs and "diagnostic.data" folder (located under your $dbpath) to our Secure Upload Portal. Only MongoDB engineers will be able to see any files you upload there and the contents will be deleted after some time.

Thank you,

Danny

Comment by Salamuddin Pranayan [ 26/Feb/19 ]

Hello,

Here is what I see when I check:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             64000                64000                processes
Max open files            64000                64000                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       192378               192378               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

The open files limit is just 64000. Should I increase it?

Thank you
 

Comment by Danny Hatcher (Inactive) [ 25/Feb/19 ]

Hello,

Can you please provide the output of the following command when substituting in your mongod pid? This will tell us the ulimits currently being used by the process itself.

cat /proc/<mongod pid>/limits

Thank you,

Danny
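
To pull just the relevant row out of that output, a sketch along these lines may help. It assumes the standard Linux /proc/<pid>/limits layout, where the soft and hard values are the fourth and fifth whitespace-separated fields of the `Max open files` row; `open_file_limits` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the soft and hard "Max open files" limits from a limits file.
# open_file_limits is an illustrative name; pass /proc/<mongod pid>/limits.
open_file_limits() {
    # Fields: Max(1) open(2) files(3) <soft>(4) <hard>(5) files(6)
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$1"
}

# Demonstration on the current shell's own limits.
open_file_limits "/proc/$$/limits"
```

These are the limits of the running process, which can differ from what `ulimit -n` reports in an interactive shell, for example when mongod is started by an init system that applies its own limits.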

Comment by Salamuddin Pranayan [ 21/Feb/19 ]

Hi,
I have already set the ulimits like this:

`ulimit -f unlimited -t unlimited -v unlimited -l unlimited -n 64000 -m unlimited -u 64000`

following the guide on this page: https://docs.mongodb.com/manual/reference/ulimit/index.html#recommended-ulimit-settings

Thank you

Comment by Danny Hatcher (Inactive) [ 21/Feb/19 ]

Hello,

You are most likely seeing these messages because your server does not match our recommended ulimit settings. If you follow our recommendations on that page, most notably -n (open files): 64000, do you still see these errors?

Thank you,

Danny

Generated at Thu Feb 08 04:52:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.