[SERVER-33291] Mongo crashing every 20 mins - WiredTiger Error Created: 13/Feb/18  Updated: 21/Mar/18  Resolved: 20/Feb/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.19
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Anthony Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Hello,

I figured I'd start by posting this as a question, perhaps later it can be upgraded to a bug.

First, some background information. I am running MongoDB v3.2.19 via Docker as the database for rocketchat, which is also containerized. The MongoDB data directory is bind mounted into the container via Docker. The node runs Ubuntu 16.04 with 8 GB of RAM. The database uses WiredTiger as its storage backend. The rocketchat instance is up and running, the db is writable, and everything seems to function fine. However, roughly every 20 minutes the mongo container crashes with the following output.

2018-02-13T16:16:40.756+0000 E STORAGE  [thread1] WiredTiger (13) [1518538600:756646][1:0x7f4e7f26d700], file:WiredTiger.wt, WT_SESSION.checkpoint: /data/db/WiredTiger.turtle.set: handle-open: open: Permission denied
2018-02-13T16:16:40.757+0000 E STORAGE  [thread1] WiredTiger (13) [1518538600:757215][1:0x7f4e7f26d700], checkpoint-server: checkpoint server error: Permission denied
2018-02-13T16:16:40.757+0000 E STORAGE  [thread1] WiredTiger (-31804) [1518538600:757846][1:0x7f4e7f26d700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-02-13T16:16:40.758+0000 I -        [thread1] Fatal Assertion 28558
2018-02-13T16:16:40.758+0000 I -        [thread1] 
 
***aborting after fassert() failure
 
 
2018-02-13T16:16:40.795+0000 F -        [thread1] Got signal: 6 (Aborted).
 
 0x1357642 0x1356779 0x1356f82 0x7f4e83e31890 0x7f4e83aac067 0x7f4e83aad448 0x12d74b2 0x10ca4e3 0x9789eb 0x978be8 0x978dac 0x1a3fae3 0x7f4e83e2a064 0x7f4e83b5f62d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F57642","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F56779"},{"b":"400000","o":"F56F82"},{"b":"7F4E83E22000","o":"F890"},{"b":"7F4E83A77000","o":"35067","s":"gsignal"},{"b":"7F4E83A77000","o":"36448","s":"abort"},{"b":"400000","o":"ED74B2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"CCA4E3"},{"b":"400000","o":"5789EB","s":"__wt_eventv"},{"b":"400000","o":"578BE8","s":"__wt_err"},{"b":"400000","o":"578DAC","s":"__wt_panic"},{"b":"400000","o":"163FAE3"},{"b":"7F4E83E22000","o":"8064"},{"b":"7F4E83A77000","o":"E862D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.19", "gitVersion" : "a9f574de6a566a58b24d126b44a56718d181e989", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-112-generic", "version" : "#135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "EB76513EC468515CD544A7D421C9AB0C2C3EC848" }, { "b" : "7FFC8395C000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "FA97F4849697BBE252BA1F7FB2316979E93E61DE" }, { "b" : "7F4E84D5E000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "21115992A1F885E1ACE88AADA60F126AD9759D03" }, { "b" : "7F4E84962000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "FD6376149047833953B0269E84DE181CA45DBE90" }, { "b" : "7F4E8475A000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "A63C95FB33CCA970E141D2E13774B997C1CF0565" }, { "b" : "7F4E84556000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D70B531D672A34D71DB42EB32B68E63F2DCC5B6A" }, { "b" : "7F4E84255000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "152C93BA3E8590F7ED0BCDDF868600D55EC4DD6F" }, { "b" : "7F4E8403F000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "D5FB04F64B3DAEA6D6B68B5E8B9D4D2BC1A6E1FC" }, { "b" : "7F4E83E22000", 
"path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9DA9387A60FFC196AEDB9526275552AFEF499C44" }, { "b" : "7F4E83A77000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "48C48BC6ABB794461B8A558DD76B29876A0551F0" }, { "b" : "7F4E84FBF000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "1D98D41FBB1EABA7EC05D0FD7624B85D6F51C03C" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1357642]
 mongod(+0xF56779) [0x1356779]
 mongod(+0xF56F82) [0x1356f82]
 libpthread.so.0(+0xF890) [0x7f4e83e31890]
 libc.so.6(gsignal+0x37) [0x7f4e83aac067]
 libc.so.6(abort+0x148) [0x7f4e83aad448]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12d74b2]
 mongod(+0xCCA4E3) [0x10ca4e3]
 mongod(__wt_eventv+0x440) [0x9789eb]
 mongod(__wt_err+0x8D) [0x978be8]
 mongod(__wt_panic+0x24) [0x978dac]
 mongod(+0x163FAE3) [0x1a3fae3]
 libpthread.so.0(+0x8064) [0x7f4e83e2a064]
 libc.so.6(clone+0x6D) [0x7f4e83b5f62d]
-----  END BACKTRACE  -----

This looks like a permissions issue, however I have verified that within the container both the user and group ownership of the data dir are mongodb, and I can touch and delete files in it as that user without any problems. I have also noticed that the file WiredTiger.turtle.set does not exist in the directory. I'm not sure what else I should be checking to resolve this. Any pointers would be greatly appreciated.
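For what it's worth, it is normal for WiredTiger.turtle.set not to exist between checkpoints: WiredTiger writes its checkpoint metadata by creating that file and then renaming it over WiredTiger.turtle, so the data *directory* itself must be writable, not just the existing files in it. A minimal sketch of that create-and-rename step, run against a scratch directory rather than a live dbpath:

```shell
# Sketch: WiredTiger updates its metadata file atomically by writing
# WiredTiger.turtle.set and renaming it over WiredTiger.turtle. Creating
# the .set file requires write permission on the directory itself, which
# is why correct ownership of the existing db files alone is not enough.
DATA_DIR=$(mktemp -d)                      # scratch stand-in for /data/db
touch "$DATA_DIR/WiredTiger.turtle"        # pre-existing metadata file
touch "$DATA_DIR/WiredTiger.turtle.set"    # this is the step that fails with EACCES
mv "$DATA_DIR/WiredTiger.turtle.set" "$DATA_DIR/WiredTiger.turtle"
echo "turtle update OK"
```

If the directory lacks write permission for the mongod user, the `touch` step is where the "handle-open: open: Permission denied" in the log above would come from.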



 Comments   
Comment by Kelsey Schubert [ 20/Feb/18 ]

Thanks for confirming that the issue is outside of MongoDB; I hope you're able to track down the permissions issue.

Kind regards,
Kelsey

Comment by Anthony [ 14/Feb/18 ]

Okay, so I guess the permissions were just messed up. Outside of the container I have chowned all the files to docker:docker with a mode of 755. So far the database has been up for 17 hours with no crashes. I believe the issue was the ownership, but I have made so many changes at this point that I cannot pinpoint whether it was the mode or the ownership on the db files that fixed this. Thanks for your quick response, Mark. Much appreciated. I'm off to test a bit more.
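For anyone hitting this later, the host-side fix described above amounts to something like the following (a sketch; /opt/rocketchat/db and the docker:docker owner are specific to this report, and a scratch directory stands in for the real path so the commands are safe to try):

```shell
# On the real host, run as root against the actual data directory:
#   chown -R docker:docker /opt/rocketchat/db
#   chmod -R 755 /opt/rocketchat/db
# Sandbox demonstration of the mode change:
DB_DIR=$(mktemp -d)                # scratch stand-in for /opt/rocketchat/db
chmod 755 "$DB_DIR"
stat -c '%a' "$DB_DIR"             # prints the octal mode, here 755
```

Note that 755 leaves the db files readable by every user on the host; 700 (owner-only) also satisfies mongod and is tighter for database files.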

Cheers

Comment by Anthony [ 13/Feb/18 ]

Hello Mark,

So the machine is a fresh box; we have no cron jobs on it at the moment. As for the 20 minutes, I should have mentioned that is an approximation: sometimes it stays up for 18 minutes, other times almost 30. The "bind" mount in question is actually just a Docker volume mount; we are mounting /opt/rocketchat/db to /data/db in the mongo container. Permissions both inside and outside the container are the same, and I have tested writing files to /data/db within the container as the mongodb user with no issues. The underlying filesystem on the machine is ext4; I know it's not the recommended one, but I believe it is supported. I am continuing to test for permission issues; I fear it may be something obvious I am missing. I am currently testing 777 on all files under the db dir outside of the container (these permissions will translate inside as well) and waiting to see if it crashes again.
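One detail worth double-checking with Docker volume mounts: the kernel enforces permissions on numeric uid/gid values, not user names, so "mongodb" inside the container and a user of the same name on the host can be different numeric ids (the official mongo image typically runs mongod as uid 999; `docker exec <container> id mongodb` confirms the actual value). A quick local sketch of the numeric comparison:

```shell
# Sketch: what matters across a bind mount is the numeric owner id.
# `ls -ln` / `stat -c %u` show the numbers the kernel actually compares;
# the names printed by plain `ls -l` are resolved per-environment and can
# differ between host and container for the same file.
f=$(mktemp)
owner_uid=$(stat -c '%u' "$f")
echo "file owner uid: $owner_uid, current process uid: $(id -u)"
```

If the numeric owner of the db files on the host does not match the uid mongod runs as inside the container, writes can fail even though ownership "looks" correct on both sides.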

Comment by Mark Agarunov [ 13/Feb/18 ]

Hello aeb123,

Thank you for the report. To get a better idea of what may be causing this error, I'd like to request some additional information.

  • What is the underlying filesystem being used?
  • Which options is the bind mount mounted with?
  • As you mention this happens every 20 minutes, is it possible there's a cron job or other external process that (temporarily) changes the file or folder permissions/ownership?
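For reference, these can be checked with commands along these lines (a sketch; DBPATH is a placeholder — substitute the real data directory, /data/db inside the container or /opt/rocketchat/db on the host):

```shell
# Sketch of commands gathering the details asked for above.
DBPATH=${DBPATH:-/tmp}                     # placeholder; substitute the real dbpath
df -T "$DBPATH"                            # underlying filesystem type
grep " $DBPATH " /proc/mounts || true      # mount source and options, if it is a mountpoint
ls -ldn "$DBPATH"                          # numeric uid/gid and mode of the dbpath
command -v crontab >/dev/null && crontab -l 2>/dev/null || true   # per-user cron jobs
ls /etc/cron.d 2>/dev/null || true         # system-wide cron jobs
```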

This should give some insight into why you're seeing the permission denied error.

Thanks,
Mark

Generated at Thu Feb 08 04:32:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.