[SERVER-22450] Mongodb sub-folders sometimes don't get u+x permission Created: 03/Feb/16  Updated: 18/Feb/16  Resolved: 18/Feb/16

Status: Closed
Project: Core Server
Component/s: Admin, Storage, WiredTiger
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Radek Assignee: Kelsey Schubert
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongo-1.log    
Issue Links:
Related
related to SERVER-20235 Upsert (via PHP) on non-existsing dat... Closed
Operating System: ALL
Steps To Reproduce:

It's hard to reproduce... usually it happens only on slaves... We create a new database every day... and there were no problems since december... and then it could not create/replicate files from in db: ZZZZZZ-160130 (30.I.2016) at midnight.

That means, on previous days, there were no problems with permissions.

And now the same problem with ./diagnostic.data/ means – the issue still presists in 3.2.1...

Participants:

 Description   

Same problem as described here:
https://jira.mongodb.org/browse/SERVER-20235

This time:
upgrade from MongoDB 3.0.3 to 3.2.1
after upgrade & restart - mongo can't write to ./diagnostic.data/ (no x attribute/permission):

$ ls -la /ssd/mongodb-wt/diagnostic.data/
razem 8
drw-r-xr-x  2 mongodb nogroup 4096 lut  3 14:13 .
drwxr-xr-x 18 mongodb mongodb 4096 lut  3 14:30 ..

Context mongodb.log in attachement.

Most important lines:

  2016-02-03T14:13:50.516+0100 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/ssd/mongodb-wt/diagnostic.data'

and then:

2016-02-03T14:13:51.000+0100 W FTDC     [ftdc] Uncaught exception in 'UnknownError Caught std::exception of type boost::filesystem::filesystem_error: boost::filesystem::status: Permission denied: "/ssd/mongodb-wt/diagnostic.data/metrics.interim"' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.



 Comments   
Comment by Kelsey Schubert [ 18/Feb/16 ]

Hi radek,

I'm glad that you have an effective workaround. We are still unable to reproduce this behavior: I followed the same upgrade path from MongoDB 3.0.3 to 3.2.1 and was able to write successfully to the diagnostics.data directory.

The fact that this issue emerges only following system restarts could indicate that the mask of the mongod process is being incorrectly set. If you are able, you may want to examine the mask of a mongod exhibiting this issue. Or you may want to examine if there are any automated processes being executed on these nodes which could explain the change in permission rights between server restarts.

The relevant code to create the diagnostic.data directory can be found here. We take the mask from the environment, so if there is an issue it may reside within the Boost.Filesystem library.

For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-users group.

Kind regards,
Thomas

Comment by Radek [ 09/Feb/16 ]

New/empty instance – whole log below (conf @ the end):

// couldn't create ..../local.0

2016-02-09T11:08:47.082+0100 I CONTROL  [initandlisten] MongoDB starting : pid=19456 port=27087 dbpath=/var/lib/mongodb/mongodb-abriter-27087 64-bit host=newapp1.indigo.pl
2016-02-09T11:08:47.082+0100 I CONTROL  [initandlisten] db version v3.0.7
2016-02-09T11:08:47.082+0100 I CONTROL  [initandlisten] git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
2016-02-09T11:08:47.083+0100 I CONTROL  [initandlisten] build info: Linux ip-10-183-78-195 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 BOOST_LIB_VERSION=1_49
2016-02-09T11:08:47.083+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2016-02-09T11:08:47.083+0100 I CONTROL  [initandlisten] options: { config: "/etc/mongod-27087.conf", net: { bindIp: "0.0.0.0", http: { RESTInterfaceEnabled: true, enabled: true }, port: 27087, unixDomainSocket: { enabled: true, filePermissions: 511 }, wireObjectCheck: false }, replication: { oplogSizeMB: 8, replSetName: "app" }, storage: { dbPath: "/var/lib/mongodb/mongodb-abriter-27087", directoryPerDB: false, engine: "mmapv1", journal: { enabled: false }, mmapv1: { smallFiles: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongodb-arbiter-27087.log", quiet: true, timeStampFormat: "iso8601-local" } }
2016-02-09T11:08:47.083+0100 I NETWORK  [websvr] admin web console waiting for connections on port 28087
2016-02-09T11:08:47.083+0100 I INDEX    [initandlisten] allocating new ns file /var/lib/mongodb/mongodb-abriter-27087/local.ns, filling with zeroes...
2016-02-09T11:08:47.788+0100 I STORAGE  [FileAllocator] allocating new datafile /var/lib/mongodb/mongodb-abriter-27087/local.0, filling with zeroes...
2016-02-09T11:08:47.788+0100 I STORAGE  [FileAllocator] creating directory /var/lib/mongodb/mongodb-abriter-27087/_tmp
2016-02-09T11:08:47.808+0100 I STORAGE  [FileAllocator] FileAllocator: couldn't create /var/lib/mongodb/mongodb-abriter-27087/local.0 (/var/lib/mongodb/mongodb-abriter-27087/_tmp/1455012527054147) errno:13 Permission denied
2016-02-09T11:08:47.809+0100 I STORAGE  [FileAllocator] error: failed to allocate new file: /var/lib/mongodb/mongodb-abriter-27087/local.0 size: 16777216 .  will try again in 10 seconds
2016-02-09T11:08:47.811+0100 I STORAGE  [FileAllocator] error removing files: boost::filesystem::remove: Permission denied: "/var/lib/mongodb/mongodb-abriter-27087/_tmp/1455012527054147"
2016-02-09T11:08:47.811+0100 I -        [initandlisten] Assertion: 12520:new file allocation failure
2016-02-09T11:08:47.814+0100 W STORAGE  [initandlisten] database /var/lib/mongodb/mongodb-abriter-27087 local could not be opened due to DBException 12520: new file allocation failure
2016-02-09T11:08:47.818+0100 I STORAGE  [initandlisten] exception in initAndListen: 12520 new file allocation failure, terminating
2016-02-09T11:08:47.818+0100 W REPL     [initandlisten] ReplicationCoordinatorImpl::shutdown() called before startReplication() finished.  Shutting down without cleaning up the replication system
2016-02-09T11:08:47.818+0100 I CONTROL  [initandlisten] now exiting
2016-02-09T11:08:47.818+0100 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2016-02-09T11:08:47.818+0100 I NETWORK  [initandlisten] closing listening socket: 8
2016-02-09T11:08:47.818+0100 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27087.sock
2016-02-09T11:08:47.818+0100 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2016-02-09T11:08:47.818+0100 I NETWORK  [initandlisten] shutdown: going to close sockets...
2016-02-09T11:08:47.819+0100 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
2016-02-09T11:08:47.819+0100 I STORAGE  [initandlisten] shutdown: closing all files...
2016-02-09T11:08:47.819+0100 I STORAGE  [initandlisten] closeAllFiles() finished
2016-02-09T11:08:47.819+0100 I STORAGE  [initandlisten] shutdown: removing fs lock...
2016-02-09T11:08:47.819+0100 I CONTROL  [initandlisten] dbexit:  rc: 100

conf:

storage:
  dbPath: /var/lib/mongodb/mongodb-abriter-27087
  directoryPerDB: false
  engine: mmapv1
  mmapv1:
    smallFiles: true
  journal:
    enabled: false
replication:
  replSetName: app
  oplogSizeMB: 8
systemLog:
  destination: file
  path: /var/log/mongodb/mongodb-arbiter-27087.log
  logAppend: true
  timeStampFormat: iso8601-local
  quiet: true
net:
  port: 27087
  bindIp: 0.0.0.0
  wireObjectCheck : false
  unixDomainSocket:
    enabled : true
    filePermissions: 0777
  http:
    RESTInterfaceEnabled: true
    enabled: true

Comment by Radek [ 09/Feb/16 ]

And now _tmp: ... It's crazy... (error opening file "/local_storage/mongodb/mongodb-wt-27017/_tmp/extsort.0": errno:13 Permission denied)

2016-02-09T10:40:01.117+0100 I CONTROL  [initandlisten] db version v3.0.7
2016-02-09T10:40:01.117+0100 I CONTROL  [initandlisten] git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
2016-02-09T10:40:01.117+0100 I CONTROL  [initandlisten] build info: Linux ip-10-183-78-195 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 BOOST_LIB_VERSION=1_49
2016-02-09T10:40:01.117+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2016-02-09T10:40:01.117+0100 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "0.0.0.0", http: { RESTInterfaceEnabled: true, enabled: true }, port: 27017, unixDomainSocket: { enabled: true, filePermissions: 511 }, wireObjectCheck: false }, replication: { oplogSizeMB: 500, replSetName: "app" }, storage: { dbPath: "/local_storage/mongodb/mongodb-wt-27017", directoryPerDB: true, engine: "wiredTiger", journal: { enabled: false }, wiredTiger: { engineConfig: { cacheSizeGB: 3 } } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongodb.log", quiet: true, timeStampFormat: "iso8601-local" } }
2016-02-09T10:40:01.118+0100 I INDEX    [initandlisten] found 1 index(es) that wasn't finished before shutdown
2016-02-09T10:40:01.118+0100 I STORAGE  [initandlisten] WARNING: the collection 'stats_live.daily' lacks a unique index on _id. This index is needed for replication to function properly
2016-02-09T10:40:01.118+0100 I STORAGE  [initandlisten]          To fix this, you need to create a unique index on _id. See http://dochub.mongodb.org/core/build-replica-set-indexes
2016-02-09T10:40:01.118+0100 I NETWORK  [websvr] admin web console waiting for connections on port 28017
2016-02-09T10:40:01.118+0100 I INDEX    [initandlisten] found 1 interrupted index build(s) on stats_live.daily
2016-02-09T10:40:01.118+0100 I INDEX    [initandlisten] note: restart the server with --noIndexBuildRetry to skip index rebuilds
2016-02-09T10:40:01.122+0100 I INDEX    [initandlisten] build index on: stats_live.daily properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "stats_live.daily" }
2016-02-09T10:40:01.122+0100 I INDEX    [initandlisten]          building index using bulk method
2016-02-09T10:40:04.909+0100 I -        [initandlisten] Assertion: 16818:error opening file "/local_storage/mongodb/mongodb-wt-27017/_tmp/extsort.0": errno:13 Permission denied
2016-02-09T10:40:04.917+0100 I CONTROL  [initandlisten] 
 0xf785c9 0xf17401 0xefc65f 0xefc70c 0xaa3de3 0xaab0e0 0xaab377 0xaa07d5 0x9469a5 0x947b1a 0xac474d 0x8256db 0x7f2ac9 0x7f7b16022b45 0x822389
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B785C9"},{"b":"400000","o":"B17401"},{"b":"400000","o":"AFC65F"},{"b":"400000","o":"AFC70C"},{"b":"400000","o":"6A3DE3"},{"b":"400000","o":"6AB0E0"},{"b":"400000","o":"6AB377"},{"b":"400000","o":"6A07D5"},{"b":"400000","o":"5469A5"},{"b":"400000","o":"547B1A"},{"b":"400000","o":"6C474D"},{"b":"400000","o":"4256DB"},{"b":"400000","o":"3F2AC9"},{"b":"7F7B16001000","o":"21B45"},{"b":"400000","o":"422389"}],"processInfo":{ "mongodbVersion" : "3.0.7", "gitVersion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd", "uname" : { "sysname" : "Linux", "release" : "3.16.0-4-amd64", "version" : "#1 SMP Debian 3.16.7-ckt9-2 (2015-04-13)", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "0520DC2A91D540A37F4E7859EB3FFED4590CBFCD" }, { "b" : "7FFDDCF8C000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "CC7EB75B72552EBBE1F1378D170315C9C89AE2C2" }, { "b" : "7F7B17633000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "79D807D4FC33BA35F385CCE1A8EFEBFA4AB0FECE" }, { "b" : "7F7B173D3000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "9A1CB0B2DC4C37E0DF08F44359344C0E1DF36736" }, { "b" : "7F7B16FD8000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "3F4ED1C52CB6D4A21726FE34B325334E3F072602" }, { "b" : "7F7B16DD0000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "2A229233022D1462F5144690C12B0FD92CBD5E9C" }, { "b" : "7F7B16BCC000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "2391B5C1F072DFEB7D8C0BB62B419D4D0BEADC9D" }, { "b" : "7F7B168C1000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "D36185376529DE139FA4955E47DFF9D58100B1CA" }, { "b" : "7F7B165C0000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "B337AE10C269FFBB088B791A9035B0BF021D3B93" }, { "b" : "7F7B163AA000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "D5FB04F64B3DAEA6D6B68B5E8B9D4D2BC1A6E1FC" }, { "b" : "7F7B16001000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "F93F79E3E47CDA4B7C433D02D321CE401C4A8501" }, { "b" : "7F7B17850000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "FAA8E708C782B16D41A876382A9547138790A275" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf785c9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf17401]
 mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xefc65f]
 mongod(+0xAFC70C) [0xefc70c]
 mongod(_ZN5mongo16SortedFileWriterINS_7BSONObjENS_8RecordIdEEC2ERKNS_11SortOptionsERKSt4pairINS1_25SorterDeserializeSettingsENS2_25SorterDeserializeSettingsEE+0x5D3) [0xaa3de3]
 mongod(_ZN5mongo6sorter13NoLimitSorterINS_7BSONObjENS_8RecordIdENS_27BtreeExternalSortComparisonEE5spillEv+0x40) [0xaab0e0]
 mongod(_ZN5mongo6sorter13NoLimitSorterINS_7BSONObjENS_8RecordIdENS_27BtreeExternalSortComparisonEE3addERKS2_RKS3_+0xB7) [0xaab377]
 mongod(_ZN5mongo26BtreeBasedBulkAccessMethod6insertEPNS_16OperationContextERKNS_7BSONObjERKNS_8RecordIdERKNS_19InsertDeleteOptionsEPl+0xA5) [0xaa07d5]
 mongod(_ZN5mongo15MultiIndexBlock6insertERKNS_7BSONObjERKNS_8RecordIdE+0x75) [0x9469a5]
 mongod(_ZN5mongo15MultiIndexBlock30insertAllDocumentsInCollectionEPSt3setINS_8RecordIdESt4lessIS2_ESaIS2_EE+0x20A) [0x947b1a]
 mongod(_ZN5mongo40restartInProgressIndexesFromLastShutdownEPNS_16OperationContextE+0x66D) [0xac474d]
 mongod(_ZN5mongo13initAndListenEi+0x156B) [0x8256db]
 mongod(main+0x139) [0x7f2ac9]
 libc.so.6(__libc_start_main+0xF5) [0x7f7b16022b45]
 mongod(+0x422389) [0x822389]
-----  END BACKTRACE  -----
2016-02-09T10:40:04.917+0100 E INDEX    [initandlisten] Index rebuilding did not complete: 16818 error opening file "/local_storage/mongodb/mongodb-wt-27017/_tmp/extsort.0": errno:13 Permission denied
2016-02-09T10:40:04.917+0100 I INDEX    [initandlisten] note: restart the server with --noIndexBuildRetry to skip index rebuilds
2016-02-09T10:40:05.124+0100 I -        [initandlisten] Fatal Assertion 26100
2016-02-09T10:40:05.124+0100 I -        [initandlisten] 
 
***aborting after fassert() failure
 
 
@newapp2:/local_storage/mongodb/mongodb-wt-27017$ ls -la
razem 196
drwxr-xr-x 5 mongodb mongodb  4096 lut  9 10:40 .
drwxr-xr-x 3 mongodb mongodb  4096 sty 21 12:59 ..
drwxr-xr-x 2 mongodb nogroup  4096 lut  9 10:17 local
-rw-r--r-- 1 mongodb nogroup 36864 lut  9 10:40 _mdb_catalog.wt
-rwxr-xr-x 1 mongodb nogroup     6 lut  9 10:40 mongod.lock
-rw-r--r-- 1 mongodb nogroup 36864 lut  9 10:40 sizeStorer.wt
drwxr-xr-x 2 mongodb nogroup  4096 lut  9 10:40 stats_live
-rw-r--r-- 1 mongodb nogroup    95 sty 21 13:04 storage.bson
drw-r-xr-x 2 mongodb nogroup  4096 lut  9 10:40 _tmp
-rw-r--r-- 1 mongodb nogroup    46 sty 21 13:04 WiredTiger
-rw-r--r-- 1 mongodb nogroup   490 sty 21 13:04 WiredTiger.basecfg
-rw-r--r-- 1 mongodb nogroup    21 sty 21 13:04 WiredTiger.lock
-rw-r--r-- 1 mongodb nogroup   905 lut  9 10:40 WiredTiger.turtle
-rw-r--r-- 1 mongodb nogroup 69632 lut  9 10:40 WiredTiger.wt

Comment by Radek [ 09/Feb/16 ]

Hello Thomas,
Here are answers:

1. New databases are created (more-or-less) just after midnight (via insert command)

$mdb = new MongoClient("mongodb://mdb4.XXXX.YY,mdb7.XXXX.YY,mdb9.XXXX.YY/?replicaSet=mainSet&readPreference=primary");
$db = $mdb->selectDB( "signals-".date("ymd") );
...
$db->selectCollection("records")->insert(
array('added'=>date('Y-m-d H:i:s'), 'provider'=>$provider['name'], 'records'=>$mongoPool),
array('w'=>0, 'continueOnError'=>true)
);

1a. Problem(s) with permissions happens mostly (only? not sure) on slave nodes.

2. No, the problem is not limited to "diagnostic.data" folder ("diagnostic.data" folder is just a prove, that I do not controll the process of managing priorites of MongoDB folders. Since I create a new database every day – the problem with permissions to "signals-YYMMDD" databses' folders (on slaves) happens not deterministicaly, every (let's say) every 20-40+ days ? But if it happens – it stalls replications on given slave.

3. This issue occurs sporadicly on different affected nodes, on different clusters (and databses created also monthly, not daily). IIRC, it only happens on slaves, but I'm not sure on that.

4. Mongod(s) are restarted quite rarely, mostly on upgrades (once a month, once every two months). But as you can see from "diagnostic.data" directory – this issue happened just after a restart (after an version upgrade).

A few words more:

  • I haven't seen the issue ever on MMAPv1 (non-WiredTiger) databases/clusters
  • Currently (a few days ago, after reporting that issue) I've "patched" my system on every node with 'pre-creating' given folders with the right permisions every day, so I will probably stay immune to the issue.

But I'm happy to help in future.

Comment by Kelsey Schubert [ 09/Feb/16 ]

Hi radek,

I have a few questions to get a better idea of what is going on here.

  1. How are you creating a new database each day?
  2. Is the permission issue currently limited to the diagnostic.data directory?
  3. Does this issue occur consistently or sporadicly occur on the affected node?
  4. Does this issue appear without restarting mongod? How frequently are your mongods restarted?

Thank you,
Thomas

Generated at Thu Feb 08 04:00:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.