[SERVER-32216] With Mongo 3.6(in docker) we hit this error "Failed to unlink socket file /tmp/mongodb-27017.sock Operation not permitted" Created: 08/Dec/17 Updated: 30/Oct/23 Resolved: 19/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.1, 3.7.1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | lang qiu | Assignee: | Jonathan Reams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SWNA |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Steps To Reproduce: | 1. Using the latest 3.6 |
| Sprint: | Platforms 2017-12-18, Platforms 2018-01-01 |
| Participants: | |
| Description |
|
We run mongo in docker. We were using 3.4 before and everything was fine. Yesterday we upgraded to 3.6 and redeployed our docker services, and mongo then reported the error "Failed to unlink socket file /tmp/mongodb-27017.sock Operation not permitted". The detailed log,
|
| Comments |
| Comment by Ramon Fernandez Marina [ 19/Dec/17 ] |
|
qiulang, I'm going to resolve this ticket since the fix to remove the unix domain socket on clean shutdown has made it into the codebase – thanks for reporting this. We're planning on publishing a release with this fix soon. Until then you can use one of the workarounds listed above. For further support discussions please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. Thanks, |
| Comment by lang qiu [ 16/Dec/17 ] |
|
Hi, as I commented on Dec 11, using --nounixsocket has the same effect as adding a volume for /tmp: no unlink error, only "this server is bound to localhost", so my other docker container can't connect to it. I will try --bind_ip=0.0.0.0 as you suggested. So the only problem we have now is: why do I see the /tmp/mongodb-27017.sock error if I start mongo from a fresh docker container? Is it possible that my dockerfile caused the problem? I need my mongo image to have the default data, so I used a dockerfile as suggested here. |
| Comment by Jonathan Reams [ 15/Dec/17 ] |
|
The warning about the server being bound to localhost was introduced in 3.6 as a security improvement. You can restore the 3.4 behavior by specifying --bind_ip=0.0.0.0. If you're starting from a fresh docker container each time, then I don't understand why you're seeing the error about /tmp/mongodb-27017.sock, unless there's a permissions problem or some other problem with /tmp. I also don't know why --nounixsocket didn't resolve your issue, since that should prevent unlinking existing sockets in the first place. This is sounding more like a problem with the docker environment. |
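The two flags discussed in this comment can be combined on one command line. The following is a minimal sketch only: `mongod_command` is a hypothetical helper written for illustration, and the config path is taken from the reporter's Dec 11 comment. It assembles the argument list rather than starting a server. Note that binding to 0.0.0.0 makes mongod listen on all interfaces, which is what the 3.6 localhost default was introduced to avoid.

```python
import shlex


def mongod_command(config="/etc/mongodb.conf", bind_ip=None, unix_socket=True):
    """Build a mongod argv list (hypothetical helper, for illustration only)."""
    argv = ["mongod", "--config", config]
    if bind_ip is not None:
        # --bind_ip=0.0.0.0 restores the 3.4 behavior of listening on all interfaces.
        argv.append(f"--bind_ip={bind_ip}")
    if not unix_socket:
        # --nounixsocket disables the UNIX domain socket under /tmp entirely.
        argv.append("--nounixsocket")
    return argv


if __name__ == "__main__":
    cmd = mongod_command(bind_ip="0.0.0.0", unix_socket=False)
    # Print the command as a single shell-safe string for copy and paste.
    print(shlex.join(cmd))
```

In a Dockerfile the same list would go into the exec-form CMD instruction.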
| Comment by lang qiu [ 15/Dec/17 ] |
|
OK, I found a way to check mongodb-27017.sock: I added a docker volume for /tmp, i.e. mapped /tmp to a local folder on the host. Unfortunately, the same error showed up again, "this server is bound to localhost", so my other container failed to connect to it.
yunwei-product-dbdebug_mongo_1 | 2017-12-15T02:59:10.946715113Z 2017-12-15T02:59:10.941+0000 I CONTROL [initandlisten] ** WARNING: This server is bound to localhost. |
| Comment by lang qiu [ 15/Dec/17 ] |
|
Hi, when I used mongo 3.4 there was indeed a /tmp/mongodb-27017.sock, and mongodb is its owner (see the attachment). But when I used 3.6, the docker container kept restarting (with the error I reported), so I wasn't able to check its folder. What else can I do to debug further? |
| Comment by lang qiu [ 13/Dec/17 ] |
|
Hi, I run mongo in docker and I did a clean docker-compose build/up, which means there was nothing in /tmp. Also, docker runs mongod as root, so what other user could it be (I have limited knowledge about running mongo in docker, though)? I will double-check it tomorrow. |
| Comment by Jonathan Reams [ 12/Dec/17 ] |
|
qiulang, we believe this problem is being caused by an old mongod's UNIX domain socket being left in /tmp on shutdown, and the socket being owned by a different user than the new mongod. I've just pushed a change that will be in 3.6.1 that ensures these socket files get cleaned up during normal shutdown of the server. I don't understand why --nounixsocket doesn't work around your issue, however, or why downgrading to 3.4 fixes your issue. Could you check whether there is a /tmp/mongodb-27017.sock file on your system and what its permissions are? |
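One way to answer the question about the socket file and its permissions is a small script run inside the container. This is a sketch under assumptions: `describe_socket` is a hypothetical helper, and the default path is the one from the error message. It reports whether the path is a socket, who owns it, and whether the current user is that owner.

```python
import os
import stat


def describe_socket(path="/tmp/mongodb-27017.sock"):
    """Return ownership and mode details for path, or None if it does not exist."""
    try:
        # lstat inspects the entry itself rather than following a symlink.
        st = os.lstat(path)
    except FileNotFoundError:
        return None
    return {
        "path": path,
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "owner_uid": st.st_uid,
        "mode": stat.filemode(st.st_mode),
        # A mismatch here is the situation described in the comment above.
        "owned_by_me": st.st_uid == os.geteuid(),
    }


if __name__ == "__main__":
    print(describe_socket() or "no socket file found")
```

If `owned_by_me` is false for the user that starts mongod, a leftover socket from another user is a likely candidate for the unlink failure.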
| Comment by Githook User [ 12/Dec/17 ] |
|
Author: Jonathan Reams (jbreams) <jbreams@mongodb.com> Message: (cherry picked from commit 9dc34426570cc57cfdb4b6f6ea4f31018662082f) |
| Comment by Githook User [ 12/Dec/17 ] |
|
Author: Jonathan Reams (jbreams) <jbreams@mongodb.com> Message: |
| Comment by lang qiu [ 11/Dec/17 ] |
|
I tried both --transportLayer=legacy and --nounixsocket; unfortunately, neither worked for me. I also tried RUN rm /tmp/mongodb-27017.sock as another Jira ticket suggested, and it did not work either. After I switched back to mongo 3.4, the problem went away! I have tried several times to confirm it. BTW, the command I use to start mongo is CMD ["mongod", "--config", "/etc/mongodb.conf", "--smallfiles","--auth","--transportLayer=legacy"] and mongodb.conf has only one line: dbpath = /data/db2 |
| Comment by Jonathan Reams [ 08/Dec/17 ] |
|
3.6 has a new networking implementation, but they should both have the same behaviour here. Could you try running the mongod with --transportLayer=legacy to see if the problem reproduces? That should switch mongodb back to using the 3.4 networking code. You can also work around this problem by disabling UNIX sockets if you aren't using them (see https://docs.mongodb.com/manual/reference/configuration-options/#net-unixdomainsocket-options) |
| Comment by Kaloian Manassiev [ 08/Dec/17 ] |
|
This also makes it problematic to use MongoDB 3.6 on a shared Ubuntu system, because different users create these socket files with different ownership, which leads to the same error for whoever comes second. |
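The "whoever comes second" failure is consistent with how unlink behaves in a sticky directory such as a typical /tmp (mode 1777). That /tmp is sticky on the affected systems is an assumption, not something stated in this ticket. The sketch below models the rule with a hypothetical `may_unlink` function: in a sticky directory, only the file's owner, the directory's owner, or a privileged process may remove an entry. It assumes the caller already has write and search permission on the directory.

```python
import stat


def may_unlink(dir_mode, dir_uid, file_uid, caller_uid, privileged=False):
    """Approximate the sticky-directory unlink rule (illustrative model only)."""
    if privileged:
        return True
    if not dir_mode & stat.S_ISVTX:
        # Without the sticky bit, directory write permission is sufficient.
        return True
    # With the sticky bit, the caller must own the file or the directory.
    return caller_uid in (file_uid, dir_uid)


if __name__ == "__main__":
    tmp_mode = stat.S_IFDIR | 0o1777  # typical /tmp, owned by uid 0
    # Hypothetical uids: 1001 created the socket, 1002 starts mongod later.
    print(may_unlink(tmp_mode, 0, 1001, 1001))
    print(may_unlink(tmp_mode, 0, 1001, 1002))
```

Under this model the first user can remove their own socket, while the second user's attempt is refused, which the kernel reports as "Operation not permitted".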