[SERVER-4739] Race condition in log rotation (was: SIGUSR1 should set a flag rather than doing rotation) Created: 21/Jan/12 Updated: 11/Jul/16 Resolved: 17/Apr/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Logging, Stability |
| Affects Version/s: | 2.2.0, 2.2.2 |
| Fix Version/s: | 2.2.5, 2.4.4, 2.5.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 24 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
The locking within logRotate only covers the swap of file handles, but renaming an open log file while it is being written to by another thread is not safe. The locking needs to be modified so that no race conditions exist. In addition, doing log rotation from a signal handler may be unsafe. It would be better to set a flag in the signal handler, and respond to it from a normal (non-signal) code path. For example, we could rotate the log next time something is logged. Currently there is a deadlock if the current thread is in the middle of logging. |
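The approach proposed in the description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual server code or the eventual fix: every name here (`rotateRequested`, `onSigusr1`, `installHandler`, `openLog`, `logLine`, `rotateLocked`, and the `.N` suffix for rotated files) is hypothetical. The handler does nothing except set a lock-free atomic flag, and the rotation itself runs on the normal logging path while holding the same mutex that guards every write, so no thread can be mid-write during the rename and the handler never tries to take a lock.

```cpp
#include <atomic>
#include <csignal>
#include <cstdio>
#include <mutex>
#include <string>

// Hypothetical sketch; none of these names come from the MongoDB source.

// The flag is only safe to touch from a signal handler if it is lock-free.
static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "flag must be lock-free");

std::atomic<bool> rotateRequested{false};  // set in the handler, consumed by logLine()
std::mutex logMutex;                       // guards logFile, logPath, rotations
std::FILE* logFile = nullptr;
std::string logPath;
int rotations = 0;                         // number of rotations performed so far

// Signal handler: no locks, no I/O, no allocation. It only records the request.
extern "C" void onSigusr1(int) {
    rotateRequested.store(true);
}

void installHandler() {
    std::signal(SIGUSR1, onSigusr1);
}

bool openLog(const std::string& path) {
    std::lock_guard<std::mutex> lk(logMutex);
    logPath = path;
    logFile = std::fopen(path.c_str(), "a");
    return logFile != nullptr;
}

// Caller must hold logMutex, so no other thread is writing during the rename.
void rotateLocked() {
    if (logFile) std::fclose(logFile);
    ++rotations;
    const std::string rotated = logPath + "." + std::to_string(rotations);
    std::rename(logPath.c_str(), rotated.c_str());
    logFile = std::fopen(logPath.c_str(), "a");
}

// Normal (non-signal) code path: honor a pending rotation request, then write.
void logLine(const std::string& msg) {
    std::lock_guard<std::mutex> lk(logMutex);
    if (rotateRequested.exchange(false)) rotateLocked();
    if (!logFile) return;
    std::fprintf(logFile, "%s\n", msg.c_str());
    std::fflush(logFile);
}
```

One consequence of this design is that rotation is deferred until the next call to `logLine`, which matches the "rotate the log next time something is logged" suggestion; an idle process would not rotate until it logs again.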
| Comments |
| Comment by Stennie Steneker (Inactive) [ 02/Sep/13 ] | |
|
Shay: the fix versions are listed in the issue Details section at the top of the page: 2.2.5, 2.4.4, 2.5.0.
So for the 2.4 release series, 2.4.4 or newer will have the fix. Regards, | |
| Comment by Shay Asher [ 02/Sep/13 ] | |
|
Is the issue fixed in 2.4.5? | |
| Comment by Tad Marshall [ 12/May/13 ] | |
|
Hi John, I think that you have hit This is fixed in the nightly 2.4 build; the fix will be in version 2.4.4. If you want to verify that this is fixed in the latest 2.4 code, you could download http://downloads.mongodb.org/linux/mongodb-linux-x86_64-v2.4-latest.tgz, which is marked as version "2.4.4-pre-". Tad | |
| Comment by John Dever [ 12/May/13 ] | |
|
Let me know if this needs to be a new issue, but I believe this is still occurring with the latest 10gen mongo RPM. Sending SIGUSR1 to mongos causes an immediate crash under absolutely no load. This has happened on about 8 servers in my test environment, all running identical configs. It does not occur with mongod.
MongoS version 2.4.3 starting: pid=31081 port=27017 64-bit host=<redacted> (--help for usage)
mongod --version
MongoS Log: | |
| Comment by auto [ 18/Apr/13 ] | |
|
Author: Eric Milkie <milkie@10gen.com> Date: 2013-04-18T15:33:22Z Message: | |
| Comment by auto [ 18/Apr/13 ] | |
|
Author: Eric Milkie <milkie@10gen.com> Date: 2013-04-17T15:02:23Z Message: | |
| Comment by auto [ 18/Apr/13 ] | |
|
Author: Eric Milkie <milkie@10gen.com> Date: 2013-04-16T15:37:19Z Message: Conflicts: | |
| Comment by Johan Hedin [ 17/Apr/13 ] | |
|
Tad, Thanks for the clarification! So I will just wait then, or try to apply the master branch changes to 2.2 or 2.4 myself. Anyway, at least I learned a bit of Jira usage. | |
| Comment by Tad Marshall [ 17/Apr/13 ] | |
|
Hi Johan, Sorry for the confusion; the backports have not happened yet. This is a result of the way we use the "Backport" field in Jira. When Backport is set to Yes, the plan is to backport the change to one or more earlier versions, and when those versions are chosen they are added to the "Fix Version/s" list. But it isn't actually in those versions until Backport is changed to Done. Note also the specific versions: 2.2.5 and 2.4.3, neither of which has been created yet. Tad | |
| Comment by Johan Hedin [ 17/Apr/13 ] | |
|
I see this marked as fixed in both 2.2 and 2.4 but I only see a commit in the master branch?! I'm asking because I would like to bring this in as a patch when I compile mongod 2.2, and soon 2.4, myself. | |
| Comment by auto [ 17/Apr/13 ] | |
|
Author: Eric Milkie <milkie@10gen.com> Date: 2013-04-17T15:02:23Z Message: | |
| Comment by auto [ 17/Apr/13 ] | |
|
Author: Eric Milkie <milkie@10gen.com> Date: 2013-04-16T15:37:19Z Message: | |
| Comment by Johan Hedin [ 07/Apr/13 ] | |
|
Thanks again for the update. I'm just about to increase the load on my db quite drastically, so this information came at the last minute! | |
| Comment by Leonid Evdokimov [ 07/Apr/13 ] | |
|
To the best of my knowledge, db.runCommand does not trigger the bug. The issue with SIGUSR1 is simple: log rotation calls some functions that are not async-signal-safe, and calling them from the signal handler crashes the mongod process. | |
| Comment by Johan Hedin [ 07/Apr/13 ] | |
|
Thanks for the explanation, Leonid! Changing from kill -USR1 to what you suggest was actually my thought, but I got the impression from the description of this issue that even db.runCommand("logRotate") could crash or deadlock the server!? Is that something you could comment on? | |
| Comment by Leonid Evdokimov [ 07/Apr/13 ] | |
|
It becomes extremely dangerous only under high load. | |
| Comment by Johan Hedin [ 07/Apr/13 ] | |
|
Does this mean that log rotation is basically extremely dangerous? I'm running 2.2.3 and rotate logs with SIGUSR1. Have I just been lucky not to have had any crashes yet? | |
| Comment by Paul Hamby [ 05/Apr/13 ] | |
|
We are experiencing this with 2.2.2. | |
| Comment by nosqldb [ 17/Feb/13 ] | |
|
We experienced it on 2.2.0. | |
| Comment by Klébert Hodin [ 21/Jan/13 ] | |
|
We experienced it on 2.2.2. | |
| Comment by Oded Maimon [ 21/Jan/13 ] | |
|
Does anyone know which versions are affected by this bug? |