[SERVER-11087] mongod/mongos fatally asserts when rotating logs and log isn't in original location Created: 08/Oct/13  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Admin, Logging, Security, Stability
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Noah Hoffman Assignee: Backlog - Security Team
Resolution: Unresolved Votes: 6
Labels: platforms-re-triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Running mongo 2.4.6 across the deployment

Shard Nodes (mongod):
Based on base AMI: amzn-ami-minimal-pv-2013.03.0.x86_64-ebs (ami-febe2ace)
Amazon Linux AMI x86_64 EBS
Instance Type: m2.4xlarge
Linux version 3.4.37-40.44.amzn1.x86_64 (mockbuild@gobi-build-31005) (gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC) ) #1 SMP Thu Mar 21 01:17:08 UTC 2013

Config Nodes:
Same AMI and Linux info as Shard nodes
Instance Type: m1.large

App Nodes (mongos):
Linux version 2.6.21.7-2.fc8xen (mockbuild@xenbuilder4.fedora.phx.redhat.com) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Fri Feb 15 12:34:28 EST 2008
Instance Type: m1.large


Issue Links:
Duplicate
is duplicated by SERVER-11438 server aborts if original log file is... Closed
is duplicated by SERVER-11719 Mongos crash when log rotating Closed
is duplicated by SERVER-15643 mongod aborts on log file rotation wh... Closed
is duplicated by SERVER-27247 mongod crash after SIGUSR1 signal, if... Closed
is duplicated by SERVER-34161 mongod crashes after renaming logfile... Closed
Related
is related to SERVER-4905 logRotate should conform to standard ... Closed
Assigned Teams:
Server Security
Sprint: Security 2020-04-20, Security 2020-05-04, Security 2020-06-01, Security 2020-06-15, Security 2020-07-27, Security 2021-01-11, Security 2021-02-22, Security 2021-03-08, Security 2021-03-22, Security 2021-04-05
Participants:

 Description   

While implementing a log rotate script we found that if the script would move the mongo*.log file to a new location BEFORE sending the SIGUSER signal to the daemon triggering the daemon's own rotate functionality, a fatal assertion is thrown

Stack Trace:
Tue Oct  8 07:00:56.966 [Balancer] distributed lock 'balancer/ip-10-0-0-91:27017:1381230304:1804289383' unlocked.
Tue Oct  8 07:01:01.357 ERROR: failed to rename '/var/log/mongo/mongos.log' to '/var/log/mongo/mongos.log.2013-10-08T14-01-01': errno:2 N
o such file or directory
Tue Oct  8 07:01:01.371   Fatal Assertion 16780
0xa897d1 0xa4daf3 0x65ace5 0xad14c9 0x2aaaaacce617 0x2aaaab951c2d
/usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x21) [0xa897d1]
/usr/bin/mongos(_ZN5mongo13fassertFailedEi+0xa3) [0xa4daf3]
/usr/bin/mongos(_ZN5mongo22signalProcessingThreadEv+0x55) [0x65ace5]
/usr/bin/mongos [0xad14c9]
/lib64/libpthread.so.0 [0x2aaaaacce617]
/lib64/libc.so.6(clone+0x6d) [0x2aaaab951c2d]
Tue Oct  8 07:01:01.373



 Comments   
Comment by Bill Ryder [ 17/Dec/14 ]

Good suggestions thanks Akshay.

I'll use the create and not cp /dev/null to the log file. Much cleaner.

Also I've put an answer here if you want to check it out since I could have made errors and used your name. I can remove it if you like.

https://serverfault.com/questions/540423/mongodb-proper-way-to-rotate-logs/653095

Comment by Akshay Kumar [ 17/Dec/14 ]

That would work. You are essentially doing the same thing in postrotate with the cp. create is IMO cleaner and more robust. The nocompress is not required as compression happens after postrotate, I updated the comment. I just like to have the last rotated file uncompressed on most of my systems for various reasons.

Comment by Bill Ryder [ 17/Dec/14 ]

Unfortunately the packages I'm using -

mongodb-org-server                       2.6.6

do not seem to support the logRotate option.

/usr/bin/mongod --config /etc/mongod.conf
Unrecognized option: systemLog.logRotate

Assuming I've set it up correctly (which I might not have of course):

systemLog:
  destination: file
  path: "/var/log/mongodb/mongodb.log"
  quiet: true
  logAppend: true
  logRotate: "reopen"
 

This logrotate and doesn't crash the server without that optoin - but there is a small window where a few log entries may be lost.

/var/log/mongodb/mongodb.log {
    weekly
    missingok
    rotate 4
    compress
    notifempty
    nocreate
    postrotate
	cp /dev/null /var/log/mongodb/mongodb.log
        /usr/bin/pkill -USR1 mongod
        rm /var/log/mongodb/mongodb.log.????-??-??T??-??-??
    endscript
}

If I really wanted to keep the logs perfectly I'd switch to syslog logging.

Comment by Akshay Kumar [ 17/Dec/14 ]

Bill as Andy suggested you might want to look at SERVER-4905. This was fixed by adding a YAML directive that changes mongo's behavior on SIGUSR1 to just close and reopen the filhandle as logrotate expects. It was backported to 2.6 IIRC.

If you have an older version take a look at the logrotate script on SERVER-14053 to get logrotate to behave correctly on earlier versions. (see below).

The key here is to use the create directive to logrotate, without that you'll trigger a crash.

So in summary, logrotate file below will do the following:

  • move the log file, mongod will keep writing to the moved file
  • create an empty mongod.log file with correct permissions and ownership
  • send mongo a SIGUSR1, this cause mongo to rotate the log file, now mongo just moves mongod.log that logrotate just created, which is 0 bytes and creates a new mongod.log and starts writing to it
  • finally in postrotate we delete that empty file

/var/log/mongodb/mongod.log {
  daily
  size 1G
  rotate 10
  missingok
  compress
  delaycompress
  notifempty
  create 640 mongod mongod
  sharedscripts
  postrotate
      /bin/kill -SIGUSR1 `cat /var/run/mongodb/mongod.pid 2>/dev/null` >/dev/null 2>&1
      find /var/log/mongodb -type f -size 0 -regextype posix-awk -regex "^\/var\/log\/mongodb\/mongod\.log\.[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}-[0-9]{2}-[0-9]{2}$" -execdir rm {} \; >/dev/null 2>&1
  endscript
}

Comment by Bill Ryder [ 17/Dec/14 ]

Amazingly many answers on the net describing how to use logrotate will break mongod with this bug. They all suggest

    postrotate
            /usr/bin/kill -USR1 $(cat /var/run/mongod.pid)
            rm -f /var/log/mongo/mongod.log.[0-9][0-9][0-9][0-9]-*
    endscript

or variations on this theme - they all seem to send the signal after logrotate has moved the log file out of the way so mongod will crash.

Comment by Laurent Glayal [ 02/Dec/14 ]

Also affect 2.6.x ( SERVER-15643), server should not crash on unexisting log file.

Comment by Heikki Rauhala [ 31/Oct/13 ]

Part of the problem is that the 10gen packaged version for Ubuntu does not contain any logrotate scripts, and the most popular unofficial documentation, like http://stackoverflow.com/a/8396266/61175 propose logrotate scripts that end up crashing the server.

Comment by Heikki Rauhala [ 31/Oct/13 ]

Affects mongodb version 2.4.7 (0161738abf06c1f067b56a465b706efd6f4bf2aa)

Comment by Noah Hoffman [ 21/Oct/13 ]

@Andy - sorry for the delay in responding. Yes, you've summarized it correctly, it should abort in this case, but not throw a fatal error. The other issue you pointed me to, 4905, is also a nice to have and I will vote for it but it wasn't the primary motivation for logging this issue. Thanks!

Comment by Andy Schwerin [ 09/Oct/13 ]

The server intentionally aborts when log rotation doesn't go as planned, so as not to have periods of unlogged operation. The scenario you describe in this bug probably shouldn't lead to a fatal error, however.

Out of curiosity, are you looking for the behavior requested in SERVER-4905? If so, please do vote for it, for tracking purposes.

Generated at Thu Feb 08 03:24:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.