[SERVER-7707] MongoDB 2.2.1/Linux, 32-bit crashing upon log rotation Created: 19/Nov/12  Updated: 22/Aug/13  Resolved: 07/Apr/13

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: 2.2.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Andreas Jung Assignee: Scott Hernandez (Inactive)
Resolution: Duplicate Votes: 0
Labels: nh-240
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux gasmon10 2.6.38-gentoo-r6 #1 SMP Fri Feb 17 21:57:45 CET 2012 i686 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux


Issue Links:
Duplicate
duplicates SERVER-4739 Race condition in log rotation (was: ... Closed
Related
related to SERVER-6896 Server crashes if logRotate is issued... Closed
Operating System: ALL
Participants:

 Description   

After switching from MongoDB 2.0 to 2.2.1 our mongod crashes every night upon log rotation:

Sat Nov 17 00:02:44 [conn502] end connection 127.0.0.1:55029 (4 connections now open)
Sat Nov 17 00:02:44 [conn503] end connection 127.0.0.1:55030 (3 connections now open)
Sat Nov 17 00:07:01 [initandlisten] ERROR: Failed to rename /srv/gasmon/mongodb/mongodb.log to /srv/gasmon/mongodb/mongodb.log.2012-11-16T23-07-01  Fatal Assertion 16176
0x875225a 0x87291a5 0x8712078 0x82a513d 0xb7724400 0xb7724424 0xb75117c1 0x8739a1f 0x873f590 0x817166f 0x8174499 0x8174c45 0x817cf7e 0x817e6a2 0xb7460cc6 0x816f1d1 
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo15printStackTraceERSo+0x2a) [0x875225a]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo10logContextEPKc+0xa5) [0x87291a5]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo13fassertFailedEi+0xa8) [0x8712078]
 /srv/gasmon/gasmon-prod/bin/mongod() [0x82a513d]
 [0xb7724400]
 [0xb7724424]
 /lib/libc.so.6(__select+0x61) [0xb75117c1]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo8Listener13initAndListenEv+0x23f) [0x8739a1f]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo17PortMessageServer3runEv+0x20) [0x873f590]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo6listenEi+0x1af) [0x817166f]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo14_initAndListenEi+0x669) [0x8174499]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo13initAndListenEi+0x25) [0x8174c45]
 /srv/gasmon/gasmon-prod/bin/mongod() [0x817cf7e]
 /srv/gasmon/gasmon-prod/bin/mongod(main+0x32) [0x817e6a2]
 /lib/libc.so.6(__libc_start_main+0xe6) [0xb7460cc6]
 /srv/gasmon/gasmon-prod/bin/mongod() [0x816f1d1]
Sat Nov 17 00:07:01 [initandlisten] 
 
***aborting after fassert() failure
 
 
Sat Nov 17 00:07:01 Got signal: 6 (Aborted).
 
Sat Nov 17 00:07:01 Backtrace:
0x875225a 0x816ffea 0xb7724400 0xb7724424 0xb74746a1 0xb7475de2 0x87120b1 0x82a513d 0xb7724400 0x873f590 0x817166f 0x8174499 0x8174c45 0x817cf7e 0x817e6a2 0xb7460cc6 0x816f1d1 
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo15printStackTraceERSo+0x2a) [0x875225a]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo10abruptQuitEi+0x3ba) [0x816ffea]
 [0xb7724400]
 [0xb7724424]
 /lib/libc.so.6(gsignal+0x51) [0xb74746a1]
 /lib/libc.so.6(abort+0x182) [0xb7475de2]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo13fassertFailedEi+0xe1) [0x87120b1]
 /srv/gasmon/gasmon-prod/bin/mongod() [0x82a513d]
 [0xb7724400]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo17PortMessageServer3runEv+0x20) [0x873f590]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo6listenEi+0x1af) [0x817166f]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo14_initAndListenEi+0x669) [0x8174499]
 /srv/gasmon/gasmon-prod/bin/mongod(_ZN5mongo13initAndListenEi+0x25) [0x8174c45]
 /srv/gasmon/gasmon-prod/bin/mongod() [0x817cf7e]
 /srv/gasmon/gasmon-prod/bin/mongod(main+0x32) [0x817e6a2]
 /lib/libc.so.6(__libc_start_main+0xe6) [0xb7460cc6]
 /srv/gasmon/gasmon-prod/bin/mongod() [0x816f1d1]



 Comments   
Comment by Scott Hernandez (Inactive) [ 07/Apr/13 ]

This will be fixed by SERVER-4739.

There is also a working plan to use USR2/<signal> to provide more useful releasing of the file (which will create a new one, if the orig is not there anymore after) so it can be used by the *unix logrotate tool – see SERVER-4905.

Comment by Raviv Pavel [ 13/Jan/13 ]

Having the same issue. mongod crashed while log rotating. this is what I found in the new log file:

Sun Jan 13 03:19:03 ERROR: Failed to rename /var/log/mongodb/mongod.log to /var/log/mongodb/mongod.log.2013-01-13T08-19-03  Fatal Assertion 16176
Sun Jan 13 03:19:03 [initandlisten] select() signal caught, continuing
0xaffd31 0xac5323 0x681abc 0x38bd032900 0x38bd40f245 0x558478 0xb45ba9 0x38bd4077f1 0x38bd0e570d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaffd31]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xac5323]
 /usr/bin/mongod() [0x681abc]
 /lib64/libc.so.6() [0x38bd032900]
 /lib64/libpthread.so.0(sigwait+0xe5) [0x38bd40f245]
 /usr/bin/mongod(_ZN5mongo15interruptThreadEv+0x18) [0x558478]
 /usr/bin/mongod() [0xb45ba9]
 /lib64/libpthread.so.0() [0x38bd4077f1]
 /lib64/libc.so.6(clone+0x6d) [0x38bd0e570d]
Sun Jan 13 03:19:03
 
***aborting after fassert() failure
 
 
Sun Jan 13 03:19:03 Got signal: 6 (Aborted).
 
Sun Jan 13 03:19:03 Backtrace:
0xaffd31 0x558bb9 0x38bd032900 0x38bd032885 0x38bd034065 0xac535e 0x681abc 0x38bd032900 0x38bd40f245 0x558478 0xb45ba9 0x38bd4077f1 0x38bd0e570d
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaffd31]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x558bb9]
 /lib64/libc.so.6() [0x38bd032900]
 /lib64/libc.so.6(gsignal+0x35) [0x38bd032885]
 /lib64/libc.so.6(abort+0x175) [0x38bd034065]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xac535e]
 /usr/bin/mongod() [0x681abc]
 /lib64/libc.so.6() [0x38bd032900]
 /lib64/libpthread.so.0(sigwait+0xe5) [0x38bd40f245]
 /usr/bin/mongod(_ZN5mongo15interruptThreadEv+0x18) [0x558478]
 /usr/bin/mongod() [0xb45ba9]
 /lib64/libpthread.so.0() [0x38bd4077f1]
 /lib64/libc.so.6(clone+0x6d) [0x38bd0e570d]
 

This is a 64bit Centos machine running v2.2.2 and log rotation has been working fine until this incident.

Comment by Andrew R [ 03/Jan/13 ]

Andreas, in order to work around the fact that mongo wants a logfile to rename, one idea is to use the "create" option. This will make logrotate create an empty file using the original log name after rotating the original log. The postrotate will run, mongo will rename the empty file. You can add a line to the postrotate to then delete the empty file which will now look like mongod.log.2011-11-25T00-30-00. I just use shell pattern matching to delete those files.

A rough cut might look something like this:

/var/log/mongodb/mongod.log {
  daily
  rotate 15
  compress
  nocopy
  missingok
  notifempty
  create
  postrotate
    /bin/kill -SIGUSR1 `cat /var/log/mongodb/mongod.lock 2> /dev/null` 2> /dev/null
    /bin/bash -O extglob -c '/bin/rm -f /var/log/mongodb/mongod.log.+([0-9\-])T+([0-9\-])'
    true
  endscript
}

There might be a better way to do this that involves having logrotate manipulate the renamed mongo log files directly, but this is what I'm using for now.

Comment by Andreas Jung [ 03/Jan/13 ]

How should the logrotate configuration look like?

Comment by Andrew R [ 03/Jan/13 ]

Just wanted to comment that the manner of log rotation describe by Andreas worked in Mongo 2.0. We did it that way, but upon upgrading to 2.2 ran into the same assert failure. My logrotate config is virtually identical to Andreas' config. I will update my logrotate script to fix the problem, but Denis is right, this is a typical *nix pattern and probably shouldn't assert.

Comment by Andreas Jung [ 21/Dec/12 ]

gasmon@gasmon10 ~ $ ls -la /srv/gasmon/mongodb/
total 525176
drwxr-xr-x 5 gasmon service 4096 Dec 21 00:07 .
drwxr-xr-x 16 gasmon service 4096 Nov 19 06:45 ..
drwxr-xr-x 2 gasmon service 4096 Nov 18 10:56 journal
rw-rr- 1 gasmon service 53328 Dec 21 09:05 mongodb.log
rw-rr- 1 gasmon service 15931 Dec 12 00:06 mongodb.log-20121212.gz
rw-rr- 1 gasmon service 15091 Dec 13 00:06 mongodb.log-20121213.gz
rw-rr- 1 gasmon service 15621 Dec 14 00:06 mongodb.log-20121214.gz
rw-rr- 1 gasmon service 16161 Dec 15 00:06 mongodb.log-20121215.gz
rw-rr- 1 gasmon service 15522 Dec 16 00:06 mongodb.log-20121216.gz
rw-rr- 1 gasmon service 15951 Dec 17 00:06 mongodb.log-20121217.gz
rw-rr- 1 gasmon service 16540 Dec 18 00:06 mongodb.log-20121218.gz
rw-rr- 1 gasmon service 16294 Dec 19 00:06 mongodb.log-20121219.gz
rw-rr- 1 gasmon service 16994 Dec 20 00:06 mongodb.log-20121220.gz
rw-rr- 1 gasmon service 16086 Dec 21 00:06 mongodb.log-20121221.gz
-rwxr-xr-x 1 gasmon service 6 Nov 17 11:08 mongod.lock
drwxr-xr-x 2 gasmon service 4096 Jan 30 2012 monitoring
rw------ 1 gasmon service 16777216 Dec 21 09:05 monitoring.0
rw------ 1 gasmon service 33554432 Dec 4 10:12 monitoring.1
rw------ 1 gasmon service 67108864 Dec 20 11:06 monitoring.2
rw------ 1 gasmon service 134217728 Dec 21 09:05 monitoring.3
rw------ 1 gasmon service 268435456 Aug 7 06:40 monitoring.4
rw------ 1 gasmon service 16777216 Dec 21 09:05 monitoring.ns
rw-rr- 1 gasmon service 100917 Nov 19 06:44 out
drwxr-xr-x 2 gasmon service 4096 Dec 12 13:47 _tmp

Comment by Denis Shashkov [ 21/Dec/12 ]

Indeed, crashing is a hard way to tell about missing log file.

I think, most reporters rotating log files as in the *x systems:

  1. move log file (while deamon keeps file descriptor)
  2. cause daemon process to reopen log (by sending signal, for example)
  3. daemon closes file descriptor and (re-)open new log file with the same name.

With mongod this not work.
Mongod first tries rename log file by self and fails if file is not exists.
This rotating sequence is desribed at http://docs.mongodb.org/manual/tutorial/rotate-log-files/

Comment by Gustavo Campo [ 03/Dec/12 ]

Hi all, I'd like to add a little bit on this since we had the same error with 64 bits and Ubuntu.
We had scheduled twice the log rotation for the same instant (fixed, our bad ), so, when the second log rotation occurs the logfile doesn't exists since the firstone hasn't finished yet. This causes the mongod process to be terminated. User has full permissions over the log path, since it is on its home.
A simple way to reproduce this is:
1) Start a mongod with --logpath
2) Remove the logfile from the mongod
3) Issue the logRotate command on that mongod
4) The mongod should've crashed

I think crashing the process is a bit hard, perhaps creating an empty file with details on what just happened should work fine.

Thanks, regards,
Gustavo

Comment by Eliot Horowitz (Inactive) [ 20/Nov/12 ]

Can you send the permissions?
ls -la /srv/gasmon/mongodb/

Comment by Andreas Jung [ 19/Nov/12 ]

/srv/gasmon/mongodb/mongodb.log {
daily
rotate 10
dateext
compress
notifempty
missingok
sharedscripts

postrotate
/bin/kill -SIGUSR1 `cat $HOME/mongodb/mongod.lock 2> /dev/null` 2> /dev/null || true
endscript
}

Comment by Daniel Pasette (Inactive) [ 19/Nov/12 ]

Which method are you using to trigger the log rotate?

Comment by Eliot Horowitz (Inactive) [ 19/Nov/12 ]

Can you send the permissions on /srv/gasmon/mongodb/?

Generated at Thu Feb 08 03:15:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.