[SERVER-3193] mongod hangs when rotating the log Created: 04/Jun/11  Updated: 03/Sep/11  Resolved: 03/Sep/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: davyzhang Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: log, rotate
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

centos 5.5


Operating System: ALL
Participants:

 Description   

I have a mongod master process with 7 slave processes on the same machine. Everything went fine until I started using killall -SIGUSR1 mongod. Now, every 2 days, the master mongod on one of my servers stops working, but the slaves are fine. I can connect to the listening port using nc, but I can't connect to the mongod using the mongo shell; it hangs. I also cannot kill the mongod master process using kill -2 <pid>. All I can do is use kill -9 and repair my db, which is costly and impractical.

I checked the log and everything looks fine, but the new log is empty. It seems mongod didn't rotate the log successfully. Could this have caused a filesystem deadlock?



 Comments   
Comment by Eliot Horowitz (Inactive) [ 03/Sep/11 ]

See SERVER-3339

Comment by davyzhang [ 04/Jun/11 ]

mongod --master --fork --logpath /opt/zoubiao_log/zoubiao/mongo/master.log --logappend --port 27417 --dbpath /data/dt_master_0/

mongod --slave --autoresync --fork --logpath /opt/zoubiao_log/zoubiao/mongo/slave_27418.log --logappend --source localhost:27417 --port 27418 --dbpath /data/dt_slave_27418/

The log is big, so I can only paste the last lines of it:

Sat Jun 4 15:59:58 [initandlisten] connection accepted from 10.168.0.85:45401 #1156777
Sat Jun 4 15:59:58 [conn1156691] end connection 10.168.0.86:48093
Sat Jun 4 15:59:58 [conn1156697] end connection 10.168.0.86:48107
Sat Jun 4 15:59:58 [initandlisten] connection accepted from 10.168.0.86:48126 #1156778
Sat Jun 4 15:59:58 [conn1156705] end connection 10.168.0.85:45394
Sat Jun 4 15:59:58 [conn1156703] end connection 10.168.0.85:45392
Sat Jun 4 15:59:58 [conn1156707] end connection 10.168.0.85:45401
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.85:45420 #1156779
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.85:45425 #1156780
Sat Jun 4 15:59:59 [conn1156709] end connection 10.168.0.85:45420
Sat Jun 4 15:59:59 [conn1156710] end connection 10.168.0.85:45425
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.86:48135 #1156781
Sat Jun 4 15:59:59 [conn1156689] end connection 10.168.0.86:48092
Sat Jun 4 15:59:59 [conn1156702] end connection 10.168.0.86:48120
Sat Jun 4 15:59:59 [conn1156696] end connection 10.168.0.86:48106
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.86:48137 #1156782
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.85:45443 #1156783
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.86:48138 #1156784
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.86:48141 #1156785
Sat Jun 4 15:59:59 [conn1156701] end connection 10.168.0.86:48112
Sat Jun 4 15:59:59 [conn1156708] end connection 10.168.0.86:48126
Sat Jun 4 15:59:59 [conn1156699] end connection 10.168.0.86:48110
Sat Jun 4 15:59:59 [conn1156706] end connection 10.168.0.86:48124
Sat Jun 4 15:59:59 [conn1156713] end connection 10.168.0.85:45443
Sat Jun 4 15:59:59 [initandlisten] connection accepted from 10.168.0.114:22402 #1156786
Sat Jun 4 15:59:59 [conn1156716] end connection 10.168.0.114:22402
Sat Jun 4 16:00:00 [initandlisten] connection accepted from 10.168.0.86:48149 #1156787
Sat Jun 4 16:00:00 [initandlisten] connection accepted from 10.168.0.85:45456 #1156788
Sat Jun 4 16:00:00 [initandlisten] connection accepted from 10.168.0.86:48151 #1156789
Sat Jun 4 16:00:00 [initandlisten] connection accepted from 10.168.0.85:45459 #1156790
Sat Jun 4 16:00:00 [initandlisten] connection accepted from 10.168.0.85:45463 #1156791
Sat Jun 4 16:00:00 [initandlisten] connection accepted from 10.168.0.86:48153 #1156792
Sat Jun 4 16:00:00 [initandlisten] connection accepted from 10.168.0.86:48155 #1156793
Sat Jun 4 16:00:00 [conn1156712] end connection 10.168.0.86:48137
Sat Jun 4 16:00:00 [conn1156718] end connection 10.168.0.85:45456
Sat Jun 4 16:00:00 [conn1156720] end connection 10.168.0.85:45459
Sat Jun 4 16:00:00 [conn1156721] end connection 10.168.0.85:45463
Sat Jun 4 16:00:01 [initandlisten] connection accepted from 10.168.0.85:45488 #1156794
Sat Jun 4 16:00:01 [initandlisten] connection accepted from 10.168.0.85:45488 #1156794

After 16:00:01 there's only an empty file named master.log, and mongod hangs.
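The empty-log symptom can be checked from a script after each rotation. This is only a rough sketch: the helper name log_has_output is hypothetical, and the path is the --logpath value from the master command line above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file exists and is non-empty.
log_has_output() {
    [ -s "$1" ]
}

# Path taken from the master's --logpath option quoted above.
LOG=/opt/zoubiao_log/zoubiao/mongo/master.log

if log_has_output "$LOG"; then
    echo "log is being written: $LOG"
else
    echo "log is empty or missing: $LOG"
fi
```

A log that has only just been rotated can be briefly empty on a healthy server too, so a check like this is only meaningful some time after the signal was sent.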

This problem never happened before I started using log rotation.

As for the 7 slaves on the same server: I wanted to make queries faster, but that looks like the wrong idea right now. That's another topic, though.

Comment by Scott Hernandez (Inactive) [ 04/Jun/11 ]

What are the command line options (config-file) you are using?

Why are you running 7 slaves on the same machine?

Can you provide the logs from the master when it stops working?

Comment by davyzhang [ 04/Jun/11 ]

I am using killall -SIGUSR1 mongod in crontab, run every hour.
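For illustration, an hourly crontab entry of this kind would look roughly like the sketch below. The schedule fields are an assumption, since the actual crontab line was not posted; only the killall command itself comes from the report.

```
# minute hour day-of-month month day-of-week  command
# Assumed schedule: at minute 0 of every hour.
0 * * * * killall -SIGUSR1 mongod
```

Note that killall signals every process named mongod on the machine, so the master and all seven slaves would be asked to rotate at the same moment.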
