[SERVER-7021] One Shard a Secondary Node can't be restart Created: 12/Sep/12 Updated: 08/Mar/13 Resolved: 02/Jan/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability, Storage |
| Affects Version/s: | 2.0.6, 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Xuguang zhan | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | replicaset | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
PRIMARY> rs.status() |
||
| Attachments: |
|
||||
| Operating System: | Linux | ||||
| Description |
|
In this shard, one secondary node can't be restarted. I checked the exception log, and I have tested it on both 2.2.0 and 2.0.6; neither works.
Wed Sep 12 07:34:09 [initandlisten] MongoDB starting : pid=31578 port=27018 dbpath=/root/mongodata/db 64-bit host=wdsq1wco0010
Wed Sep 12 07:34:09 [initandlisten] journal dir=/root/mongodata/db/journal
[analysis] We found that the secondary has lost performance.ns. We checked the primary node (PriNode) and the collection does not exist there either, yet the primary can still restart. Check the attachment Pri_data.
Info about the set: |
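With the MMAPv1 storage used by 2.0 and 2.2, each database in the dbpath is stored as a `<db>.ns` namespace file plus numbered data files (`<db>.0`, `<db>.1`, ...). To confirm which databases on a node have data files but no `.ns` file, a small script like the one below may help. It is a minimal sketch, assuming it is run against the dbpath of a stopped node; the function name `find_missing_ns` is hypothetical.

```python
import os
import re

# MMAPv1 data files are named "<db>.<n>", e.g. "performance.0".
DATA_FILE = re.compile(r"^(?P<db>.+)\.(?P<n>\d+)$")


def find_missing_ns(dbpath):
    """Return sorted names of databases that have data files but no .ns file."""
    names = set(os.listdir(dbpath))
    dbs_with_data = set()
    for name in names:
        match = DATA_FILE.match(name)
        # Only regular files count; directories like "journal" are skipped.
        if match and os.path.isfile(os.path.join(dbpath, name)):
            dbs_with_data.add(match.group("db"))
    return sorted(db for db in dbs_with_data if db + ".ns" not in names)
```

For example, `find_missing_ns("/root/mongodata/db")` on the failing secondary and on each other member shows which nodes are missing `performance.ns`.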
| Comments |
| Comment by Xuguang zhan [ 14/Sep/12 ] |
|
When I used --repair, the server could not be restarted:
Fri Sep 14 02:28:18 [initandlisten] MongoDB starting : pid=2499 port=27018 dbpath=/root/mongodata/db 64-bit host=wdsq1wco0010
Fri Sep 14 02:28:18 [initandlisten] journal dir=/root/mongodata/db/journal
Fri Sep 14 02:28:19 [FileAllocator] allocating new datafile /root/mongodata/db/$tmp_repairDatabase_0/performance.1, filling with zeroes...
... terminating |
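The log shows `--repair` allocating new files under `$tmp_repairDatabase_0` inside the dbpath. Repair writes a fresh copy of each database there before swapping it in, so it needs roughly as much free space as the existing data; the MongoDB documentation has advised free space equal to the current data set plus 2 GB, and mongod's `--repairpath` option can point the temporary copy at another volume. A minimal pre-flight check, assuming that 2 GB headroom figure and using a hypothetical helper name `repair_has_room`, might look like:

```python
import os
import shutil

GIB = 1024 ** 3
# Assumption: headroom taken from the "data set size plus 2 GB" guidance.
REPAIR_HEADROOM = 2 * GIB


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks skipped).

    Journal files are included, so this slightly overestimates what repair copies.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def repair_has_room(dbpath, repairpath=None):
    """Return (ok, needed_bytes, free_bytes) for a --repair run.

    repairpath defaults to dbpath, where mongod creates $tmp_repairDatabase_*.
    """
    target = repairpath or dbpath
    needed = dir_size(dbpath) + REPAIR_HEADROOM
    free = shutil.disk_usage(target).free
    return free >= needed, needed, free
```

If `ok` is false, freeing space or passing `--repairpath` on a larger volume would be the first thing to try before rerunning repair.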
| Comment by Kevin Matulef [ 13/Sep/12 ] |
|
The node didn't start before, but now it does? Can you try restarting it with the --repair option? |
| Comment by Xuguang zhan [ 13/Sep/12 ] |
|
Yes, the disk has been used up. Please check the new error on the restarted node. Any idea? |
| Comment by Kevin Matulef [ 13/Sep/12 ] |
|
Sorry, my screen cut off the development environment stuff (usually people just post OS version + MongoDB version there). I cut and pasted the rs.status() info in the description. A couple of your other secondaries have error messages "Can't take a write lock while out of disk space." Are these machines simply out of space? |
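On a larger set, the out-of-disk-space messages Kevin spotted can be pulled out of the status document mechanically. This is a minimal sketch, assuming the `rs.status()` document has already been loaded into a Python dict (for instance via a driver running the `replSetGetStatus` command) and that members carry `name`, `stateStr` and an optional `errmsg` string as in the pasted status; the function name `members_with_errors` is hypothetical.

```python
def members_with_errors(status, needle=None):
    """Return (name, stateStr, errmsg) for members whose errmsg is set.

    status is an rs.status() / replSetGetStatus document as a dict. If needle
    is given, only errmsg values containing it (case-insensitive) are kept.
    """
    found = []
    for member in status.get("members", []):
        errmsg = member.get("errmsg")
        if not errmsg:
            continue
        if needle and needle.lower() not in errmsg.lower():
            continue
        found.append((member.get("name"), member.get("stateStr"), errmsg))
    return found
```

Calling `members_with_errors(status, "out of disk space")` would list just the secondaries that need disk cleanup before anything else.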
| Comment by Xuguang zhan [ 13/Sep/12 ] |
|
Please check my environment; it shows that we also have other secondary nodes. I checked another bug that you mentioned as fixed in 2.1.2, so I tried that version, but it is not working. Please help check. The same question remains: why did the .ns file disappear on both the primary node and the bad secondary node? I checked another secondary node, and the file exists there. I then copied the data from it to the bad secondary node, and that node can now be restarted. But we can't get the performance data on the primary node. How can this be explained? Could data have been lost? I think we should check. |
| Comment by Kevin Matulef [ 13/Sep/12 ] |
|
Did this machine experience a hard shutdown? It looks like it's failing when it tries to replay the journal files. I presume the "performance" database is no longer visible on the primary? Odd that the .ns file should just disappear. Did any events lead up to this? Do you have any other secondary nodes in this set? The safest way to recover here would be to copy the datafiles from a working secondary. |
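The recovery step suggested above (copying the datafiles from a working secondary) can be sketched as follows. This assumes both mongod processes are cleanly shut down before copying, that the broken node's old dbpath (including any leftover `$tmp_repairDatabase_*` directories) is moved aside rather than deleted, and that `mongod.lock` should not be carried over. The helper name `reseed_dbpath` and the `.bak-<timestamp>` naming are hypothetical; between hosts the same steps would normally be done with scp or rsync.

```python
import os
import shutil
import time


def reseed_dbpath(source_dbpath, target_dbpath):
    """Replace target_dbpath with a copy of source_dbpath.

    Assumes both mongod instances are shut down. The existing target
    directory is renamed aside (not deleted) so nothing is lost, and
    mongod.lock is not copied. Returns the backup path, or None if the
    target did not exist.
    """
    backup = None
    if os.path.exists(target_dbpath):
        backup = "%s.bak-%d" % (target_dbpath.rstrip(os.sep), int(time.time()))
        os.rename(target_dbpath, backup)
    shutil.copytree(
        source_dbpath,
        target_dbpath,
        ignore=shutil.ignore_patterns("mongod.lock"),
    )
    return backup
```

Keeping the renamed directory allows the old files to be inspected later, which matters here since the question of where performance.ns went is still open.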