[SERVER-10292] mongod processes often disconnect on their own, and sometimes we fail to restart them by hand Created: 23/Jul/13  Updated: 10/Dec/14  Resolved: 04/Apr/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: zhangjianchao Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating system: Red Hat Enterprise Linux Server 6.4 (x86_64)

Three MongoDB config servers; three shard mongods on every server for backup.


Attachments: File 1.png     Text File shard logs.txt    
Participants:

 Description   

We started three mongod processes (one per shard) on every server. After inserting about 10,000,000 records, we checked whether the mongod processes were still running and found that one of them had stopped; the log is uploaded in the first attachment.
Another question: we also get a query error:

Mon Jul 22 10:00:30.337 [repl writer worker 1] ERROR: writer worker caught exception: BSONObj size: 1936028718 (0x2E746573) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=116 on: { ts: Timestamp 1374316162000|168, h: -212857606414574318, v: 2, op: "i", ns: "test.test", o: { _id: "1374316189024ec10f959daaa", C_ID: "1618", TIME: new Date(1374316189024), SE: "3", AREA: "a14", CT1: "4", CT2: "3", S_PT: 4, EN_N: "Test system", T_ID: "242", EN_T: "2", MSG: "for test35", EVENT: "test event70", S_IP: "1.2.4.138", URL: "URL257" } }

How should we deal with these problems?
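
(Editorial note: the invalid size 0x2E746573 decodes to the ASCII bytes ".tes", which suggests string data from a document is being read as a BSON length field, i.e. the stored data itself is corrupt. A minimal pymongo sketch for scanning the suspect collection with MongoDB's validate command follows; the host, port, and namespace are illustrative assumptions, not taken from the ticket.)

# Sketch: run MongoDB's validate command (full scan) against the
# collection named in the error; it walks every document and reports
# structural corruption. Host/port are assumed for illustration.
from pymongo import MongoClient

client = MongoClient("localhost", 27018)      # assumed shard address
result = client["test"].command("validate", "test", full=True)
print("valid:", result["valid"])
print("errors:", result.get("errors", []))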



 Comments   
Comment by Daniel Pasette (Inactive) [ 24/Jul/13 ]

In order to diagnose how your data became corrupted we would need a good history of what happened. At this point, your only options are to start again or run repair. Once your data is in a consistent state you must monitor your import process carefully, which includes saving log files from any node impacted in a failure.
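
(Editorial note: for a mongod that still starts, one way to run the repair mentioned above is the repairDatabase command; a node that cannot start at all, like the one failing journal recovery in the log quoted below, must instead be repaired offline with mongod's --repair option. A rough pymongo sketch, with the address and database name assumed:)

# Sketch: trigger repairDatabase on one shard. This rewrites the data
# files and discards unrecoverable documents, so back up the dbpath
# first. Address and database name are assumptions.
from pymongo import MongoClient

client = MongoClient("localhost", 27018)      # assumed shard address
client["test"].command("repairDatabase")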

Comment by zhangjianchao [ 24/Jul/13 ]

We just kept inserting data; we don't know why our database became corrupted. We have tested this many times, and the mongod processes often disconnect on their own, so we think the query errors are caused by the disconnected mongods. In addition, we run MongoDB in virtual machines.
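
(Editorial note: a starting point for the monitoring Daniel suggests above is to periodically ping every shard mongod and report any that stop answering. The addresses and interval below are invented for illustration:)

# Sketch: liveness probe for the shard mongods. Every 30 seconds,
# ping each one and report any that no longer respond.
import time
from pymongo import MongoClient
from pymongo.errors import PyMongoError

SHARDS = ["host1:27018", "host2:27018", "host3:27018"]   # assumed addresses

while True:
    for addr in SHARDS:
        host, port = addr.split(":")
        try:
            client = MongoClient(host, int(port), serverSelectionTimeoutMS=2000)
            client.admin.command("ping")
        except PyMongoError as exc:
            print("mongod at %s is not responding: %s" % (addr, exc))
    time.sleep(30)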

Comment by Daniel Pasette (Inactive) [ 23/Jul/13 ]

The log file you uploaded indicates that your journal files are corrupted. The query error and replication error you posted also indicate that your database is corrupted.

Do you know the steps you've taken to get the cluster to this state?

Mon Jul 22 17:02:50.028 [initandlisten] recover skipping application of section seq:163744599 < lsn:165824241
Mon Jul 22 17:02:50.029 [initandlisten] recover skipping application of section seq:163803982 < lsn:165824241
Mon Jul 22 17:02:50.031 [initandlisten] recover skipping application of section seq:163863362 < lsn:165824241
Mon Jul 22 17:02:50.033 [initandlisten] recover skipping application of section seq:163922760 < lsn:165824241
Mon Jul 22 17:02:50.035 [initandlisten] recover skipping application of section seq:163982152 < lsn:165824241
Mon Jul 22 17:02:50.037 [initandlisten] recover skipping application of section seq:164041542 < lsn:165824241
Mon Jul 22 17:02:50.039 [initandlisten] recover error: abrupt end to file /mongodb/scheme2/sh0/data/journal/j._3, yet it isn't the last journal file
Mon Jul 22 17:02:50.039 [initandlisten] dbexception during recovery: 13535 recover abrupt journal file end
Mon Jul 22 17:02:50.039 [initandlisten] exception in initAndListen: 13535 recover abrupt journal file end, terminating
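
(Editorial note: since the diagnosis hinges on spotting entries like these in the saved mongod logs, a trivial scan over the log files can flag affected shards early. The error strings below are copied from the log above; the file handling is illustrative:)

# Sketch: scan a saved mongod log for the journal-recovery failures
# quoted above (error 13535, "recover abrupt journal file end").
import sys

PATTERNS = ("dbexception during recovery", "recover abrupt journal file end")

with open(sys.argv[1]) as log:          # e.g. "shard logs.txt"
    for line in log:
        if any(p in line for p in PATTERNS):
            print(line.rstrip())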
