[SERVER-12137] Socket recv() timeout problem Created: 17/Dec/13 Updated: 11/Jul/16 Resolved: 20/Dec/13
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.4.1 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | dejun teng | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Red Hat Linux |
| Description |
We set up a 30-node MongoDB system on our cluster, spread across two physical machines, node40 and node41, with 15 nodes on each. I tried to upload as much as 500 GB of data into this database; however, one node shut down with this error report:

Sun Nov 24 14:58:15.910 [conn9] command admin.$cmd command: { writebacklisten: ObjectId('528f8628c868398bb45de20e') } ntoreturn:1 keyUpdates:0 reslen:262523 50686ms node40.clus.cci.emory.edu:27020:{} , max: { files_id: ObjectId('529259620cf23a90f681d1ec') }, shard: "dicom2" }, o2: { _id: "dicomdb.fs.chunks-files_id_ObjectId('5292592d0cf23a90f681c5e0')" }}, { op: "u", b: false, ns: "config.chunks", o: { _id: "dicomdb.fs.chunks-files_id_ObjectId('529259620cf23a90f681d1ec')", lastmod: Timestamp 12000|1, lastmodEpoch: ObjectId('529258b4c868398bb45e3755'), ns: "dicomdb.fs.chunks", min: { files_id: ObjectId('529259620cf23a90f681d1ec') }, max: { files_id: ObjectId('5292597d0cf23a90f681de3f') }, shard: "dicom1" }, o2: { _id: "dicomdb.fs.chunks-files_id_ObjectId('529259620cf23a90f681d1ec')" }} ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "dicomdb.fs.chunks" }, orderby: { lastmod: -1 }}, res: { lastmod: Timestamp 11000|3 }} ] } for command :{ $err: "SyncClusterConnection::findOne prepare failed: 10276 DBClientBase::findN: transport error: node40.clus.cci.emory.edu:27020 ns: admin.$cmd query: { fsy...", code: 13104 }
Sun Nov 24 14:58:35.980 [conn3642] waiting till out of critical section

At the beginning, all the data is inserted into the primary shard, and the balancer then tries to move chunks to the other shards. It can move chunks for a short period, roughly 10 minutes, and then it always raises this error. I have also attached the log file; you can locate the failing part by searching for "failed".
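As a diagnostic starting point, here is a minimal sketch of how the balancer and recent chunk migrations for this collection could be inspected from a mongos shell. It assumes a reachable mongos (the host and port below are placeholders); sh.getBalancerState(), sh.isBalancerRunning(), sh.status(), and the config.changelog collection are standard in the 2.4 shell, but this is an illustrative check rather than anything taken from the original report.

    // Connect first with: mongo --host <mongos-host> --port 27017   (placeholder host/port)
    sh.getBalancerState()      // true when the balancer is enabled
    sh.isBalancerRunning()     // true when a balancing round is currently in progress
    sh.status()                // prints chunk distribution per shard

    // Recent migration records for the affected collection; failed moveChunk
    // commits appear here with their error details.
    var cfg = db.getSiblingDB("config");
    cfg.changelog.find({ ns: "dicomdb.fs.chunks" }).sort({ time: -1 }).limit(10).pretty()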
| Comments |
| Comment by dejun teng [ 17/Dec/13 ] |
Thank you! I never thought it was a software problem.
| Comment by Eliot Horowitz (Inactive) [ 17/Dec/13 ] |
You are probably running into |