[SERVER-434] bad recv() len while fresh slave initial cloning Created: 22/Nov/09  Updated: 15/Jan/10  Resolved: 22/Nov/09

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.0.1, 1.1.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Erwan Arzur Assignee: Eliot Horowitz (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux amazonaws.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux


Participants:

 Description   

I am trying to build a fresh slave from an existing database hosted by a master running 1.0.1. Wanting to upgrade, I tried using 1.1.3 with --slave from scratch. After cloning about 57GB, I get the following messages in the log file, and the slave restarts cloning from 0. I tried downgrading to 1.0.1 and got the same results.

I suspect some kind of data corruption on the master. Is there any way I could make the current master check the database's integrity? Do you have any fsck-like tools?

I have one running slave that seems to work fine and should have the same data, but I'm not sure I can trust it...

Any advice/guidance?

Here are the messages I get in the log file (after about 5 hours of cloning):

Sun Nov 22 12:23:05 bad recv() len: 16760190
Sun Nov 22 12:23:05 Assertion: dbclient error communicating with server
Sun Nov 22 12:23:05 replMain AssertionException dbclient error communicating with server
Sun Nov 22 12:23:08 replMain: sleep 3 before next pass
Sun Nov 22 12:23:08 pull: main@mongodb.silentale.net
Sun Nov 22 12:23:13 An earlier initial clone of 'veronica_production' did not complete, now resyncing.
Sun Nov 22 12:23:13 resync: dropping database veronica_production
Sun Nov 22 12:23:20 resync: cloning database veronica_production
Sun Nov 22 12:23:20 allocating new datafile /mongo/data/veronica_production.ns, filling with zeroes...
Sun Nov 22 12:23:21 done allocating datafile /mongo/data/veronica_production.ns, size: 16777216, took 0.033 secs
Sun Nov 22 12:23:21 allocating new datafile /mongo/data/veronica_production.0, filling with zeroes...
Sun Nov 22 12:23:21 done allocating datafile /mongo/data/veronica_production.0, size: 67108864, took 0.088 secs
Sun Nov 22 12:23:21 building new index on Sun Nov 22 12:23:21 allocating new datafile /mongo/data/veronica_production.1, filling with zeroes... { _id: ObjId(000000000000000000000000) } for veronica_production.messages_dIMgDw5kyqljzXeJe6ak2s...done for 0 records
Sun Nov 22 12:23:21 building new index on { _id: ObjId(000000000000000000000000) } for veronica_production.messages_biXwEm5kSqljzXeJe6ak2s...done for 0 records

And in the master's log at that time:

Sun Nov 22 12:23:05 killCursors: found 1 of 1
Sun Nov 22 12:23:05 killcursors 30ms
Sun Nov 22 12:23:05 MessagingPort recv() error "Connection reset by peer" (104) 10.241.79.207:1434
Sun Nov 22 12:23:05 end connection 10.241.79.207:1434
Sun Nov 22 12:23:05 killCursors: found 0 of 1
Sun Nov 22 12:23:05 killcursors 18ms
Sun Nov 22 12:23:05 end connection 10.241.79.207:1178
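
For reference, the reported length of 16760190 bytes is just under 16 MiB (16777216 bytes). The following is a minimal sketch, not part of the original report, for pulling "bad recv() len" lines out of a slave log and printing the length in MiB. It assumes the log line format shown above; the function name `find_bad_recv` is made up for illustration.

```python
import re

# Matches the slave log line format shown above, e.g.
# "Sun Nov 22 12:23:05 bad recv() len: 16760190"
BAD_RECV = re.compile(r"^(?P<ts>.*?) bad recv\(\) len: (?P<len>-?\d+)\s*$")


def find_bad_recv(lines):
    """Return (timestamp, length_in_bytes) for every 'bad recv() len' line."""
    hits = []
    for line in lines:
        match = BAD_RECV.match(line)
        if match:
            hits.append((match.group("ts"), int(match.group("len"))))
    return hits


if __name__ == "__main__":
    # Sample lines copied from the slave log in this report.
    sample = [
        "Sun Nov 22 12:23:05 bad recv() len: 16760190",
        "Sun Nov 22 12:23:05 Assertion: dbclient error communicating with server",
    ]
    for ts, length in find_bad_recv(sample):
        print(f"{ts}: {length} bytes ({length / 2**20:.2f} MiB)")
```

Running the same function over the full slave log would show whether the failure always occurs at the same length, which might help tell a single oversized object from random corruption.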



 Comments   
Comment by Eliot Horowitz (Inactive) [ 10/Dec/09 ]

Correct - it wouldn't have anything to do with utf-8.
It's most likely a large object.

Comment by Nicolas Fouché [ 10/Dec/09 ]

One thing is sure, it is not related to documents containing non-utf8 characters. See http://groups.google.com/group/mongodb-user/browse_thread/thread/4e3aa74f1a6fe5fc/c944fa1d0953adce#c944fa1d0953adce

Comment by Eliot Horowitz (Inactive) [ 22/Nov/09 ]

Can you run validate() or check the size of the objects?
I think it must be a large object.
You can also run with --objcheck.
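
One possible way to check object sizes outside the server is sketched below. This is not from the original thread. It assumes you have a raw BSON dump of the collection (for example, a .bson file written by mongodump), which is a sequence of concatenated BSON documents. It relies only on the fact that every BSON document begins with its total size as a little-endian int32. The helper names `iter_bson_sizes` and `largest_documents` are hypothetical.

```python
import io
import struct


def iter_bson_sizes(stream):
    """Yield (offset, size) for each document in a stream of concatenated BSON."""
    offset = 0
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError(f"truncated length prefix at offset {offset}")
        # The first 4 bytes of a BSON document are its total size (int32, little-endian).
        (size,) = struct.unpack("<i", header)
        if size < 5:  # the smallest valid document (empty) is 5 bytes
            raise ValueError(f"implausible document size {size} at offset {offset}")
        body = stream.read(size - 4)
        if len(body) < size - 4:
            raise ValueError(f"truncated document at offset {offset}")
        yield offset, size
        offset += size


def largest_documents(stream, top=5):
    """Return the `top` largest documents as (offset, size), biggest first."""
    return sorted(iter_bson_sizes(stream), key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    # Two hand-built documents: {} (5 bytes) and {"a": 1} with an int32 value (12 bytes).
    data = b"\x05\x00\x00\x00\x00" + b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
    for offset, size in largest_documents(io.BytesIO(data)):
        print(f"offset {offset}: {size} bytes")
```

To scan a real dump, open the file in binary mode and pass the file object to `largest_documents`. Any document whose size approaches the length reported in the slave log would be a candidate for the large object suspected here.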
