Core Server: SERVER-434

bad recv() len while fresh slave initial cloning

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Affects Version/s: 1.0.1, 1.1.3
    • Component/s: Replication
    • Labels:
      None
    • Environment:
      Linux amazonaws.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

      I am trying to build a fresh slave from an existing database hosted by a master running 1.0.1. Since I want to upgrade, I tried running 1.1.3 with --slave from scratch. After cloning about 57GB, I get the following messages in the log file, and the slave restarts cloning from 0. I've tried downgrading to 1.0.1 and get the same results.

      I suspect some kind of data corruption on the master. Is there any way I could make the current master check the database's integrity? Do you have any fsck-like tools?

      I have 1 running slave that seems to work fine and should have the same data, but I'm not sure I can trust it.

      Any advice or guidance?

      Here are the messages I get in the log file (after about 5 hours of cloning):

      Sun Nov 22 12:23:05 bad recv() len: 16760190
      Sun Nov 22 12:23:05 Assertion: dbclient error communicating with server
      Sun Nov 22 12:23:05 replMain AssertionException dbclient error communicating with server
      Sun Nov 22 12:23:08 replMain: sleep 3 before next pass
      Sun Nov 22 12:23:08 pull: main@mongodb.silentale.net
      Sun Nov 22 12:23:13 An earlier initial clone of 'veronica_production' did not complete, now resyncing.
      Sun Nov 22 12:23:13 resync: dropping database veronica_production
      Sun Nov 22 12:23:20 resync: cloning database veronica_production
      Sun Nov 22 12:23:20 allocating new datafile /mongo/data/veronica_production.ns, filling with zeroes...
      Sun Nov 22 12:23:21 done allocating datafile /mongo/data/veronica_production.ns, size: 16777216, took 0.033 secs
      Sun Nov 22 12:23:21 allocating new datafile /mongo/data/veronica_production.0, filling with zeroes...
      Sun Nov 22 12:23:21 done allocating datafile /mongo/data/veronica_production.0, size: 67108864, took 0.088 secs
      Sun Nov 22 12:23:21 building new index on Sun Nov 22 12:23:21 allocating new datafile /mongo/data/veronica_production.1, filling with zeroes...
      { _id: ObjId(000000000000000000000000) } for veronica_production.messages_dIMgDw5kyqljzXeJe6ak2s...done for 0 records
      Sun Nov 22 12:23:21 building new index on { _id: ObjId(000000000000000000000000) } for veronica_production.messages_biXwEm5kSqljzXeJe6ak2s...done for 0 records

      And in the master's log at that time:

      Sun Nov 22 12:23:05 killCursors: found 1 of 1
      Sun Nov 22 12:23:05 killcursors 30ms
      Sun Nov 22 12:23:05 MessagingPort recv() error "Connection reset by peer" (104) 10.241.79.207:1434
      Sun Nov 22 12:23:05 end connection 10.241.79.207:1434
      Sun Nov 22 12:23:05 killCursors: found 0 of 1
      Sun Nov 22 12:23:05 killcursors 18ms
      Sun Nov 22 12:23:05 end connection 10.241.79.207:1178
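      One reading of the first slave log line is that the slave rejected an incoming message because the length declared in its header (16760190 bytes, just under 16 MiB = 16777216 bytes) failed a receive-side sanity check, after which the connection was dropped and the master logged "Connection reset by peer". The sketch below illustrates that kind of check. It assumes the standard 16-byte wire-protocol header (four little-endian int32 fields: messageLength, requestID, responseTo, opCode) and uses a hypothetical cap of 16,000,000 bytes; the exact limit enforced by 1.0.1/1.1.3 is an assumption here, not something confirmed by this report.

      ```python
      import struct

      # Wire-protocol header: messageLength, requestID, responseTo, opCode,
      # each a little-endian signed 32-bit integer (16 bytes total).
      HEADER_SIZE = 16

      # Hypothetical receive-side cap, assumed for illustration only.
      MAX_MESSAGE_LEN = 16_000_000


      def parse_header(buf: bytes) -> tuple:
          """Unpack the four int32 header fields from the first 16 bytes."""
          if len(buf) < HEADER_SIZE:
              raise ValueError("buffer shorter than a message header")
          return struct.unpack("<iiii", buf[:HEADER_SIZE])


      def check_len(length: int, max_len: int = MAX_MESSAGE_LEN) -> bool:
          """Return True if the declared length is plausible (header fits, under cap)."""
          return HEADER_SIZE <= length <= max_len


      if __name__ == "__main__":
          # A reply header (opCode 1 = OP_REPLY) declaring the length seen in the log.
          hdr = struct.pack("<iiii", 16760190, 1, 0, 1)
          length = parse_header(hdr)[0]
          if not check_len(length):
              print(f"bad recv() len: {length}")
      ```

      Under these assumptions, a reply batch that grows past the cap is rejected on length alone, before any document in it is inspected, so this message by itself would not prove that the master's data is corrupt.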

            Assignee:
            Eliot Horowitz (Inactive)
            Reporter:
            Erwan Arzur
            Votes:
            0
            Watchers:
            0
