SERVER-9057 (Core Server)

Replica set crashes during sync

    • Linux

      I tried to spin up a new replica set from a previously unreplicated database. I upgraded to MongoDB 2.4.0 on both machines, deleted the local.* files on both, restarted both with the replSet option, and then ran rs.initiate() and rs.add() on the second machine.

      At some point during the initial sync, both servers crashed. The secondary reported:

      Wed Mar 20 22:07:27.102 [rsSync] clone sensordb.gas_readings_by_hour 121215
      Wed Mar 20 22:07:42.825 [conn440] end connection 10.10.0.2:36829 (0 connections now open)
      Wed Mar 20 22:07:42.826 [initandlisten] connection accepted from 10.10.0.2:37534 #441 (2 connections now open)
      Wed Mar 20 22:07:53.884 [rsHealthPoll] replset info owl:27017 thinks that we are down
      Wed Mar 20 22:08:12.296 [initandlisten] connection accepted from 10.10.0.2:40000 #442 (2 connections now open)
      Wed Mar 20 22:08:12.296 [initandlisten] connection accepted from 10.10.0.2:54924 #443 (3 connections now open)
      Wed Mar 20 22:07:54.209 [rsSync] Socket flush send() errno:9 Bad file descriptor 10.10.0.2:27017
      Wed Mar 20 22:08:12.296 [rsHealthPoll] replSet member owl:27017 is now in state SECONDARY
      Wed Mar 20 22:08:12.296 [rsSync]   caught exception (socket exception [SEND_ERROR] for 10.10.0.2:27017) in destructor (~PiggyBackData)
      Wed Mar 20 22:08:12.296 [conn441] end connection 10.10.0.2:37534 (2 connections now open)
      Wed Mar 20 22:08:12.296 [rsSync] replSet initial sync exception: 16465 recv failed while exhausting cursor 0 attempts remaining
      Wed Mar 20 22:08:12.296 [conn442] end connection 10.10.0.2:40000 (1 connection now open)
      Wed Mar 20 22:08:15.201 [DataFileSync] flushing mmaps took 36971ms  for 67 files
      Wed Mar 20 22:08:18.305 [conn443] replSet info voting yea for owl:27017 (0)
      Wed Mar 20 22:08:20.305 [rsHealthPoll] replSet member owl:27017 is now in state PRIMARY
      Wed Mar 20 22:08:38.313 [conn443] end connection 10.10.0.2:54924 (0 connections now open)
      Wed Mar 20 22:08:38.313 [initandlisten] connection accepted from 10.10.0.2:53202 #444 (1 connection now open)
      Wed Mar 20 22:08:42.296 [rsSync]   Fatal Assertion 16233
      0xdcae01 0xd8ab83 0xc0230f 0xc1df91 0xc1edad 0xc1f07c 0xe13709 0x7f7f5caa8e9a 0x7f7f5bdbbcbd
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcae01]
       /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd8ab83]
       /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xc0230f]
       /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x71) [0xc1df91]
       /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2d) [0xc1edad]
       /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0x6c) [0xc1f07c]
       /usr/bin/mongod() [0xe13709]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f7f5caa8e9a]
       /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7f5bdbbcbd]
      Wed Mar 20 22:08:42.300 [rsSync]
      
      ***aborting after fassert() failure
      
      
      Wed Mar 20 22:08:42.300 Got signal: 6 (Aborted).
      
      Wed Mar 20 22:08:42.304 Backtrace:
      0xdcae01 0x6ce879 0x7f7f5bcfe4a0 0x7f7f5bcfe425 0x7f7f5bd01b8b 0xd8abbe 0xc0230f 0xc1df91 0xc1edad 0xc1f07c 0xe13709 0x7f7f5caa8e9a 0x7f7f5bdbbcbd
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdcae01]
       /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6ce879]
       /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f7f5bcfe4a0]
       /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f7f5bcfe425]
       /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f7f5bd01b8b]
       /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd8abbe]
       /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xc0230f]
       /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x71) [0xc1df91]
       /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2d) [0xc1edad]
       /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0x6c) [0xc1f07c]
       /usr/bin/mongod() [0xe13709]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f7f5caa8e9a]
       /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7f5bdbbcbd]
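
Note that the log above is not strictly in timestamp order: the 22:07:54.209 [rsSync] "Socket flush" line appears after entries stamped 22:08:12.296. To make the rsSync sequence easier to follow, here is a minimal sketch of a log filter. It is not part of mongod or any MongoDB tooling; the names parse_line, thread_events and out_of_order are hypothetical, and the year is an assumption because the log lines carry none.

```python
import re
from datetime import datetime

# Matches 2.4-style lines such as:
#   Wed Mar 20 22:08:42.296 [rsSync]   Fatal Assertion 16233
LINE_RE = re.compile(
    r"^\s*\w{3} (\w{3}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] ?(.*)$"
)


def parse_line(line, year=2013):
    """Return (timestamp, thread, message), or None for lines without a [thread] tag.

    The year is an assumption: the log format shown here does not include one.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    month, day, clock, thread, message = m.groups()
    ts = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")
    return ts, thread, message.strip()


def thread_events(lines, thread):
    """All parsed entries for one thread, sorted by timestamp."""
    events = [p for p in map(parse_line, lines) if p and p[1] == thread]
    return sorted(events, key=lambda e: e[0])


def out_of_order(lines):
    """Parsed entries whose timestamp is earlier than the preceding parsed entry."""
    late = []
    previous = None
    for parsed in map(parse_line, lines):
        if parsed is None:
            continue
        if previous is not None and parsed[0] < previous:
            late.append(parsed)
        previous = parsed[0]
    return late


# A few lines copied from the secondary's log above.
SAMPLE = [
    "Wed Mar 20 22:08:12.296 [initandlisten] connection accepted from 10.10.0.2:54924 #443 (3 connections now open)",
    "Wed Mar 20 22:07:54.209 [rsSync] Socket flush send() errno:9 Bad file descriptor 10.10.0.2:27017",
    "Wed Mar 20 22:08:12.296 [rsSync] replSet initial sync exception: 16465 recv failed while exhausting cursor 0 attempts remaining",
    "Wed Mar 20 22:08:42.296 [rsSync]   Fatal Assertion 16233",
]

if __name__ == "__main__":
    for ts, thread, message in thread_events(SAMPLE, "rsSync"):
        print(ts.time(), thread, message)
    for ts, thread, message in out_of_order(SAMPLE):
        print("logged late:", ts.time(), thread, message)
```

Run against the full log, the same filter shows the rsSync thread going from the send() failure, to the initial sync exception with 0 attempts remaining, to Fatal Assertion 16233 about 30 seconds later.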
      
