[SERVER-18803] Master-slave replica set fails initial sync Created: 03/Jun/15  Updated: 16/Jun/15  Resolved: 16/Jun/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.8, 2.6.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alex Duncan Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Using MongoDB server and shell 2.6.8 or 2.6.10, take an existing MongoDB standalone instance and convert it to a 2 member replica set with the second member being a non-voting priority 0 secondary.

Participants:

 Description   

When converting a standalone MongoDB instance to a 2 member replica set (second member is non-voting priority 0, so as to achieve master-slave replication), the initial sync fails to complete and throws an error. Here is a pastebin of my config, rs.status() and log: http://pastebin.com/p7T4zZ5F

2015-06-02T17:04:07.177+0000 [rsSync] replSet syncing to: 10.0.1.75:27017
2015-06-02T17:04:07.180+0000 [rsSync] replSet initial sync drop all databases
2015-06-02T17:04:07.180+0000 [rsSync] dropAllDatabasesExceptLocal 2
2015-06-02T17:04:08.950+0000 [rsSync] removeJournalFiles
2015-06-02T17:04:08.970+0000 [rsSync] replSet initial sync clone all databases
2015-06-02T17:04:08.972+0000 [rsSync] replSet initial sync cloning db: 0
2015-06-02T17:04:08.976+0000 [rsSync] allocating new ns file /var/lib/mongo/0.ns, filling with zeroes...
2015-06-02T17:04:09.262+0000 [FileAllocator] allocating new datafile /var/lib/mongo/0.0, filling with zeroes...
2015-06-02T17:04:09.264+0000 [FileAllocator] done allocating datafile /var/lib/mongo/0.0, size: 64MB,  took 0.002 secs
2015-06-02T17:04:09.268+0000 [rsSync] build index on: 0.sessions properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "0.sessions" }
2015-06-02T17:04:09.268+0000 [rsSync]    added index to empty collection
2015-06-02T17:04:09.272+0000 [rsSync] build index on: 0.search properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "0.search" }
2015-06-02T17:04:09.272+0000 [rsSync]    added index to empty collection
2015-06-02T17:04:09.803+0000 [FileAllocator] allocating new datafile /var/lib/mongo/0.1, filling with zeroes...
2015-06-02T17:04:09.805+0000 [FileAllocator] done allocating datafile /var/lib/mongo/0.1, size: 128MB,  took 0.002 secs
2015-06-02T17:04:10.739+0000 [rsSync] replSet initial sync exception: 13106 nextSafe(): { $err: "assertion src/mongo/db/storage/extent_manager.cpp:109" } 2 attempts remaining
2015-06-02T17:04:11.364+0000 [conn10] end connection 10.0.1.75:33889 (1 connection now open)
2015-06-02T17:04:11.365+0000 [initandlisten] connection accepted from 10.0.1.75:33891 #11 (2 connections now open)
2015-06-02T17:04:40.740+0000 [rsSync] replSet initial sync pending
2015-06-02T17:04:40.740+0000 [rsSync] replSet syncing to: 10.0.1.75:27017
2015-06-02T17:04:40.742+0000 [rsSync] replSet initial sync drop all databases
2015-06-02T17:04:40.743+0000 [rsSync] dropAllDatabasesExceptLocal 2
2015-06-02T17:04:40.743+0000 [rsSync] removeJournalFiles
2015-06-02T17:04:40.763+0000 [rsSync] replSet initial sync clone all databases
2015-06-02T17:04:40.765+0000 [rsSync] replSet initial sync cloning db: 0
2015-06-02T17:04:40.769+0000 [rsSync] allocating new ns file /var/lib/mongo/0.ns, filling with zeroes...
2015-06-02T17:04:41.055+0000 [FileAllocator] allocating new datafile /var/lib/mongo/0.0, filling with zeroes...
2015-06-02T17:04:41.057+0000 [FileAllocator] done allocating datafile /var/lib/mongo/0.0, size: 64MB,  took 0.002 secs
2015-06-02T17:04:41.062+0000 [rsSync] build index on: 0.sessions properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "0.sessions" }
2015-06-02T17:04:41.062+0000 [rsSync]    added index to empty collection
2015-06-02T17:04:41.066+0000 [rsSync] build index on: 0.search properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "0.search" }
2015-06-02T17:04:41.066+0000 [rsSync]    added index to empty collection
2015-06-02T17:04:41.389+0000 [conn11] end connection 10.0.1.75:33891 (1 connection now open)
2015-06-02T17:04:41.390+0000 [initandlisten] connection accepted from 10.0.1.75:33893 #12 (2 connections now open)
2015-06-02T17:04:41.639+0000 [FileAllocator] allocating new datafile /var/lib/mongo/0.1, filling with zeroes...
2015-06-02T17:04:41.641+0000 [FileAllocator] done allocating datafile /var/lib/mongo/0.1, size: 128MB,  took 0.002 secs
2015-06-02T17:04:42.540+0000 [rsSync] replSet initial sync exception: 13106 nextSafe(): { $err: "assertion src/mongo/db/storage/extent_manager.cpp:109" } 1 attempts remaining
2015-06-02T17:05:11.415+0000 [conn12] end connection 10.0.1.75:33893 (1 connection now open)
2015-06-02T17:05:11.417+0000 [initandlisten] connection accepted from 10.0.1.75:33895 #13 (2 connections now open)
2015-06-02T17:05:12.540+0000 [rsSync] replSet initial sync pending
2015-06-02T17:05:12.541+0000 [rsSync] replSet syncing to: 10.0.1.75:27017
2015-06-02T17:05:12.543+0000 [rsSync] replSet initial sync drop all databases
2015-06-02T17:05:12.543+0000 [rsSync] dropAllDatabasesExceptLocal 2
2015-06-02T17:05:14.331+0000 [rsSync] removeJournalFiles
2015-06-02T17:05:14.351+0000 [rsSync] replSet initial sync clone all databases
2015-06-02T17:05:14.352+0000 [rsSync] replSet initial sync cloning db: 0
2015-06-02T17:05:14.356+0000 [rsSync] allocating new ns file /var/lib/mongo/0.ns, filling with zeroes...
2015-06-02T17:05:14.644+0000 [FileAllocator] allocating new datafile /var/lib/mongo/0.0, filling with zeroes...
2015-06-02T17:05:14.646+0000 [FileAllocator] done allocating datafile /var/lib/mongo/0.0, size: 64MB,  took 0.002 secs
2015-06-02T17:05:14.650+0000 [rsSync] build index on: 0.sessions properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "0.sessions" }
2015-06-02T17:05:14.650+0000 [rsSync]    added index to empty collection
2015-06-02T17:05:14.655+0000 [rsSync] build index on: 0.search properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "0.search" }
2015-06-02T17:05:14.655+0000 [rsSync]    added index to empty collection
2015-06-02T17:05:15.192+0000 [FileAllocator] allocating new datafile /var/lib/mongo/0.1, filling with zeroes...
2015-06-02T17:05:15.195+0000 [FileAllocator] done allocating datafile /var/lib/mongo/0.1, size: 128MB,  took 0.002 secs
2015-06-02T17:05:16.118+0000 [rsSync] replSet initial sync exception: 13106 nextSafe(): { $err: "assertion src/mongo/db/storage/extent_manager.cpp:109" } 0 attempts remaining
2015-06-02T17:05:41.441+0000 [conn13] end connection 10.0.1.75:33895 (1 connection now open)
2015-06-02T17:05:41.442+0000 [initandlisten] connection accepted from 10.0.1.75:33896 #14 (2 connections now open)
2015-06-02T17:05:46.118+0000 [rsSync] Fatal Assertion 16233
2015-06-02T17:05:46.122+0000 [rsSync] 0x1205431 0x11a7229 0x1189d5d 0xea31af 0xec3844 0xec38f0 0xec3c0a 0x1249dc9 0x7f4ed46aedf3 0x7f4ed39b81ad
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x1205431]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11a7229]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xcd) [0x1189d5d]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xea31af]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x124) [0xec3844]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x30) [0xec38f0]
 /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0xaa) [0xec3c0a]
 /usr/bin/mongod() [0x1249dc9]
 /lib64/libpthread.so.0(+0x7df3) [0x7f4ed46aedf3]
 /lib64/libc.so.6(clone+0x6d) [0x7f4ed39b81ad]
2015-06-02T17:05:46.122+0000 [rsSync]
 
***aborting after fassert() failure
 
 
2015-06-02T17:05:46.126+0000 [rsSync] SEVERE: Got signal: 6 (Aborted).
Backtrace:0x1205431 0x120480e 0x7f4ed38f7640 0x7f4ed38f75c9 0x7f4ed38f8cd8 0x1189dca 0xea31af 0xec3844 0xec38f0 0xec3c0a 0x1249dc9 0x7f4ed46aedf3 0x7f4ed39b81ad
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x1205431]
 /usr/bin/mongod() [0x120480e]
 /lib64/libc.so.6(+0x35640) [0x7f4ed38f7640]
 /lib64/libc.so.6(gsignal+0x39) [0x7f4ed38f75c9]
 /lib64/libc.so.6(abort+0x148) [0x7f4ed38f8cd8]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x13a) [0x1189dca]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xea31af]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x124) [0xec3844]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x30) [0xec38f0]
 /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0xaa) [0xec3c0a]
 /usr/bin/mongod() [0x1249dc9]
 /lib64/libpthread.so.0(+0x7df3) [0x7f4ed46aedf3]
 /lib64/libc.so.6(clone+0x6d) [0x7f4ed39b81ad]



 Comments   
Comment by Ramon Fernandez Marina [ 16/Jun/15 ]

aduncan, thanks for getting back to us, and glad to hear your replica set is up and running. You may still want to consider close monitoring of your storage layer to prevent any surprises in the future.

Regards,
Ramón.

Comment by Alex Duncan [ 15/Jun/15 ]

Hi Ramón,

It appears as though adding the new members after initiating the replica set as opposed to setting the initial config to add the members has worked.

Also I was able to import the data using mongorestore after initiating the replica set and adding the secondaries.-

I believe this ticket can be closed.

Comment by Ramon Fernandez Marina [ 11/Jun/15 ]

aduncan, the log seems to indicate there was an error fetching documents in the source server, probably related to corrupted data. Can you please send us logs from the primary server at the time you observed this issue on the secondary? Can you also please check the health of your storage on the primary server? It is not uncommon for a flaky disk to create data corruption in MongoDB files.

Thanks,
Ramón.

Generated at Thu Feb 08 03:48:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.