[SERVER-5927] During full resyncing, when a connection error happens, thread fault appears and restart full resyncing. Is this a correct process? Created: 25/May/12  Updated: 08/Mar/13  Resolved: 27/Nov/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Kihyun Kim
Assignee: Mathias Stearn
Resolution: Incomplete
Votes: 0
Labels: replication, resyncing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Oracle linux 5.7, iSCSI(not shared with network)
4 shards. 5 members for each shard



 Description   

We attempted a full resync (removed all files in the dbpath, then restarted mongod).

After a while, the messages below appeared in the log, and mongod started the full sync over again from the beginning.

Is this expected behavior?

=======================================================================================
Fri May 25 01:26:12 [rsSync] 8120473 objects cloned so far from collection mylog.fs.chunks
Fri May 25 01:26:18 [FileAllocator] allocating new datafile /log/data/repl01_2/mylog.108, filling with zeroes...
Fri May 25 01:26:32 [FileAllocator] done allocating datafile /log/data/repl01_2/mylog.108, size: 2047MB, took 14.517 secs
Fri May 25 01:26:33 [rsSync] clone mylog.fs.chunks 8142975
...... # some connection messages
Fri May 25 01:26:56 [rsSync] Socket recv() errno:104 Connection reset by peer fc1301:35000
Fri May 25 01:26:56 [rsSync] SocketException: remote: fc1301:35000 error: 9001 socket exception [1] server [fc1301:35000]
Fri May 25 01:26:56 [rsSync] Assertion: 13273:single data buffer expected
0x584722 0x5df960 0x5e18c8 0x5bf339 0x84cb77 0x84eb15 0x850409 0x82bfbe 0x82dbd3 0x826ee1 0x826f9a 0x827420 0xaa80b0 0x37c900673d 0x37c84d3d1d
/home/logadmin/mongodb/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x112) [0x584722]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo14DBClientCursor12dataReceivedERbRSs+0x180) [0x5df960]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo14DBClientCursor18exhaustReceiveMoreEv+0x158) [0x5e18c8]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo18DBClientConnection5queryEN5boost8functionIFvRNS_27DBClientCursorBatchIteratorEEEERKSsNS_5QueryEPKNS_7BSONObjEi+0x209) [0x5bf339]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo6Cloner4copyEPKcS2_bbbbbbNS_5QueryE+0x3a7) [0x84cb77]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo6Cloner2goEPKcRSsRKSsbbbbbbPi+0x1665) [0x84eb15]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo9cloneFromEPKcRSsRKSsbbbbbbPi+0x59) [0x850409]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo11ReplSetImpl18_syncDoInitialSyncEv+0xe5e) [0x82bfbe]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x23) [0x82dbd3]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x61) [0x826ee1]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x4a) [0x826f9a]
/home/logadmin/mongodb/bin/mongod(_ZN5mongo15startSyncThreadEv+0xa0) [0x827420]
/home/logadmin/mongodb/bin/mongod(thread_proxy+0x80) [0xaa80b0]
/lib64/libpthread.so.0 [0x37c900673d]
/lib64/libc.so.6(clone+0x6d) [0x37c84d3d1d]
Fri May 25 01:26:57 [rsSync] Socket flush send() errno:9 Bad file descriptor fc1301:35000
Fri May 25 01:26:57 [rsSync] mylog caught exception (socket exception) in destructor (~PiggyBackData)
Fri May 25 01:26:57 [rsSync] replSet initial sync exception 13273 single data buffer expected
...... # some writebacklisten messages
Fri May 25 01:27:27 [rsSync] replSet initial sync pending
Fri May 25 01:27:27 [rsSync] replSet syncing to: fc1301:35000
Fri May 25 01:27:27 [rsSync] replSet initial sync drop all databases
Fri May 25 01:27:27 [rsSync] dropAllDatabasesExceptLocal 2
Fri May 25 01:27:27 [rsSync] removeJournalFiles
=======================================================================================
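
To check whether the resync keeps restarting, and whether it fails at the same point each time, the log can be scanned for the lines shown above. The following is only a minimal sketch: the function name `summarize_initial_sync` is hypothetical, and the patterns are taken directly from the 2.0.2 log lines quoted in this ticket, so other versions may format these messages differently.

```python
import re

# Patterns are assumptions based only on the log lines quoted in this ticket.
CLONE_PROGRESS = re.compile(
    r"\[rsSync\] (?:(\d+) objects cloned so far from collection (\S+)"
    r"|clone (\S+) (\d+))"
)
SYNC_EXCEPTION = re.compile(r"\[rsSync\] replSet initial sync exception (\d+) (.*)")
SYNC_RESTART = re.compile(r"\[rsSync\] replSet initial sync drop all databases")


def summarize_initial_sync(lines):
    """Collect initial sync failures and restarts from mongod log lines.

    Returns a dict with:
      failures: one entry per "initial sync exception" line, including the
                last clone progress (collection, object count) seen before it
      restarts: number of "initial sync drop all databases" lines
    """
    failures = []
    restarts = 0
    last_progress = None
    for line in lines:
        match = CLONE_PROGRESS.search(line)
        if match:
            if match.group(1):
                last_progress = (match.group(2), int(match.group(1)))
            else:
                last_progress = (match.group(3), int(match.group(4)))
            continue
        match = SYNC_EXCEPTION.search(line)
        if match:
            failures.append({
                "code": int(match.group(1)),
                "message": match.group(2).strip(),
                "last_progress": last_progress,
            })
            continue
        if SYNC_RESTART.search(line):
            # The member drops its data and clones from scratch again.
            restarts += 1
            last_progress = None
    return {"failures": failures, "restarts": restarts}
```

Passing an open log file to the function (for example `summarize_initial_sync(open(path))`, where `path` is the location of the mongod log) gives a quick view of how many times the sync restarted and how far the clone got before each failure.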



 Comments   
Comment by Mathias Stearn [ 28/Aug/12 ]

Mongodump and resyncing use very similar mechanisms. Are you sure that mongodump succeeds without error between the same two machines? If so, the failure could just be timing related. Does mongod eventually complete its resync, or does it get stuck at the same point every time?

Comment by Kihyun Kim [ 16/Aug/12 ]

OK, it looks like a network reliability issue.
There are no negative effects other than the resync restarting indefinitely.
However, every mongodump run completes without any error or warning messages. If mongodump uses some special mechanism to recover from network problems, what do you think about applying it to the resync process?

Comment by Mathias Stearn [ 07/Aug/12 ]

It looks like there was a connectivity issue which the server was able to recover from. Have there been any negative effects?

Comment by auto [ 13/Jul/12 ]

Author: Ian Whalen <ian.whalen@gmail.com>
Date: 2012-07-13T06:22:39-07:00

Message: Merge pull request #263 from stevebriskin/master

JS tests for SERVER-5927
Branch: master
https://github.com/mongodb/mongo/commit/26d6610d0dca3d9eaad55dd5bef18ec28b4546a9
