[SERVER-6762] Assertion failure cursor.get() db/repl/../oplogreader.h 93 Created: 14/Aug/12  Updated: 11/Jul/16  Resolved: 05/Sep/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.6
Fix Version/s: 2.3.0

Type: Bug Priority: Major - P3
Reporter: Roderic Liu Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

SLES 11


Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: Linux
Participants:

 Description   

We have a 5-server replset in our production environment. Recently it has frequently shown a huge replication lag.
I checked the mongodb log file, and each time the replication lag happens I see this in the log:

Mon Aug 13 22:46:36 [rsSync] Socket recv() timeout 10.20.1.18:27017
Mon Aug 13 22:46:36 [rsSync] SocketException: remote: 10.20.1.18:27017 error: 9001 socket exception [3] server [10.20.1.18:27017]
Mon Aug 13 22:46:36 [rsSync] DBClientCursor::init call() failed
Mon Aug 13 22:46:37 [rsSync] replSet syncing to: 10.20.1.18:27017
Mon Aug 13 22:46:49 [rsGhostSync] Socket recv() timeout 10.20.1.18:27017
Mon Aug 13 22:46:49 [rsGhostSync] SocketException: remote: 10.20.1.18:27017 error: 9001 socket exception [3] server [10.20.1.18:27017]
Mon Aug 13 22:46:49 [rsGhostSync] DBClientCursor::init call() failed
Mon Aug 13 22:46:49 [rsGhostSync] Assertion failure cursor.get() db/repl/../oplogreader.h 93
0x57a8a6 0x5853eb 0x8254f1 0x58fc23 0x58d7f4 0x58ce23 0x5742ef 0x576664 0xaabca0 0x7f9d48025070 0x7f9d4761e10d
/usr/local/mongodb/bin/mongod(_ZN5mongo12sayDbContextEPKc+0x96) [0x57a8a6]
/usr/local/mongodb/bin/mongod(_ZN5mongo8assertedEPKcS1_j+0xfb) [0x5853eb]
/usr/local/mongodb/bin/mongod(_ZN5mongo9GhostSync9percolateERKNS_7BSONObjERKNS_6OpTimeE+0xbb1) [0x8254f1]
/usr/local/mongodb/bin/mongod(_ZNK5boost9function0IvEclEv+0x243) [0x58fc23]
/usr/local/mongodb/bin/mongod(_ZN5mongo4task6Server6doWorkEv+0x254) [0x58d7f4]
/usr/local/mongodb/bin/mongod(_ZN5mongo4task4Task3runEv+0x33) [0x58ce23]
/usr/local/mongodb/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x5742ef]
/usr/local/mongodb/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x576664]
/usr/local/mongodb/bin/mongod(thread_proxy+0x80) [0xaabca0]
/lib64/libpthread.so.0 [0x7f9d48025070]
/lib64/libc.so.6(clone+0x6d) [0x7f9d4761e10d]

Once the connection times out, mongodb stops pulling the oplog from the primary node for some time, until the connection is re-established. Because the replset uses chained replication, if the first secondary node in the chain has a timed-out connection to the primary node, then all the other secondary nodes that sync from it lag as well.

This happens very often and we have not managed to find any information about it. Can you help us check whether it is a bug or not?



 Comments   
Comment by auto [ 05/Sep/12 ]

Author: Kristina <kristina@10gen.com> (2012-09-05T07:11:55-07:00)

Message: Check repl cursor before using SERVER-6762
Branch: master
https://github.com/mongodb/mongo/commit/f0a27d72c5b41866db1b49b83e4e176e0952ea23
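
For readers of this ticket, the pattern named in the commit message (check the cursor before using it) can be sketched roughly as follows. This is an illustration only and not the code from the commit: `Cursor`, `OplogReaderSketch`, `moreGuarded` and `percolateSketch` are hypothetical names, and the failed query is simulated with a boolean rather than a real socket timeout. The `assert(cursor_.get())` line mirrors the kind of check that the "Assertion failure cursor.get()" log line points at.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a database cursor; not MongoDB's DBClientCursor.
struct Cursor {
    std::string nextDoc;
};

// Hypothetical reader: a query over a timed-out connection leaves the
// cursor pointer null, as the log in the description suggests.
class OplogReaderSketch {
public:
    // Simulates a query. When the connection is down, no cursor is created.
    void query(bool connectionOk) {
        if (connectionOk) {
            cursor_.reset(new Cursor{"{op: 'i'}"});
        } else {
            cursor_.reset();
        }
    }

    bool haveCursor() const { return cursor_.get() != nullptr; }

    // Unguarded access: asserts that a cursor exists, so calling it after a
    // failed query aborts the caller.
    const std::string& moreUnguarded() const {
        assert(cursor_.get());
        return cursor_->nextDoc;
    }

    // Guarded access: reports failure so the caller can reconnect and retry.
    bool moreGuarded(std::string& out) const {
        if (!haveCursor()) {
            return false;
        }
        out = cursor_->nextDoc;
        return true;
    }

private:
    std::unique_ptr<Cursor> cursor_;
};

// Returns true if a document was read, false if the caller should retry
// later (for example after re-establishing the connection).
bool percolateSketch(OplogReaderSketch& reader, bool connectionOk) {
    reader.query(connectionOk);
    std::string doc;
    return reader.moreGuarded(doc);
}
```

With the guarded path, a failed query makes `percolateSketch` return false instead of tripping the assertion, which leaves room for the sync thread to retry.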

Comment by Kristina Chodorow (Inactive) [ 14/Aug/12 ]

Yes, this is a bug. Triaging...
