[SERVER-4433] Replication should be smarter if unable to apply an operation to a secondary Created: 05/Dec/11  Updated: 23/Feb/15  Resolved: 23/Feb/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.8.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Unassigned
Resolution: Done Votes: 0
Labels: sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB starting : pid=31397 port=27018 dbpath=/mongodbdata/ 64-bit
db version v1.8.4, pdfile version 4.5
git version: 81f12749a15e3d158b1b16bab6bc3faea538e166
build sys info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
journal dir=/mongodbdata/journal


Issue Links:
Related
Operating System: ALL

 Description   

If replica set sync detects corruption or a bad operation, it should fail fast rather than skip the operation and let the secondary get out of sync.

Wed Nov 16 03:31:30 [replica set sync] replSet skipping bad op in oplog: Assertion: 10329:Element too large
0x55f33a 0x4ec4c9 0x706fe8 0x707242 0x70c3ab 0x70c978 0x70c9fc 0x70ce95 0x8c1d90 0x3410a0673d 0x340fed44bd
 /home/mongodb/latest/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f33a]
 /home/mongodb/latest/bin/mongod(_ZNK5mongo7BSONObj8toStringERNS_13StringBuilderEbb+0x2d9) [0x4ec4c9]
 /home/mongodb/latest/bin/mongod(_ZN5mongo5blankERKNS_7BSONObjE+0x88) [0x706fe8]
 /home/mongodb/latest/bin/mongod(_ZN5mongo11ReplSetImpl9syncApplyERKNS_7BSONObjE+0x182) [0x707242]
 /home/mongodb/latest/bin/mongod(_ZN5mongo11ReplSetImpl8syncTailEv+0x193b) [0x70c3ab]
 /home/mongodb/latest/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0xc8) [0x70c978]
 /home/mongodb/latest/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x3c) [0x70c9fc]
 /home/mongodb/latest/bin/mongod(_ZN5mongo15startSyncThreadEv+0x215) [0x70ce95]
 /home/mongodb/latest/bin/mongod(thread_proxy+0x80) [0x8c1d90]
 /lib64/libpthread.so.0 [0x3410a0673d]
 /lib64/libc.so.6(clone+0x6d) [0x340fed44bd]
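The difference between the two policies can be shown with a toy model. This is a minimal Python sketch, not MongoDB's code. The names `Secondary`, `apply_op`, `sync_tail` and `OplogApplyError` are hypothetical; `sync_tail` only borrows its name from the `ReplSetImpl::syncTail` frame in the trace above. Ops are assumed to be plain dicts in which only inserts (`"op": "i"`) can be applied. With `fail_fast=False` the loop behaves like the logged "replSet skipping bad op in oplog" path: it moves past the bad op and the secondary diverges. With `fail_fast=True` it stops without advancing `last_applied`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class OplogApplyError(Exception):
    """Hypothetical error raised when a secondary cannot apply an oplog entry."""

    def __init__(self, index: int, op: dict, cause: Exception):
        super().__init__(f"cannot apply op #{index}: {cause}")
        self.index = index
        self.op = op
        self.cause = cause


@dataclass
class Secondary:
    """Toy secondary: documents by _id plus the index of the last applied op."""
    docs: Dict[str, dict] = field(default_factory=dict)
    last_applied: int = -1
    skipped: List[int] = field(default_factory=list)


def apply_op(sec: Secondary, op: dict) -> None:
    # Hypothetical single-op applier: only inserts are modelled here.
    if op.get("op") != "i":
        raise ValueError(f"unsupported op type {op.get('op')!r}")
    doc = op["o"]
    sec.docs[doc["_id"]] = doc


def sync_tail(sec: Secondary, oplog: List[dict], fail_fast: bool) -> None:
    """Apply every op after sec.last_applied, using the chosen failure policy."""
    for i in range(sec.last_applied + 1, len(oplog)):
        op = oplog[i]
        try:
            apply_op(sec, op)
        except Exception as exc:
            if fail_fast:
                # Stop before advancing last_applied, so the secondary never
                # records an op it did not actually apply.
                raise OplogApplyError(i, op, exc) from exc
            # Skip policy: log and continue; the secondary silently diverges.
            print(f"skipping bad op in oplog: {exc}")
            sec.skipped.append(i)
        sec.last_applied = i


if __name__ == "__main__":
    oplog = [
        {"op": "i", "o": {"_id": "a", "x": 1}},
        {"op": "?", "o": {}},  # stands in for an op that cannot be applied
        {"op": "i", "o": {"_id": "b", "x": 2}},
    ]
    lenient = Secondary()
    sync_tail(lenient, oplog, fail_fast=False)
    print(sorted(lenient.docs), lenient.skipped, lenient.last_applied)

    strict = Secondary()
    try:
        sync_tail(strict, oplog, fail_fast=True)
    except OplogApplyError as err:
        print("stopped at op", err.index, "last_applied =", strict.last_applied)
```

In the lenient run the secondary ends up with documents that the skipped op would have affected left unchanged, yet it reports itself as caught up. The strict run stops at the bad op, so the inconsistency surfaces immediately instead of being hidden.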



 Comments   
Comment by Eric Milkie [ 08/Mar/13 ]

We've now flipped the other way: we shut down the secondary immediately if we are unable to apply an op. This isn't necessarily "smarter", and we could do better.

Generated at Thu Feb 08 03:05:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.