[SERVER-22446] Switching repl protocol version combined with step down triggers fassert Created: 02/Feb/16  Updated: 25/Apr/16  Resolved: 25/Apr/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.3.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Siyuan Zhou
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File log    
Operating System: ALL
Sprint: Repl 11 (03/11/16), Repl 12 (04/01/16), Repl 13 (04/22/16), Repl 14 (05/13/16)
Participants:

 Description   

2016-01-27T13:40:04.366-0500 I -        [rsSync] Fatal assertion 34361 OplogOutOfOrder: Attempted to apply an oplog entry ((term: 0, timestamp: Jan 27 13:40:03:2)) which is not greater than our lastWrittenOptime ((term: 1, timestamp: Jan 27 13:35:36:1)).
 0x14cedb2 0x1473202 0x145d442 0x1030461 0x1022b70 0x7f1fd50961e0 0x7f1fd48aff8e 0x7f1fd45d9a0d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"10CEDB2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"1073202","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"105D442","s":"_ZN5mongo23fassertFailedWithStatusEiRKNS_6StatusE"},{"b":"400000","o":"C30461","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEv"},{"b":"400000","o":"C22B70","s":"_ZN5mongo4repl13runSyncThreadEv"},{"b":"7F1FD4FE0000","o":"B61E0"},{"b":"7F1FD48A8000","o":"7F8E"},{"b":"7F1FD44DF000","o":"FAA0D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.3.0-100-gae5deb1", "gitVersion" : "ae5deb12309858db224bbff8a2a717e21e9a8bc8", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.8.0-35-generic", "version" : "#50-Ubuntu SMP Tue Dec 3 01:24:59 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFA92F6000", "elfType" : 3, "buildId" : "C34731DD03A6781F7B90E4B441D073D3B4C33F65" }, { "b" : "7F1FD54EF000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "8143139EDE058AD7AA1DA2E9972D3843F712A758" }, { "b" : "7F1FD52EB000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "30D12A669C22B52949902AB59B41738C4790E149" }, { "b" : "7F1FD4FE0000", "path" : "/usr/local/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F1FD4CDB000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "DEB41EC41A35E9184C4679C4E5AFF6E3D06B4348" }, { "b" : "7F1FD4AC5000", "path" : "/usr/local/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F1FD48A8000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "23D6B805A9970B893E3C17D8756A69A26CE99DB9" }, { "b" : "7F1FD44DF000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30990788EB9EC583D2006CDE9A7841E43B2884FB" }, { "b" : "7F1FD56F7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "B93C43C481B8B1680F449F697B8FD92695F545CE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x14cedb2]
 mongod(_ZN5mongo10logContextEPKc+0x132) [0x1473202]
 mongod(_ZN5mongo23fassertFailedWithStatusEiRKNS_6StatusE+0x62) [0x145d442]
 mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEv+0x1761) [0x1030461]
 mongod(_ZN5mongo4repl13runSyncThreadEv+0x2C0) [0x1022b70]
 libstdc++.so.6(+0xB61E0) [0x7f1fd50961e0]
 libpthread.so.0(+0x7F8E) [0x7f1fd48aff8e]
 libc.so.6(clone+0x6D) [0x7f1fd45d9a0d]
-----  END BACKTRACE  -----
2016-01-27T13:40:04.367-0500 I -        [rsSync] 
 
***aborting after fassert() failure

Environment:
1 shard cluster

Cannot remember the exact steps, but it involves connecting directly to primary, calling replSetReconfig to switch to pv0, replSetStepDown, calling replSetReconfig to switch to pv1, replSetStepDown.



 Comments   
Comment by Siyuan Zhou [ 08/Apr/16 ]

fanguigui, this is a different issue than SERVER-21988 as rollback is not involved here. We cannot reproduce this issue so far and are still debugging with the submitter.

Comment by fanguigui [ 08/Apr/16 ]

seems to be the same bug with this one `https://jira.mongodb.org/browse/SERVER-21988`, the fixed core server version is 3.2.3, 3.3.0,but the bug still exists in latested stable release version 3.2.4,that's strange

Comment by Randolph Tan [ 03/Feb/16 ]

I don't think so. This is how I have it in my shell session:

c0.getDB('admin').runCommand({ replSetReconfig: conf })
c0.getDB('admin').runCommand({ replSetStepDown: 1, secondaryCatchUpPeriodSecs: 1 })

Comment by Eric Milkie [ 03/Feb/16 ]

Randolph, did you happen to run reconfig with the force parameter? And thus is it possible you ran reconfig on a secondary by mistake? I expect that might cause the behavior seen here.

Generated at Thu Feb 08 04:00:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.