[SERVER-23525] Mongo Replica Set unusable after Primary crashes with out of Memory, followed by Op Log out of Order on replica members
Created: 05/Apr/16  Updated: 05/Apr/16  Resolved: 05/Apr/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: pavan Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-21988 Rollback does not wait for applier to... Closed
Operating System: ALL
Participants:

 Description   

The primary ran out of memory and crashed during a large aggregation over about 360K records. The replica set automatically elected a new primary. After a manual restart, the crashed node (which has a higher priority) was re-elected primary, and replicas 1 and 2 then crashed with:
Fatal assertion 34361 OplogOutOfOrder Attempted to apply an earlier oplog entry (ts: Apr 4 17:07:13:10442) when our lastWrittenOptime was (term: 18, timestamp: Apr 4 17:07:13:10472).
Repeated attempts to start replica 1 and replica 2 keep failing.
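
For readers unfamiliar with the assertion, the ordering condition it reports can be sketched in Python. This is only an illustration, not MongoDB's implementation: the `OpTime` type, the `is_earlier_entry` function, the seconds-of-day value, and reading the trailing field of each timestamp as a decimal ordinal are all assumptions made for the example. Whether an equal timestamp also trips the real assertion is not stated in this ticket, so the sketch checks "strictly earlier" only.

```python
from typing import NamedTuple


class OpTime(NamedTuple):
    """Hypothetical model of an oplog timestamp: (seconds, ordinal within that second)."""
    seconds: int
    increment: int


def is_earlier_entry(entry_ts: OpTime, last_written: OpTime) -> bool:
    """Return True when the entry to apply is older than the last written optime."""
    # NamedTuple instances compare lexicographically: seconds first, then increment.
    return entry_ts < last_written


# 17:07:13 expressed as seconds since midnight (an assumption for illustration only).
SECOND = 17 * 3600 + 7 * 60 + 13

entry = OpTime(SECOND, 10442)         # ts from the assertion message
last_written = OpTime(SECOND, 10472)  # lastWrittenOptime from the assertion message

if is_earlier_entry(entry, last_written):
    print("entry is earlier than lastWrittenOptime: out of order")
```

Both timestamps fall in the same second, so the comparison is decided by the trailing field alone.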

2016-04-04T17:29:32.442-0700 I CONTROL  [rsSync]
 0x12ea922 0x1288e84 0x1276112 0xf24a34 0xf1a740 0x7fa1567bfa40 0x7fa155fdc182 0x7fa155d0947d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"EEA922"},{"b":"400000","o":"E88E84"},{"b":"400000","o":"E76112"},{"b":"400000","o":"B24A34"},{"b":"400000","o":"B1A740"},{"b":"7FA15670E000","o":"B1A40"},{"b":"7FA155FD4000","o":"8182"},{"b":"7FA155C0F000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-74-generic", "version" : "#118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "D76764D44DE9B088362776AF243199FDCF5756E8" }, { "b" : "7FFE760B3000", "elfType" : 3, "buildId" : "DC075B751E9FB361F14CD59BD81300A6BB5CB377" }, { "b" : "7FA1571F9000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "D08DD65F97859C71BB2CBBF1043BD968EFE18AAD" }, { "b" : "7FA156E1E000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F86FA9FB4ECEB4E06B40DBDF761A4172B70A4229" }, { "b" : "7FA156C16000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FA156A12000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FA15670E000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7FA156408000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FA1561F2000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7FA155FD4000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FA155C0F000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : 
"30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FA157458000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12ea922]
 mongod(_ZN5mongo10logContextEPKc+0x134) [0x1288e84]
 mongod(_ZN5mongo23fassertFailedWithStatusEiRKNS_6StatusE+0x62) [0x1276112]
 mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEv+0x1564) [0xf24a34]
 mongod(_ZN5mongo4repl13runSyncThreadEv+0x230) [0xf1a740]
 libstdc++.so.6(+0xB1A40) [0x7fa1567bfa40]
 libpthread.so.0(+0x8182) [0x7fa155fdc182]
 libc.so.6(clone+0x6D) [0x7fa155d0947d]
-----  END BACKTRACE  -----



 Comments   
Comment by pavan [ 05/Apr/16 ]

Thomas, thank you for the quick response. You are correct: the QA environment replica sets were still running on 3.2.1, and this is where the errors happened. The master has the update, 3.2.3. I was looking at the master when I reported the issue again. I will upgrade the replica sets to the latest version and retest our scenarios.

Regards

Comment by Kelsey Schubert [ 05/Apr/16 ]

Hi ppeddada,

The backtrace that you have provided indicates that this assertion occurred on MongoDB 3.2.1.

"mongodbVersion" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2"

Can you please confirm the version you are currently using by providing the startup logs of the affected node?
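
As a quick cross-check, the version fields can be read out of the JSON line that mongod prints between the BEGIN BACKTRACE and END BACKTRACE markers. The following is a minimal Python sketch; the function name is hypothetical, and the sample string is a shortened stand-in for the full line in the description, keeping only the fields that appear there.

```python
import json
from typing import Tuple


def version_from_backtrace(json_line: str) -> Tuple[str, str]:
    """Return (mongodbVersion, gitVersion) from a backtrace JSON line."""
    doc = json.loads(json_line)
    info = doc["processInfo"]
    return info["mongodbVersion"], info["gitVersion"]


# Shortened sample: the real line also carries "uname" and "somap" entries.
sample = (
    '{"backtrace":[{"b":"400000","o":"EEA922"}],'
    '"processInfo":{"mongodbVersion":"3.2.1",'
    '"gitVersion":"a14d55980c2cdc565d4704a7e3ad37e4e535c1b2"}}'
)

version, git_version = version_from_backtrace(sample)
print(version, git_version)
```

If the log wraps the JSON across several lines, the pieces need to be joined before parsing.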

Since encountering this error have you upgraded your node and seen this error message again? Or have you seen this error on other nodes running on MongoDB 3.2.3? If so, can you please provide the logs of these nodes?

SERVER-21988 was fixed as part of the development release 3.3.0 and was backported to the production release 3.2.3. Consequently, MongoDB 3.2.4 includes this fix as well. We recommend always upgrading to the latest revision (the third number in the MongoDB version number).
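
The version reasoning above can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, version strings are assumed to be plain `major.minor.revision`, and only the two release series named in this comment (3.2 and 3.3) are covered.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a 'major.minor.revision' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def includes_server_21988_fix(version: str) -> bool:
    """True if the version carries the fix, per the releases named in this ticket."""
    parsed = parse_version(version)
    if parsed[:2] == (3, 2):
        # Production series: backported in 3.2.3, so later revisions include it.
        return parsed >= (3, 2, 3)
    if parsed[:2] == (3, 3):
        # Development series: fixed as of 3.3.0.
        return parsed >= (3, 3, 0)
    # Other series are not discussed in this ticket, so make no claim about them.
    return False


for v in ("3.2.1", "3.2.3", "3.2.4"):
    print(v, includes_server_21988_fix(v))
```

Comparing tuples of integers rather than raw strings avoids misordering revisions such as 3.2.10 and 3.2.9.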

Kind regards,
Thomas

Comment by pavan [ 05/Apr/16 ]

Thomas, we are already on Mongo 3.2.3. Are you recommending that we upgrade to 3.2.4?
The error happened on 3.2.3. I see that ticket SERVER-21988 shows as fixed in 3.3.2 & 3.3.0. Please advise.
Thanks

Comment by Kelsey Schubert [ 05/Apr/16 ]

Hi ppeddada,

Thanks for reporting this issue. From the behavior you describe, I believe this is a duplicate of SERVER-21988, which was fixed in MongoDB 3.2.3. Please upgrade to MongoDB 3.2.4, and report back if the issue persists.

Kind regards,
Thomas

Generated at Thu Feb 08 04:03:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.