[SERVER-25570] Fatal Assertion on Secondary
Created: 11/Aug/16  Updated: 13/Dec/16  Resolved: 13/Dec/16

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.2.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bob Potter Assignee: Kelsey Schubert
Resolution: Incomplete Votes: 0
Labels: None

Operating System: ALL

Description

We are seeing a fatal assertion error on a couple of our servers. It has happened a few times in the past week. After the first error we resynced the data on one of the replicas (in case it was the result of corruption), but the error recurred after a couple of days.

We are running version 3.2.8 with the MMAPv1 storage engine.

The backtrace is below. Let me know if there is any additional information I can provide.

Backtrace:

2016-08-11T08:24:56.632+0000 I -        [conn15259463] Fatal Assertion 17441
2016-08-11T08:24:56.633+0000 I -        [conn15259463] 
 
***aborting after fassert() failure
 
 
2016-08-11T08:24:56.695+0000 F -        [conn15259463] Got signal: 6 (Aborted).
 
 0x131ce72 0x131bfc9 0x131c7d2 0x7f6cebb4b330 0x7f6ceb7acc37 0x7f6ceb7b0028 0x12a6772 0x105acc6 0x105aceb 0x1069aed 0x1069c5f 0xbea270 0xe2d015 0xe2d6d9 0xdeb122 0xdeb81d 0xcd0a39 0xcd6d05 0x9b937c 0x12c9645 0x7f6cebb43184 0x7f6ceb87037d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F1CE72","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F1BFC9"},{"b":"400000","o":"F1C7D2"},{"b":"7F6CEBB3B000","o":"10330"},{"b":"7F6CEB776000","o":"36C37","s":"gsignal"},{"b":"7F6CEB776000","o":"3A028","s":"abort"},{"b":"400000","o":"EA6772","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"C5ACC6"},{"b":"400000","o":"C5ACEB","s":"_ZNK5mongo17RecordStoreV1Base13getNextRecordEPNS_16OperationContextERKNS_7DiskLocE"},{"b":"400000","o":"C69AED","s":"_ZN5mongo27SimpleRecordStoreV1Iterator7advanceEv"},{"b":"400000","o":"C69C5F","s":"_ZN5mongo27SimpleRecordStoreV1Iterator4nextEv"},{"b":"400000","o":"7EA270","s":"_ZN5mongo14CollectionScan4workEPm"},{"b":"400000","o":"A2D015","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"400000","o":"A2D6D9","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"400000","o":"9EB122"},{"b":"400000","o":"9EB81D","s":"_ZN5mongo7getMoreEPNS_16OperationContextEPKcixPbS4_"},{"b":"400000","o":"8D0A39","s":"_ZN5mongo15receivedGetMoreEPNS_16OperationContextERNS_10DbResponseERNS_7MessageERNS_5CurOpE"},{"b":"400000","o":"8D6D05","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"5B937C","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"EC9645","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F6CEBB3B000","o":"8184"},{"b":"7F6CEB776000","o":"FA37D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.8", "gitVersion" : "ed70e33130c977bda0024c125b56d159573dbaf0", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-45-generic", "version" : "#74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "A53FF676E1D627BD1D9B1BF524DEFA13B667EE83" }, { "b" : "7FFFCA2FE000", "elfType" : 3, "buildId" : 
"9D77366C6409A9EA266179080FA7C779EEA8A958" }, { "b" : "7F6CECA5B000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7F6CEC681000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "379F80D2768BA6A21F52781895EE9F47B34A0A85" }, { "b" : "7F6CEC479000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "E2A6DD5048A0A051FD61043BDB69D8CC68192AB7" }, { "b" : "7F6CEC275000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DA9B8C234D0FE9FD8CAAC8970A7EC1B6C8F6623F" }, { "b" : "7F6CEBF6F000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "D144258E614900B255A31F3FD2283A878670D5BC" }, { "b" : "7F6CEBD59000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7F6CEBB3B000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "31E9F21AE8C10396171F1E13DA15780986FA696C" }, { "b" : "7F6CEB776000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "CF699A15CAAE64F50311FC4655B86DC39A479789" }, { "b" : "7F6CECCB9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "D0F537904076D73F29E4A37341F8A449E2EF6CD0" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x131ce72]
 mongod(+0xF1BFC9) [0x131bfc9]
 mongod(+0xF1C7D2) [0x131c7d2]
 libpthread.so.0(+0x10330) [0x7f6cebb4b330]
 libc.so.6(gsignal+0x37) [0x7f6ceb7acc37]
 libc.so.6(abort+0x148) [0x7f6ceb7b0028]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12a6772]
 mongod(+0xC5ACC6) [0x105acc6]
 mongod(_ZNK5mongo17RecordStoreV1Base13getNextRecordEPNS_16OperationContextERKNS_7DiskLocE+0x1B) [0x105aceb]
 mongod(_ZN5mongo27SimpleRecordStoreV1Iterator7advanceEv+0x3D) [0x1069aed]
 mongod(_ZN5mongo27SimpleRecordStoreV1Iterator4nextEv+0x4F) [0x1069c5f]
 mongod(_ZN5mongo14CollectionScan4workEPm+0x940) [0xbea270]
 mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x275) [0xe2d015]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x39) [0xe2d6d9]
 mongod(+0x9EB122) [0xdeb122]
 mongod(_ZN5mongo7getMoreEPNS_16OperationContextEPKcixPbS4_+0x52D) [0xdeb81d]
 mongod(_ZN5mongo15receivedGetMoreEPNS_16OperationContextERNS_10DbResponseERNS_7MessageERNS_5CurOpE+0x1A9) [0xcd0a39]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xE35) [0xcd6d05]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x9b937c]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x325) [0x12c9645]
 libpthread.so.0(+0x8184) [0x7f6cebb43184]
 libc.so.6(clone+0x6D) [0x7f6ceb87037d]
-----  END BACKTRACE  -----



Comments
Comment by Kelsey Schubert [ 13/Dec/16 ]

Hi bpot,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Thomas

Comment by Ramon Fernandez Marina [ 13/Aug/16 ]

Sorry to hear you've run into this issue bpot. This is typically indicative of data corruption on disk, which is often caused by faulty storage. What kind of servers and storage are you using? It's strange that this would happen on two different servers, but I'd recommend you run a storage health check.

It would be useful to get the logs for the affected nodes from the last restart up to the fassert() above. Depending on what the logs show, it may help to run validate() to get additional information.
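To act on the validate() suggestion across a whole database, a minimal sketch in Python is shown below. It assumes the pymongo driver (3.7+ for `list_collection_names()`), runs the `validate` command with `full=True` on every collection, and lists the collections that do not come back with `valid: true`. The function names `validate_all` and `invalid_collections` are hypothetical helpers, not part of MongoDB's tooling. A full validation reads every document and index, so on a large MMAPv1 data set it can be slow and may block other operations on the collection being checked; a maintenance window is the safer choice.

```python
"""Sketch: run MongoDB's `validate` command on every collection in a database.

Assumption: `db` behaves like a pymongo `Database`, i.e. it provides
`list_collection_names()` and `command(name, value, **options)`.
"""


def validate_all(db, full=True):
    """Return {collection_name: validate_result} for every collection in `db`.

    `full=True` asks the server for the more thorough (and slower) check,
    roughly what `db.getCollection(name).validate(true)` does in the shell.
    """
    results = {}
    for name in db.list_collection_names():
        # Sends {validate: <name>, full: <full>} to the connected server.
        results[name] = db.command("validate", name, full=full)
    return results


def invalid_collections(results):
    """Sorted names of collections whose result did not report valid: true."""
    return sorted(
        name for name, res in results.items() if not res.get("valid", False)
    )
```

Against the affected secondary, a call such as `validate_all(MongoClient("mongodb://<host>:27017", directConnection=True)["<db>"])` connects to that node directly (the `directConnection` option assumes pymongo 3.11+), so the check runs there rather than being routed to the primary. The `errors` field of each failing result, if present, is the useful part to attach to the ticket.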

Thanks,
Ramón.

Generated at Thu Feb 08 04:09:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.