[SERVER-26954] mongoD 3.2.10 crashes with Invariant failure false src\mongo\db\storage\mmap_v1\btree\btree_logic.cpp 1746 Created: 08/Nov/16  Updated: 09/Dec/16  Resolved: 18/Nov/16

Status: Closed
Project: Core Server
Component/s: MMAPv1
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alessandro Gherardi Assignee: Kelsey Schubert
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Environment:

  • Mongo 3.2.10
  • Windows Server 2012
  • Replicaset with 2 mongoD's (mmapv1) + arbiter
  • Cluster with 3 config servers in CSRS + mongoS

After we upgraded from 3.0.4 to 3.2.10, one of the 2 mongoD's has been crashing constantly. Here are the relevant entries in mongod.log:

2016-11-01T10:20:26.800-0700 I -        [repl writer worker 13] Invariant failure false src\mongo\db\storage\mmap_v1\btree\btree_logic.cpp 1746
2016-11-01T10:20:26.800-0700 I -        [repl writer worker 13]

***aborting after invariant() failure

2016-11-01T10:20:27.098-0700 I CONTROL  [repl writer worker 13] mongod.exe      index_collator_extension+0x1a4663
2016-11-01T10:20:27.098-0700 I CONTROL  [repl writer worker 13] mongod.exe      index_collator_extension+0x1a35a4
2016-11-01T10:20:27.098-0700 I CONTROL  [repl writer worker 13] mongod.exe      index_collator_extension+0x1a2fc6
2016-11-01T10:20:27.098-0700 I CONTROL  [repl writer worker 13] MSVCR120.dll    raise+0x1e9
2016-11-01T10:20:27.098-0700 I CONTROL  [repl writer worker 13] MSVCR120.dll    abort+0x18
2016-11-01T10:20:27.098-0700 I CONTROL  [repl writer worker 13] mongod.exe      index_collator_extension+0x14c713
2016-11-01T10:20:27.099-0700 I CONTROL  [repl writer worker 13] mongod.exe      ???
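
When a node crashes repeatedly, it can help to pull every invariant failure out of mongod.log and see whether they all point at the same source file and line. The following is a minimal sketch, not an official tool: the function name find_invariant_failures and the InvariantFailure record are hypothetical, and the regular expression assumes the 3.2-style line layout shown in the excerpt above (timestamp, severity, component, bracketed thread name, message).

```python
import re
from typing import Iterable, Iterator, NamedTuple


class InvariantFailure(NamedTuple):
    """One 'Invariant failure' entry (hypothetical record type)."""
    timestamp: str
    thread: str
    expression: str
    source_file: str
    source_line: int


# Assumed layout, based only on the log excerpt in this ticket:
# <timestamp> <severity> <component> [<thread>] Invariant failure <expr> <file> <line>
_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+\S+\s+\[(?P<thread>[^\]]+)\]\s+"
    r"Invariant failure\s+(?P<expr>.*?)\s+(?P<file>\S+)\s+(?P<line>\d+)\s*$"
)


def find_invariant_failures(lines: Iterable[str]) -> Iterator[InvariantFailure]:
    """Yield one record per log line that reports an invariant failure."""
    for raw in lines:
        cleaned = raw.rstrip("\r\n")
        # Some viewers render a carriage return as the two characters "^M".
        if cleaned.endswith("^M"):
            cleaned = cleaned[:-2]
        match = _PATTERN.match(cleaned)
        if match:
            yield InvariantFailure(
                timestamp=match.group("ts"),
                thread=match.group("thread"),
                expression=match.group("expr"),
                source_file=match.group("file"),
                source_line=int(match.group("line")),
            )
```

To use it, pass any iterable of lines, for example an open file handle: `for hit in find_invariant_failures(open("mongod.log")): print(hit)`. Lines that do not match the assumed layout are skipped silently.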



 Comments   
Comment by Kelsey Schubert [ 18/Nov/16 ]

Hi alessandro.gherardi@yahoo.com,

Thank you for following up. I'm glad to hear that the issue has been resolved.

In cases like this it is very challenging to track down the root cause. Unfortunately, there is not enough information at this time to complete an investigation to determine whether there is a bug in the server or if faulty memory or storage explains this behavior. If you experience this again, please save the data files and reopen this ticket so we can continue to investigate.

Kind regards,
Thomas

Comment by Alessandro Gherardi [ 08/Nov/16 ]

Thanks.

To answer your questions:

  1. I don't know for sure. My guess is that it was the "events" collection - it's by far the largest we have.
  2. We followed the steps listed here https://docs.mongodb.com/manual/release-notes/3.2-upgrade/#upgrade-a-sharded-cluster-to-3-2 then https://docs.mongodb.com/manual/tutorial/upgrade-config-servers-to-replica-set
  3. Should I attach the logs to this ticket? Or do you have an FTP or some other location I can upload them to?
  4. Are you referring to event viewer -> Windows Logs -> System? The only event at the time of the first failure is this: Service Control Manager,7031,None,The MongoD service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 10000 milliseconds: Restart the service.
  5. I no longer have the data files. I do have the mongoD minidumps. Are those useful? If yes, please let me know how to send them to you.

The initial sync appears to have resolved the problem.

Comment by Kelsey Schubert [ 08/Nov/16 ]

Hi alessandro.gherardi@yahoo.com,

Thank you for reporting this issue. So that we can continue to investigate, would you please provide us with some additional information?

  1. Are you aware of which index or collection was being accessed?
  2. What was the upgrade procedure?
  3. Would you please upload the complete logs of the affected node?
  4. Are there any errors in the system logs?
  5. Would you please create a copy of your data files? It is possible that we may want to examine them to better determine what is going on here.
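
For item 5, one way to take such a copy can be sketched as below. This is only an illustration under stated assumptions: the helper name backup_data_files and both path arguments are hypothetical, and the sketch assumes the mongod process on the affected node has been stopped first, since copying data files from a running node may not produce a consistent copy.

```python
import shutil
import time
from pathlib import Path


def backup_data_files(dbpath: str, backup_root: str) -> Path:
    """Copy the whole dbpath into a new timestamped directory under backup_root.

    Hypothetical helper: assumes mongod is not running against dbpath.
    """
    source = Path(dbpath)
    if not source.is_dir():
        raise FileNotFoundError(f"dbpath does not exist: {source}")
    # A timestamp in the directory name keeps successive copies apart.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates the target directory and raises if it already exists,
    # so an earlier copy is never overwritten.
    shutil.copytree(source, target)
    return target
```

The original files are left untouched; the function returns the path of the new copy so it can be archived or uploaded later.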

To resolve this issue, I would recommend executing an initial sync. If an initial sync is not feasible with your load and configuration, I would suggest starting the affected node with mongod --repair.

Kind regards,
Thomas

Comment by Alessandro Gherardi [ 08/Nov/16 ]

Also notice that, before the upgrade, the system had been running on 3.0.4 since at least 2016-10-19, without any crashes.
