Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Internal Code
Labels: None
Assigned Teams: Storage Execution
Operating System: ALL
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Analysis below is relatively speculative, and I haven't proven any of it via testing. Hopefully this ticket is at least useful as a survey of cases where we've seen memory corruption during shutdown, often as a cascade failure after an earlier failure triggered the shutdown itself.

It looks like memory corruption can be triggered after an unclean shutdown, and potentially after a clean shutdown too. From the examples I've seen, it looks like double frees occur because global objects whose members manage their own heap memory are destroyed as the process exits, and then the heap memory of those members is freed again as a result of actions taken by a thread that's still running.
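One common way to avoid this class of problem is to never run the global's destructor at all, so a thread that is still running at exit can't touch a destroyed member. Below is a minimal sketch of that idea, not a proposed patch: `UsageStats` and `globalUsageStats()` are hypothetical names made up for illustration, and the real `Top` / `statsSnapshots` code may look quite different.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a global stats container (not the server's actual type).
class UsageStats {
public:
    void record(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_usage[ns];
    }

    // Returns a copy of the map, made while holding the lock.
    std::map<std::string, long long> cloneMap() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _usage;
    }

private:
    mutable std::mutex _mutex;
    std::map<std::string, long long> _usage;
};

// The instance is heap-allocated once and intentionally never deleted, so no
// destructor runs for it during static destruction at process exit.
UsageStats& globalUsageStats() {
    static UsageStats* instance = new UsageStats();
    return *instance;
}
```

The trade-off is a deliberate leak of one object at exit, which the OS reclaims anyway. The alternative is to guarantee that every thread using the global has been joined before static destruction begins.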

Here are some historical Jira cases where I've seen this occur:

1. When Top::cloneMap is called by SnapshotData::takeSnapshot. This is potentially caused by the global statsSnapshots variable getting destroyed, after which one of its SnapshotData objects, itself already destroyed, has takeSnapshot called on it and tries to reassign its _usage map. Potentially the _usage map was already destroyed and its heap memory freed, but the reassignment attempts to free that heap memory again. Stack traces that might be related to this are found in:

FREE-3600
~~SERVER-2695~~
~~SERVER-4190~~

2. It looks like SnapshotThread::run checks inShutdown(), but if a shutdown occurs after the inShutdown() check and before or during the call to takeSnapshot(), the same double free might occur as in the unclean shutdown cases.

3. Maybe when an immediate exit occurs during a shutdown, global objects required for shutdown are destroyed while still in use?

~~SERVER-3869~~

4. Here it looks like a global is freed or left in a bad state, and then another exit call attempts to free it again.

CS-501

5. Mystery failure after a clean shutdown:

~~SERVER-414~~ (very old mongo version)

6. On mongos, these failures may have occurred in similar situations:

~~SERVER-3082~~
~~SERVER-4367~~
CS-1903
~~SERVER-2930~~
FREE-3696
~~SERVER-4576~~
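The race between the inShutdown() check and the call to takeSnapshot() described above is a check-then-act problem. One way it could be closed, sketched below under assumptions, is to do the check and the work inside the same critical section and have shutdown join the thread before anything is destroyed. `SnapshotWorker` and all of its members are hypothetical names for illustration only; this is not the server's SnapshotThread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical worker; names are illustrative and not taken from the server code.
class SnapshotWorker {
public:
    ~SnapshotWorker() { shutdown(); }

    void start() {
        _thread = std::thread([this] { run(); });
    }

    // Sets the flag under the same mutex the worker holds while snapshotting,
    // then joins, so no snapshot can be in flight once shutdown() returns.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _inShutdown = true;
        }
        if (_thread.joinable()) _thread.join();
    }

    int snapshotsTaken() const { return _snapshots.load(); }

private:
    void run() {
        for (;;) {
            {
                std::lock_guard<std::mutex> lk(_mutex);
                if (_inShutdown) return;  // check and work share one critical section
                takeSnapshot();
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    void takeSnapshot() { ++_snapshots; }

    std::mutex _mutex;
    bool _inShutdown = false;
    std::atomic<int> _snapshots{0};
    std::thread _thread;
};
```

With this shape, a caller would invoke `shutdown()` before the process starts tearing down globals. It does not help with the immediate-exit cases, where no orderly shutdown path runs at all.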

I would recommend that we do a closer examination of 2-5 above, do an audit for additional cases, and then fix all known cases.

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: Aaron Staple (Inactive)
Participants: [DO NOT USE] Backlog - Storage Execution Team, Aaron Staple
Votes: 0
Watchers: 1

Created: Jan 03 2012 10:27:24 PM UTC
Updated: Dec 06 2022 05:37:53 AM UTC
Resolved: Nov 15 2016 08:49:24 PM UTC
