[SERVER-32170] tcmalloc segfault at time of first swap usage Created: 05/Dec/17 Updated: 06/Dec/22 Resolved: 15/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability |
| Affects Version/s: | 3.2.16 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Akira Kurogane | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Storage Execution
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
Whilst a shard mongod node was running it suddenly crashed with the following "Invalid access at address: 0" message in the log:
(The "17132 SSL handshake received but server is started without SSL support" messages occur all the time in this deployment, they don't play a part in the segfault.) At the time of the crash the total memory usage of mongod process was ~60% of the total RAM (300GB v.s. ~500 GB). The only other user process of note running was a monitoring agent. From FTDC metrics (graph screenshot attached below) we see that the crash was preceeded by the resident set size decreasing rapidly plus the disk device for swap (/dev/sdb) started being written to. The issue is new, but not a one-off - it happened on two different shards a couple of days apart. |
| Comments |
| Comment by Akira Kurogane [ 17/Dec/17 ] | ||||||||||||||||||||||||
|
Yes, let's close for now, hopefully a core dump will be shared from somewhere in the future. I want to add that I rule out that the zero pointer might have been returned from a malloc() now. Even though the segfault was happening underneath tcmalloc::ThreadCache::Deallocate() I considered it possible that used malloc() for somewhere and that returned a zero pointer (with a non-zero errno value) due to the memory pressure. Also there was a separate user's case that had tcmalloc::ThreadCache::Allocate() once instead of Deallocate(). But underneath both of these functions it was in either tcmalloc::SLL_PopRange() or tcmalloc::SLL_Pop() the segfault is recorded as occuring. These functions were short and simple - and they have no malloc() calls in them. So it seems it is a double-free bug, presumably reached in a race condition somewhere in or beneath __evict_page_dirty_update(). | ||||||||||||||||||||||||
| Comment by Ian Whalen (Inactive) [ 15/Dec/17 ] | ||||||||||||||||||||||||
|
Hey Akira, thanks a lot for continuing to dig on this. It sounds like without a repro there's not much the Storage team is able to do. We're closing this for now, but please do reopen if you're able to repro and/or dig up more info. | ||||||||||||||||||||||||
| Comment by Akira Kurogane [ 13/Dec/17 ] | ||||||||||||||||||||||||
|
I tried the repro again, without luck. This time I had less-compressible data so storage size of the collections was large enough for the kernel to use all of remaining RAM for filecache for certain. Then I performed a core dump of a dummy user process that created a 60G file. But the kernel handled it smoothly - no swap was used at all. Presumably 50 +GB of filecache was released very quickly. | ||||||||||||||||||||||||
| Comment by Akira Kurogane [ 12/Dec/17 ] | ||||||||||||||||||||||||
|
I've been doing some nice big heavy reproductions but haven't been able to put the kernel memory management into much congestion. It keeps on releasing memory at a steady rate and the free amount never gets near zero. Today a user reported that they disabled core dumps on another user process (which was observed to be having aborts several times a day) and they've not had any more crashes of mongod since. This is interesting because a core dump of a process using memory larger than (or at least the general size of) the OS's free amount can explain why the mongod would start being (partially) pushed out RAM in a hurry. | ||||||||||||||||||||||||
| Comment by Ian Whalen (Inactive) [ 10/Dec/17 ] | ||||||||||||||||||||||||
|
Arg, sorry, that was stupid - I mis-read Akira's comment as 'Andrew Morrow added a comment'. Leaving this on Storage, presumably for alexander.gorrod to triage since it's inside of WT. | ||||||||||||||||||||||||
| Comment by Ian Whalen (Inactive) [ 08/Dec/17 ] | ||||||||||||||||||||||||
|
Assigning to Platforms since Drew is stewing on this. | ||||||||||||||||||||||||
| Comment by Akira Kurogane [ 05/Dec/17 ] | ||||||||||||||||||||||||
|
Re-symbolized stack trace (leading directory paths trimmed for readability).
| ||||||||||||||||||||||||
| Comment by Akira Kurogane [ 05/Dec/17 ] | ||||||||||||||||||||||||
|
Graph from FTDC metrics. The last second of the captured metrics shows the first large decrease in resident set size, plus that the swap disk device is being used for the first time.
|