[SERVER-35519] NULL dereference in LogicalSessionCacheReap Created: 09/Jun/18 Updated: 23/Jul/18 Resolved: 26/Jun/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.1.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | A. Jesse Jiryu Davis | Assignee: | Blake Oler |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
| Operating System: | ALL |
| Participants: |
| Description |
The transactions test suites for both the C Driver and Motor crash the server fairly often. The server is version 4.1.0-263-gbb2de3700e, configured as a 3-node replica set on macOS with test commands enabled:
| Comments |
| Comment by Kaloian Manassiev [ 26/Jun/18 ] |
Spoke with Jesse over email. Because he was using locally built binaries that have since been lost, we are unable to recover the exact line of the crash, and from the mangled stack trace it is not obvious which part of the command execution pipeline is a nullptr. I am closing this as "Cannot Reproduce". jesse, since the description says "pretty often", please reopen this ticket if it happens again, and make sure you save the symbols. |
| Comment by Blake Oler [ 12/Jun/18 ] |
jesse and I discussed this offline. He's going to turn on core dumps so we have more information the next time this crashes. In the meantime, I'm going to investigate the sequence of operations exercised by the Python driver (Motor) tests to tease out any race conditions. |
| Comment by Eric Milkie [ 11/Jun/18 ] |
Indeed, the default LogicalSessionDefaultRefresh interval is 5 minutes. |
| Comment by A. Jesse Jiryu Davis [ 11/Jun/18 ] |
Dammit, I can't reproduce this now, sorry. I didn't have core dumps enabled when I observed the crashes. I only observed it with 4.1.0-263-gbb2de3700e; I haven't been testing with 4.0.0-rc0 recently. I believe this is related to how I was developing the driver transactions tests: I would begin a test that opened a session and started a transaction, then I would stop the test and start over. As a result, there were open sessions with transactions in progress when my test called "killAllSessions", and the server would crash either then or some time afterward. I notice that the log covers suspiciously close to 5 minutes, but I don't know whether a 5-minute timer is involved. |
| Comment by Kaloian Manassiev [ 11/Jun/18 ] |
jesse, does this only affect master (the 4.1 codebase), or is it happening on 4.0 as well? Do you have any repro steps and/or a core dump from one of these occurrences? |