[SERVER-39969] Fix dumping of SessionCatalog in hang analyzer Created: 05/Mar/19 Updated: 29/Oct/23 Resolved: 09/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.10 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | William Schultz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | ||
| Issue Links: | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Repl 2019-03-25, Repl 2019-04-08, Repl 2019-04-22 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 22 | ||||||||
| Description |
|
The changes from |
| Comments |
| Comment by Githook User [ 09/Apr/19 ] | ||
|
Author: {'email': 'william.schultz@mongodb.com', 'name': 'William Schultz', 'username': 'will62794'} Message: | ||
| Comment by William Schultz (Inactive) [ 08/Apr/19 ] | ||
|
I dug into the segfault issue a bit more and was able to reproduce it and capture a stack trace: gdb_session_dump_bt.txt
This comment on the bug report referenced above claims that running "set print static-members off" avoids the issue. This indeed appears to work for this case. That is, the following command does not crash when dumping the sessions:
There appears to be a patch that fixes the bug, but it is dated 2019-03-25 (~2 weeks prior to the writing of this comment), so it likely isn't merged yet. | ||
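As a rough illustration of the workaround described in this comment, the sketch below builds a GDB batch invocation that issues "set print static-members off" before running a dump command. The helper name `build_gdb_args` and the `dump_command` parameter are hypothetical and not taken from the hang analyzer; the actual session-dump command is not shown in this ticket, so it is passed in by the caller.

```python
def build_gdb_args(pid, dump_command, gdb_path="gdb"):
    """Return an argv list that attaches GDB to `pid` in batch mode.

    `dump_command` is a placeholder for whatever command dumps the
    SessionCatalog; its real name is an assumption left to the caller.
    """
    return [
        gdb_path,
        "-batch",             # run the given commands, then exit
        "-p", str(pid),       # attach to the running process
        # Workaround from the comment: do not print static members,
        # which avoided the GDB crash in this case.
        "-ex", "set print static-members off",
        "-ex", dump_command,  # must come after the setting is applied
    ]
```

The ordering is the point of the sketch: the print setting has to be applied before the dump command runs, otherwise it has no effect on that output.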
| Comment by William Schultz (Inactive) [ 08/Apr/19 ] | ||
|
When testing out the fix proposed above I ran into a new issue. It looks like, on at least one platform (RHEL 6.2 Santiago), GDB is hitting a segmentation fault when it tries to print out the value of the full _sessionId field here. I believe this manifests as a "Bad exit code -11" error as seen in the patch build log here. From some debugging on a spawned RHEL host, it appears that the issue has something to do with printing the _uid field of the LogicalSessionId type. I am not yet sure what the underlying issue is here, but I am running another patch build that disables the printing of the raw _sessionId variable to see if this fixes this problem. | ||
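For context on why "Bad exit code -11" points to a segmentation fault: assuming the exit code comes from Python's `subprocess` module, a negative return code -N means the child was terminated by signal N, and signal 11 is SIGSEGV on Linux. A minimal sketch (the helper name `describe_exit_code` is hypothetical):

```python
import signal


def describe_exit_code(returncode):
    """Translate a subprocess-style return code into a readable message."""
    if returncode < 0:
        signum = -returncode
        try:
            # Map the signal number to its symbolic name where known.
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            return "killed by signal %d" % signum
    return "exited with status %d" % returncode
```

On a Linux host, `describe_exit_code(-11)` reports that the process was killed by SIGSEGV, which matches the GDB segmentation fault described in this comment.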
| Comment by William Schultz (Inactive) [ 05/Apr/19 ] | ||
|
Ok, it looks like it won't be too hard to fix this. We just need to account for the fact that fields on the TransactionParticipant object like _txnState have now been pushed inside either the ObservableState type (the _o field), or the PrivateState type (the _p field). So, for example, when extracting fields from the TransactionParticipant, we just extract fields from txnPart['_o'] or txnPart['_p'] instead of txnPart. | ||
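The lookup change described in this comment can be sketched as follows. This is not the actual hang analyzer code: the helper `get_txn_field` is hypothetical, and plain dictionaries stand in for `gdb.Value` objects so the snippet runs outside GDB. Inside GDB one would presumably pass `missing_errors=(gdb.error,)`, since indexing a `gdb.Value` by a nonexistent field raises `gdb.error`.

```python
def get_txn_field(txn_part, name, missing_errors=(KeyError,)):
    """Look up `name` in the observable (_o) or private (_p) state.

    `txn_part` is anything indexable by field name, such as a gdb.Value
    for a TransactionParticipant, or a nested dict in this sketch.
    """
    for sub_state in ("_o", "_p"):
        try:
            return txn_part[sub_state][name]
        except missing_errors:
            continue  # field is not in this sub-state; try the next one
    raise KeyError(name)


# Stand-in data; "examplePrivateField" is an illustrative name only.
txn_part = {
    "_o": {"_txnState": "kInProgress"},
    "_p": {"examplePrivateField": 7},
}
state = get_txn_field(txn_part, "_txnState")
```

Trying both sub-states keeps the calling code independent of which struct a given field was moved into, at the cost of hiding that distinction from the reader.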
| Comment by William Schultz (Inactive) [ 02/Apr/19 ] | ||
|
Running a patch build to see if this is broken or not. | ||
| Comment by Judah Schvimer [ 11/Mar/19 ] | ||
|
We will also consider fixing it since it seems worthwhile. | ||
| Comment by William Schultz (Inactive) [ 11/Mar/19 ] | ||
|
I think the specific issue in BF-12368 may have actually been fixed by |