[SERVER-55669] [SBE][replica_sets] Hang because rollback id won't increment Created: 31/Mar/21 Updated: 15/Apr/21 Resolved: 15/Apr/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Query Execution, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Kyle Suarez | Assignee: | Justin Seyster |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Sprint: | Query Execution 2021-04-19 |
| Participants: | |
| Description |
|
For both rollback_prepare_transaction.js and recover_prepared_transaction_state.js, the RollbackTest gets stuck in a state where the rollback id never advances past 1.
The attached logs include some hang analyzer output. |
| Comments |
| Comment by Justin Seyster [ 15/Apr/21 ] |
|
I verified locally that both rollback_prepare_transaction.js and recover_prepared_transaction_state.js pass now that

Doing some more digging, I see that the exception that TransactionHistoryIterator was throwing does eventually get caught and logged here:

It would be nicer to log it in the same place that other rollback errors are logged, but I don't have an immediate plan for how to do that. |
| Comment by Kyle Suarez [ 12/Apr/21 ] |
|
If switching from throwing to returning a Status would improve the error message, I think that's definitely worth doing, as the failure is rather obscure from the logs. But if there isn't an easy way to make this failure mode more readable, then I'd be fine with closing this as a duplicate of
| Comment by Justin Seyster [ 12/Apr/21 ] |
|
This failure is caused by the NotPrimaryOrSecondary exception described in
It looks like that line of code expects _findCommonPoint() to return an error Status rather than throw an exception. To help diagnose problems of this nature in the future, perhaps we should investigate how to make _findCommonPoint() report all its errors the same way (either with a Status or with an exception). |
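
To illustrate the mismatch discussed in the comments above, here is a minimal sketch of a callee that reports one error as a Status and another as an exception, plus a wrapper that funnels both into a single reporting path. Everything in it is a hypothetical stand-in: the `Status` struct is a simplified toy and not the server's real class, `findCommonPointSketch()` only imitates the shape of the problem and is not the real `_findCommonPoint()`, and `statusFromExceptions()` is an illustrative helper name.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a Status type (assumption: not the real class).
struct Status {
    bool ok;
    std::string reason;
    static Status OK() { return {true, ""}; }
    static Status Error(std::string why) { return {false, std::move(why)}; }
};

// Hypothetical callee with two inconsistent error paths:
// one failure comes back as a Status, the other escapes as an exception.
Status findCommonPointSketch(bool remoteUnreachable, bool historyReadFails) {
    if (remoteUnreachable) {
        return Status::Error("remote unreachable");
    }
    if (historyReadFails) {
        // Simulates an error thrown from deeper in the call stack.
        throw std::runtime_error("NotPrimaryOrSecondary (simulated)");
    }
    return Status::OK();
}

// Illustrative wrapper: runs fn and converts any exception into an error Status,
// so a caller that only checks the returned Status sees every failure.
template <typename F>
Status statusFromExceptions(F&& fn) {
    try {
        return fn();
    } catch (const std::exception& ex) {
        return Status::Error(ex.what());
    } catch (...) {
        return Status::Error("unknown exception");
    }
}
```

A caller that checks only the returned Status would write something like `Status s = statusFromExceptions([] { return findCommonPointSketch(false, true); });` and then log `s.reason` alongside its other rollback errors. Without the wrapper, the thrown error bypasses that check and surfaces wherever the exception is eventually caught.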