[SERVER-37671] Log when we are unable to kill a session due to a prepared transaction Created: 19/Oct/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Tess Avitabile (Inactive)
Assignee: Backlog - Replication Team
Resolution: Unresolved
Votes: 0
Labels: former-quick-wins, prepare_diagnostics, prepare_optional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-36485 ‘killSessions’ (for one session) and ... Closed
Assigned Teams:
Replication
Participants:

 Description   

If a coordinator shard goes down, leaving participants with prepared transactions, the operator should have some way of knowing that there is transaction state to clean up, even if the transaction state does not create cache pressure or block other operations. SERVER-36499 will ensure that you can see how long a transaction has been in prepare in the currentOp output, but it would also be helpful to log when a transaction has been in prepare for "too long". We can log a message when the transaction timeout thread or the session reaper is unable to clean up a transaction because it is in prepare. The transaction timeout thread runs every 30 seconds by default and the session reaper runs every 5 minutes by default, so these log messages will not be too frequent. The log message should say how long the transaction has been in the prepare state.
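The proposed logging could look roughly like the sketch below. This is an illustration only, not the server's implementation: `Txn`, `cleanup_pass`, the logger name, and the message wording are all hypothetical. It assumes each cleanup pass (timeout thread or session reaper) sees a list of transactions, aborts the expired ones that are not prepared, and logs the ones it has to skip because they are in prepare, including how long they have been in that state.

```python
import logging

logger = logging.getLogger("txn_cleanup_sketch")  # hypothetical logger name


class Txn:
    """Hypothetical stand-in for per-session transaction state."""

    def __init__(self, session_id, expired, prepared_at=None):
        self.session_id = session_id
        self.expired = expired
        # Time (in seconds) at which prepare began; None if not prepared.
        self.prepared_at = prepared_at


def cleanup_pass(txns, now, source):
    """Run one cleanup pass.

    Expired transactions that are not prepared are treated as aborted.
    Expired transactions that are in prepare cannot be cleaned up, so
    they are skipped and a warning is logged with the time in prepare.
    Returns (aborted_ids, skipped_ids).
    """
    aborted, skipped = [], []
    for txn in txns:
        if not txn.expired:
            continue
        if txn.prepared_at is not None:
            seconds_in_prepare = now - txn.prepared_at
            logger.warning(
                "%s could not clean up transaction on session %s: "
                "in prepare for %.0f seconds",
                source, txn.session_id, seconds_in_prepare,
            )
            skipped.append(txn.session_id)
        else:
            aborted.append(txn.session_id)
    return aborted, skipped
```

Because the message is emitted once per pass per stuck transaction, its frequency is bounded by how often the calling thread runs.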



 Comments   
Comment by Tess Avitabile (Inactive) [ 23/Oct/18 ]

That's a good point. For historical information, it would be useful to have this logged.

Comment by Jonathan Balsano [ 23/Oct/18 ]

tess.avitabile You may already be thinking about this, but I want to call out that Monitoring doesn't retain historical information about currentOp, so it may benefit TSEs to have this logged in case it needs to be correlated with another issue.

Comment by Tess Avitabile (Inactive) [ 19/Oct/18 ]

This may not be necessary, since jonathan.balsano thinks the work in SERVER-36499 may be sufficient.
