[SERVER-37811] Replication rollback invalidates all sessions with retryable writes, not just the rolled-back ones Created: 30/Oct/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: 4.0.3, 4.1.4
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: former-quick-wins, gm-ack
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-36493 Invalidate in-memory prepared transac... Closed
is related to SERVER-29531 Handle Rollbacks in SessionTransactio... Closed
is related to SERVER-30281 Properly clear in-memory transaction ... Closed
Assigned Teams:
Replication
Sprint: Repl 2019-02-11
Participants:

 Description   

The rollback handling code here appears to have the exact list of sessions that were rolled back, yet it still invalidates every session in the catalog.



 Comments   
Comment by Pavithra Vetriselvan [ 13/Feb/19 ]

In OpObserverImpl::onReplicationRollback, we end up calling MongoDSessionCatalog::invalidateSessions and pass in boost::none for a single session doc. This causes us to go through and invalidate all sessions.

Since we already have the rollbackSessionIds, which tracks sessions where operations were rolled back, we should be able to iterate over this and call invalidateSessions with each session ID.

Alternatively, we could do the iterations inside invalidateSessions if we pass in the rollbackSessionIds as a parameter.
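
A minimal sketch of the two options above, under stated assumptions: ToySessionCatalog, SessionId, invalidateSessionSet and invalidateRolledBack are hypothetical stand-ins written for illustration, not the server's actual classes or signatures, and std::optional is used in place of boost::optional so the snippet is self-contained. Only the "empty optional means invalidate everything" behaviour is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a logical session id; the real server type differs.
using SessionId = std::string;

// Toy in-memory catalog for illustration only (NOT the real MongoDSessionCatalog).
class ToySessionCatalog {
public:
    void add(const SessionId& id) { _valid[id] = true; }

    bool isValid(const SessionId& id) const {
        auto it = _valid.find(id);
        return it != _valid.end() && it->second;
    }

    // Mirrors the behaviour described in this ticket: an empty optional
    // invalidates every session, a value invalidates only that session.
    // Returns the number of sessions newly invalidated.
    std::size_t invalidateSessions(const std::optional<SessionId>& single) {
        std::size_t count = 0;
        for (auto& [id, valid] : _valid) {
            if (valid && (!single || *single == id)) {
                valid = false;
                ++count;
            }
        }
        return count;
    }

    // Option 2 (assumed name): do the iteration inside the catalog by
    // accepting the whole set of rolled-back session ids.
    std::size_t invalidateSessionSet(const std::unordered_set<SessionId>& ids) {
        std::size_t count = 0;
        for (const auto& id : ids) {
            auto it = _valid.find(id);
            if (it != _valid.end() && it->second) {
                it->second = false;
                ++count;
            }
        }
        return count;
    }

private:
    std::unordered_map<SessionId, bool> _valid;
};

// Option 1 (assumed name): the caller iterates over rollbackSessionIds and
// invalidates one session per call instead of passing an empty optional.
std::size_t invalidateRolledBack(ToySessionCatalog& catalog,
                                 const std::unordered_set<SessionId>& rollbackSessionIds) {
    std::size_t count = 0;
    for (const auto& id : rollbackSessionIds) {
        count += catalog.invalidateSessions(id);
    }
    // Each id can invalidate at most one session.
    assert(count <= rollbackSessionIds.size());
    return count;
}
```

In this toy version, option 1 rescans the catalog once per rolled-back session, while option 2 does one lookup per id; whether that difference matters in the real catalog is not something this sketch can show.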

Comment by Gregory McKeon (Inactive) [ 05/Nov/18 ]

Putting this here for now to revisit once prepare w/ rollback decides when we need to invalidate the sessions table.

Comment by Jack Mulrow [ 30/Oct/18 ]

judah.schvimer, yeah like Randolph said I think we did it this way because roll back to a checkpoint didn't track rolled back sessions at the time and it wasn't worthwhile to add that ourselves since we didn't support retryable writes against nodes that aren't primary. It looks like we started tracking affected sessions in SERVER-29933 though, so it shouldn't be hard to only invalidate them now.

Comment by Randolph Tan [ 30/Oct/18 ]

I think this was for rollback to checkpoint. Since we can't tell what the diff is, we have to force the in-memory sessions to just reload everything from storage.

Comment by Judah Schvimer [ 30/Oct/18 ]

This was done in SERVER-30281 and SERVER-29531. jack.mulrow and renctan, was this done for a reason?
