[SERVER-63989] Retry rollback_to_stable until all concurrent operations finish Created: 25/Feb/22  Updated: 29/Oct/23  Resolved: 21/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Gantt Dependency
has to be done after WT-8884 Change return code of rollback_to_sta... Closed
Related
related to SERVER-60335 Wait for all user operations to be ki... Backlog
is related to WT-8970 Check for positioned cursors before a... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0, v4.4
Sprint: Execution Team 2022-03-21, Execution Team 2022-04-04
Participants:
Linked BF Score: 35

 Description   

The rollback_to_stable operation requires callers to "close or reset all open cursors before the call, and no other API calls should be made for the duration of the call."

If that does not happen, the operation will fail with the following error:

{"t":{"$date":"2022-01-28T01:09:22.818+00:00"},"s":"E",  "c":"WT",       "id":22435,   "ctx":"BackgroundSync","msg":"WiredTiger error message","attr":{"error":22,"message":{"ts_sec":1643332162,"ts_usec":818250,"thread":"2:0x7f13c99e3700","session_name":"txn rollback_to_stable","category":"WT_VERB_DEFAULT","category_id":9,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"int __rollback_to_stable_check(WT_SESSION_IMPL *):1391:rollback_to_stable illegal with active transactions","error_str":"Invalid argument","error_code":22}}}

We should retry rollback_to_stable until the system quiesces.
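
A minimal sketch of what such a retry loop could look like, assuming rollback_to_stable reports contention as EBUSY (the return-code change tracked in WT-8884). The helper name retryWhileBusy, the attempt callback, and the backoff and attempt-limit parameters are hypothetical and are not taken from the actual commits. As noted in the comments on this ticket, retrying alone does not guarantee that a new transaction will not start after WiredTiger's check, so the caller is still responsible for quiescing operations.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not MongoDB's actual implementation.
// `attempt` stands in for one call to rollback_to_stable and is assumed to
// return 0 on success or an errno-style code on failure.
int retryWhileBusy(const std::function<int()>& attempt,
                   std::chrono::milliseconds backoff,
                   int maxAttempts) {
    int ret = EBUSY;
    for (int i = 0; i < maxAttempts; ++i) {
        ret = attempt();
        if (ret != EBUSY) {
            // Either success (0) or an error that retrying will not fix.
            return ret;
        }
        // Concurrent operations are still active; wait before trying again.
        std::this_thread::sleep_for(backoff);
    }
    // Still busy after maxAttempts tries; let the caller decide what to do.
    return ret;
}
```

A caller would pass a lambda wrapping the real rollback call. Bounding the number of attempts is a design choice made for this sketch; the ticket itself only asks to retry until the system quiesces.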



 Comments   
Comment by Githook User [ 21/Mar/22 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-63989 Retry rollback_to_stable until all concurrent operations finish
Branch: master
https://github.com/mongodb/mongo/commit/453a0ab6111a8ab1f5343cdebc76d62d80ea1f08

Comment by Githook User [ 21/Mar/22 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-63989 KeysCollectionManager is interruptible by replication rollback
Branch: master
https://github.com/mongodb/mongo/commit/b9dc9e8c8add0ef259cf9302808c087e4df5460f

Comment by Githook User [ 21/Mar/22 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-63989 Release the snapshot before pausing in the hangBeforeWaitingForWriteConcern fail point
Branch: master
https://github.com/mongodb/mongo/commit/0f5f68c816ffae7950eeb7ca8c176a3f755d889b

Comment by Keith Bostic (Inactive) [ 05/Mar/22 ]

I have no objection to changing the return code in this scenario to EBUSY, but – and I think I'm just repeating what everybody already knows, so apologies for that, but just in case – there is no guarantee the WiredTiger check will catch all MongoDB Server activity. If WiredTiger performs this check, and then a transaction starts, rollback-to-stable will be performed while the transaction runs, and bad things will happen. (In other words, WiredTiger has no global locks around running RTS, and it isn't sufficient to call RTS until it doesn't complain about transactions being run in parallel.)

Comment by Tammy Bailey (Inactive) [ 28/Feb/22 ]

We created the ticket WT-8884 to change the return code in this error scenario to EBUSY. Please let us know when you are ready to schedule this work.

Comment by Louis Williams [ 25/Feb/22 ]

This is an alternative solution to SERVER-60335.
