[SERVER-52903] Add an upper bound for how long a tenant migration donor can block operations
Created: 17/Nov/20  Updated: 29/Oct/23  Resolved: 14/Jan/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.9.0-alpha0

Type: Task
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Andrew Shuvalov (Inactive)
Resolution: Fixed
Votes: 0
Labels: pm-1791_non-cloud-blocking, pm-1791_other_required
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-53571 Determine the best default timeout fo... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25
Participants:

 Description   

We decided at the 6-week review that the donor should bound how long it blocks operations.

This is useful in general to limit the impact of a tenant migration on a user workload, and it is particularly useful if the donor starts blocking very soon after writing a commitIndexBuild oplog entry. In that case the recipient will not start building the index until it sees commitIndexBuild in the oplog, and it will not report that it has majority-committed the commitIndexBuild oplog entry until it finishes building the index. Since building the index can take a long time (hours), the donor could block for just as long.

suganthi.mani or matthew.russotto, could you confirm that this is the plan for how the recipient will handle two-phase index builds?

evin.roesle, could you propose a default upper bound for how long the donor should block operations? The upper bound should be settable via a server parameter.
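
To make the request concrete, here is a minimal sketch of a bounded blocking phase. It is a toy model, not the server implementation: the class, the `blocking_timeout_secs` name, and the choice to abort when the bound is hit are all assumptions made for illustration. The ticket only asks that the bound exist and be settable via a server parameter.

```python
import threading


class DonorBlocker:
    """Toy model of a donor that blocks operations for a bounded time."""

    def __init__(self, blocking_timeout_secs: float):
        # Hypothetical stand-in for the server parameter requested above.
        self.blocking_timeout_secs = blocking_timeout_secs
        self._recipient_caught_up = threading.Event()
        self.state = "allowing"

    def recipient_caught_up(self) -> None:
        # Called when the recipient reports it has majority-committed
        # everything up to the donor's block timestamp.
        self._recipient_caught_up.set()

    def run_blocking_phase(self) -> str:
        self.state = "blocking"
        # Event.wait returns True if the event was set before the timeout
        # elapsed, and False if the timeout expired first.
        if self._recipient_caught_up.wait(timeout=self.blocking_timeout_secs):
            self.state = "committed"
        else:
            # Assumption: hitting the upper bound aborts the migration so
            # that user operations stop being blocked.
            self.state = "aborted"
        return self.state


if __name__ == "__main__":
    # A recipient stuck in a long index build never signals in time.
    slow = DonorBlocker(blocking_timeout_secs=0.05)
    print(slow.run_blocking_phase())
```

With a bound like this, a recipient stuck in an hours-long index build costs the user workload at most the configured timeout rather than the full build time.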



 Comments   
Comment by Githook User [ 14/Jan/21 ]

Author: Andrew Shuvalov <andrew.shuvalov@mongodb.com> (shuvalov-mdb)

Message: SERVER-52903: Add an upper bound for how long a tenant migration donor can block operations
Branch: master
https://github.com/mongodb/mongo/commit/a102bde490e4f63a5558551f31e3c834edeb4dd6

Comment by Andrew Shuvalov (Inactive) [ 16/Dec/20 ]

I changed it back to Open and will return to this ticket later; I am working on something else for now. If you think this is not needed, there is still time to tell me.

Comment by Lingzhi Deng [ 15/Dec/20 ]

I don't think we need to block this ticket. Suganthi's proposal of blocking index builds probably makes this less necessary, but it is still a good idea to have some upper bound on the critical section. We should still do the ticket as a safety valve, regardless of how we handle index builds.

Comment by Andrew Shuvalov (Inactive) [ 08/Dec/20 ]

OK, thanks suganthi.mani for the heads-up. I have it mostly done, but I can hold.

Comment by Suganthi Mani [ 08/Dec/20 ]

FYI andrew.shuvalov, I have discussed some design approaches for handling index builds during tenant migration (see here). If we go with option #3, we might not need this ticket's fix for the index build issue. But this ticket might still be helpful for handling any unforeseen cases.

Comment by Matthew Russotto [ 17/Nov/20 ]

That's correct; furthermore, oplog application on the recipient will be halted while the index builds.
