[SERVER-53168] Support 50 concurrent migrations on a single recipient Created: 01/Dec/20 Updated: 29/Oct/23 Resolved: 17/Feb/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Lingzhi Deng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | pm-1791_milestone-B | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Sprint: | Repl 2021-02-08, Repl 2021-02-22 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
Currently, the default size of our tenant migration recipient thread pool is 8 (it is a tunable server startup parameter). For each migration, components on the recipient side, such as the oplog fetcher and the cloner, run synchronous jobs (fetching data from the remote donor node) on a tenant migration recipient thread without yielding it. With the default pool size of 8, we can expect at most 3 concurrent migrations to be initiated on the recipient side (per migration, 2 threads for sync jobs + 1 thread for async jobs). Beyond that, concurrent tenant migrations can completely stall all active tenant migrations on the recipient side. Consider the case where the tenant migration recipient thread pool size is 4 and 4 migrations are started concurrently, each occupying a worker thread. Now we have no free worker threads left in the pool to start the cloner. All 4 tenant migrations would hang on the recipient side until we explicitly cancel one migration using the ForgetMigration cmd. |
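The stall described above can be reproduced in miniature. The following is a minimal Python sketch, not MongoDB code: it assumes each migration runs a "fetcher" task that holds its worker thread until a "cloner" task from the same pool has run. The names `simulate`, `fetcher`, and `cloner` are hypothetical and only illustrate the starvation pattern.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def simulate(pool_size, num_migrations, timeout=0.5):
    """Return how many simulated migrations finish within `timeout` seconds."""
    pool = ThreadPoolExecutor(max_workers=pool_size)
    cloner_done = [threading.Event() for _ in range(num_migrations)]

    def fetcher(i):
        # Synchronous job: occupies a worker thread without yielding
        # until the cloner of the same migration has run (assumption).
        cloner_done[i].wait()

    def cloner(i):
        # Needs a free worker thread from the same pool to run at all.
        cloner_done[i].set()

    # All fetchers are queued first, then all cloners (FIFO work queue).
    fetchers = [pool.submit(fetcher, i) for i in range(num_migrations)]
    for i in range(num_migrations):
        pool.submit(cloner, i)

    finished, _ = wait(fetchers, timeout=timeout)
    completed = len(finished)

    # Release any stalled fetchers so the simulation itself can exit.
    for event in cloner_done:
        event.set()
    pool.shutdown(wait=True)
    return completed


if __name__ == "__main__":
    print("pool=4, migrations=4 ->", simulate(4, 4), "completed")
    print("pool=8, migrations=4 ->", simulate(8, 4), "completed")
```

With as many fetchers as worker threads, no cloner ever gets a thread and nothing completes until something external releases a fetcher, which plays the role of cancelling a migration in this sketch.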
| Comments |
| Comment by Githook User [ 17/Feb/21 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'}Message: |
| Comment by Andrew Shuvalov (Inactive) [ 05/Feb/21 ] |
|
Added |
| Comment by Lingzhi Deng [ 18/Dec/20 ] |
|
In our discussion with Cloud, we decided to support <= 50 concurrent migrations on a single recipient set (at least for private beta). We will revisit this and refactor the OplogFetcher and cloner code if needed. So I think we can change the thread pool size to 150 for now. |
| Comment by Suganthi Mani [ 01/Dec/20 ] |
|
I think we need a solution that throttles migrations at the command layer on the recipient side, i.e., make the recipientSyncData cmd wait if there are already 3 active concurrent migrations in progress (for the default thread pool size of 8) before asking POS to start a new migration.
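The throttling idea in this comment could look roughly like a counting semaphore in front of the code that starts a migration. This is a hedged Python sketch of the pattern only, not the server implementation; `MigrationThrottle` and `admit` are hypothetical names, and the limit of 3 is taken from the comment above.

```python
import threading
from contextlib import contextmanager


class MigrationThrottle:
    """Admit at most `max_active` migrations at a time (illustrative only)."""

    def __init__(self, max_active):
        self._slots = threading.BoundedSemaphore(max_active)

    @contextmanager
    def admit(self, timeout=None):
        # Block until a slot is free; give up after `timeout` seconds if set.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("too many active migrations")
        try:
            yield
        finally:
            # Free the slot when the migration finishes or fails.
            self._slots.release()


if __name__ == "__main__":
    throttle = MigrationThrottle(max_active=3)
    with throttle.admit():
        print("migration admitted; start it here")
```

A caller would wrap the start of each migration in `throttle.admit()`, so a fourth request waits (or times out) instead of taking worker threads away from migrations that are already running.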