[SERVER-62245] MigrationRecovery must not assume that only one migration needs to be recovered Created: 23/Dec/21 Updated: 29/Oct/23 Resolved: 30/Dec/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 5.0.0, 5.2.0, 5.1.0 |
| Fix Version/s: | 5.3.0, 5.1.2, 5.0.6, 5.2.0-rc4 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Jordi Serra Torrens | Assignee: | Jordi Serra Torrens |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v5.2, v5.1, v5.0 |
| Sprint: | Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Issue and status as of Dec 30, 2021

ISSUE DESCRIPTION AND IMPACT
This issue can cause unavailability of a shard in sharded clusters running MongoDB versions 5.0.0 - 5.0.5 and 5.1.0 - 5.1.1. Later versions are not affected. The problem can potentially occur if all of the following conditions have been met at least once:

Symptom of the bug: the mongod process crashes upon step-up due to an invariant failure with the following message: "Upon step-up a second migration coordinator was found".

REMEDIATION AND WORKAROUNDS

TECHNICAL DETAILS
Migration coordinators:
Range deletion tasks:
--- Original ticket description ---
There are several situations that can lead to more than one migration (for different collections) needing recovery on step-up. For example, when a migration fails here, we only clear the collection's filtering metadata, so that the next access to the collection will trigger the recovery, and then release the ActiveMigrationRegistry. At this point, nothing prevents a migration of a different collection from starting, so if the shard then stepped down it would have two migrations to recover. This invariant, along with taking the MigrationBlockingGuard on step-up migration recovery, was added on
This ticket will provide a fix so that clusters that are already in the faulty situation of having several migrations pending recovery no longer hit the invariant on step-up. |
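The difference between the faulty assumption and the intended behavior can be pictured with a small model. This is only a sketch in Python, not the server's C++ code: `MigrationCoordinatorDoc`, `recover_strict`, `recover_all`, and the `schedule_recovery` callback are hypothetical names introduced for illustration, and what "recovery" does for each document is left abstract.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class MigrationCoordinatorDoc:
    """Hypothetical stand-in for a persisted migration coordinator document."""
    migration_id: str
    namespace: str


def recover_strict(docs: Iterable[MigrationCoordinatorDoc],
                   schedule_recovery: Callable[[str], None]) -> List[str]:
    """Models the faulty assumption: at most one migration needs recovery."""
    recovered: List[str] = []
    for doc in docs:
        if recovered:
            # Mirrors the invariant message quoted in this ticket.
            raise AssertionError(
                "Upon step-up a second migration coordinator was found")
        schedule_recovery(doc.namespace)
        recovered.append(doc.namespace)
    return recovered


def recover_all(docs: Iterable[MigrationCoordinatorDoc],
                schedule_recovery: Callable[[str], None]) -> List[str]:
    """Models the intended behavior: every pending migration is recovered."""
    recovered: List[str] = []
    for doc in docs:
        schedule_recovery(doc.namespace)
        recovered.append(doc.namespace)
    return recovered
```

With two pending documents for different collections, `recover_strict` raises on the second one, while `recover_all` schedules recovery for both namespaces in order.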
| Comments |
| Comment by Githook User [ 28/Jun/22 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: |
| Comment by Githook User [ 30/Dec/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: (cherry picked from commit 8e6ab9a259d921298940190161fadfd118c6dc15) |
| Comment by Githook User [ 30/Dec/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: (cherry picked from commit 8e6ab9a259d921298940190161fadfd118c6dc15) |
| Comment by Githook User [ 30/Dec/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: (cherry picked from commit 8e6ab9a259d921298940190161fadfd118c6dc15) |
| Comment by Githook User [ 30/Dec/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: |
| Comment by Tommaso Tocci [ 23/Dec/21 ] |
|
This bug has been introduced by |