[SERVER-76848] [false alarm] $out does not ensure the node remains primary throughout the internal rename
| Created: | 04/May/23 | Updated: | 29/Oct/23 | Resolved: | 11/Jul/23 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 7.1.0-rc0, 6.0.6, 5.0.17, 4.4.21, 7.0.0-rc1 |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gil Alon | Assignee: | Silvia Surroca |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | shardingemea-qw |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Sharding EMEA |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v7.0, v6.3, v6.0, v5.0, v4.4 |
| Sprint: | Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24, QI 2023-05-15 |
| Participants: | |
| Story Points: | 2 |
| Description |
|
The implementation of $out created a special internal rename command (InternalRenameIfOptionsAndIndexesMatchCmd). This command implements its own locks to avoid concurrent modifications, but there is an error in the implementation. On this line there is a call to assertIsPrimaryShardForDb, but there is no guarantee that this node will remain the primary through the entire execution of $out. The usual pattern to ensure the node remains a primary is:

There is an existing _shardsvrRenameCollection command that already has the correct locking mechanism and ensures the database is on the primary shard. We should see if we can use _shardsvrRenameCollection in $out; otherwise we should fix $out to work with concurrent movePrimary commands.

We will also need to expand our testing: the current tests don't allow $out to be run in suites that kill the primary node, and we should add movePrimary commands to the current concurrency test.

This came up in -------------

[UPDATE - 8th of September 2023]: This is not a bug. movePrimary and the internal rename of $out are correctly serialized (here and here) through the check of the isMovePrimaryInProgress flag. |
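The serialization described in the update can be illustrated with a toy model. This is only a sketch under assumptions, not the server's code: `ToyDatabaseState`, `begin_move_primary`, `finish_move_primary` and `rename_if_safe` are hypothetical names, and a single Python lock stands in for whatever locking the server actually uses. The point it shows is that a one-off primary check is safe only if the check, the in-progress flag check and the rename all happen under the same lock that movePrimary takes to set the flag.

```python
import threading


class ToyDatabaseState:
    """Hypothetical per-database state; a stand-in, not MongoDB's API."""

    def __init__(self, is_primary=True):
        self._lock = threading.Lock()
        self.is_primary = is_primary
        self.move_primary_in_progress = False

    def begin_move_primary(self):
        # The flag is set under the same lock the rename takes,
        # so a rename can never interleave with this transition.
        with self._lock:
            self.move_primary_in_progress = True

    def finish_move_primary(self):
        # After the move, this shard is no longer the primary shard.
        with self._lock:
            self.is_primary = False
            self.move_primary_in_progress = False

    def rename_if_safe(self, do_rename):
        # Both checks and the rename itself run under one lock hold.
        with self._lock:
            if not self.is_primary:
                raise RuntimeError("not the primary shard for this database")
            if self.move_primary_in_progress:
                raise RuntimeError("movePrimary in progress")
            do_rename()


# Usage: the rename succeeds before the move and is rejected once it starts.
state = ToyDatabaseState()
renamed = []
state.rename_if_safe(lambda: renamed.append("tmp -> out"))
state.begin_move_primary()
try:
    state.rename_if_safe(lambda: renamed.append("tmp -> out"))
except RuntimeError as err:
    rejected_reason = str(err)
```

In this model a rename either completes entirely before `begin_move_primary` acquires the lock, or it observes the flag and fails, which is the property the update attributes to the isMovePrimaryInProgress check.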