[SERVER-70602] Handle faulty balancerCompliant reporting by waiting for some no-op balancing rounds Created: 17/Oct/22 Updated: 29/Oct/23 Resolved: 03/Nov/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.1, 6.0.3, 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Silvia Surroca | Assignee: | Silvia Surroca |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | auto-reverted |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.1, v6.0 |
| Sprint: | Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14 |
| Participants: | |
| Linked BF Score: | 35 |
| Description |
|
The tests are already double checking the collection balancerCompliant ( However, in

Why may the recipient report a wrong collection size?
Because the actual migration of the documents is not executed atomically with the update of orphanCounter on the recipient side. So the recipient shard reports a larger collection size than the actual one if the request is served after the actual migration of the documents but before the update of orphanCounter.

NOTE: The recipient clears the 'orphan' tag of the received chunk once the migration commit state is reached.

Example of failure
2. At that moment there are no orphans, so we don't have to wait for them to be 0.
3. The second balancerCompliant check returns too early as well, because the recipient has already received the documents but hasn't updated the orphans counter yet. So we wrongly return from awaitCollectionBalanced.
4. Once the migration ends, we finally get the proper size on each shard.
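The failure sequence above suggests why a single balancerCompliant answer cannot be trusted in isolation. The following is a minimal sketch of the idea in the ticket title (only return after several consecutive compliant rounds), not the actual patch. All names here (`await_stable_compliance`, `is_compliant`, `wait_for_round`, `required_rounds`, `max_rounds`) are hypothetical, and the compliance reports are simulated rather than read from a real cluster.

```python
from typing import Callable


def await_stable_compliance(
    is_compliant: Callable[[], bool],
    wait_for_round: Callable[[], None],
    required_rounds: int = 3,
    max_rounds: int = 100,
) -> int:
    """Wait until `required_rounds` consecutive compliant reports are seen.

    Returns the number of rounds waited. Both callables are hypothetical
    stand-ins: `wait_for_round` blocks until the next balancing round ends,
    `is_compliant` returns the (possibly faulty) compliance report.
    """
    streak = 0
    for rounds in range(1, max_rounds + 1):
        wait_for_round()
        # A non-compliant report resets the streak, so an early faulty
        # "compliant" answer followed by more balancing is not trusted.
        streak = streak + 1 if is_compliant() else 0
        if streak >= required_rounds:
            return rounds
    raise TimeoutError(f"not stably compliant after {max_rounds} rounds")


if __name__ == "__main__":
    # Simulated reports: the first "True" is the faulty early answer.
    reports = iter([True, False, False, True, True, True])
    print(await_stable_compliance(lambda: next(reports), lambda: None))  # prints 6
```

With `required_rounds=1` the same simulated sequence returns after the first round, which corresponds to the premature return from awaitCollectionBalanced described in the example.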
|
| Comments |
| Comment by Githook User [ 04/Nov/22 ] |
|
Author: {'name': 'Silvia Surroca', 'email': 'silvia.surroca@mongodb.com', 'username': 'silviasuhu'}Message: (cherry picked from commit 8e7978fb75cad95f864255810c655f62a0a9408d) |
| Comment by Githook User [ 04/Nov/22 ] |
|
Author: {'name': 'Silvia Surroca', 'email': 'silvia.surroca@mongodb.com', 'username': 'silviasuhu'}Message: (cherry picked from commit 8e7978fb75cad95f864255810c655f62a0a9408d) |
| Comment by Silvia Surroca [ 03/Nov/22 ] |
|
Yes, I've just created both backports
|
| Comment by Tommaso Tocci [ 03/Nov/22 ] |
|
silvia.surroca@mongodb.com is this also affecting 6.0 and 6.1? Do we need a backport? |
| Comment by Githook User [ 03/Nov/22 ] |
|
Author: {'name': 'Silvia Surroca', 'email': 'silvia.surroca@mongodb.com', 'username': 'silviasuhu'}Message: |
| Comment by xgen-buildbaron-user [ 22/Oct/22 ] |
|
Ticket re-opened due to revert. concurrency_sharded_with_stepdowns_and_balancer began a consistent failure of jstests/concurrency/fsm_workloads/collection_defragmentation.js |
| Comment by Githook User [ 22/Oct/22 ] |
|
Author: {'name': 'auto-revert-processor', 'email': 'dev-prod-dag@mongodb.com'}Message: Revert " This reverts commit 06e43e02a452ae1c4fffcffb0242b4a528bdacb4. |
| Comment by Githook User [ 21/Oct/22 ] |
|
Author: {'name': 'Silvia Surroca', 'email': 'silvia.surroca@mongodb.com', 'username': 'silviasuhu'}Message: |