[SERVER-43099] Reenable random chunk migration failpoint for concurrency with_balancer suites Created: 30/Aug/19  Updated: 29/Nov/23  Resolved: 07/Feb/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Task Priority: Major - P3
Reporter: Alexander Taskov (Inactive) Assignee: Jordi Serra Torrens
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-43106 Test create_index_background_wildcard... Closed
depends on SERVER-43196 Blacklist update_where.js from sharde... Closed
depends on SERVER-61742 Balancer may trip invariants due to c... Closed
depends on SERVER-61769 Attempting to run an aggregation with... Closed
depends on SERVER-62710 AsyncRequestsMerger won't attempt to ... Closed
depends on SERVER-73032 SBE plan cache prevents range deletio... Closed
depends on SERVER-73385 RenameCollectionCoordinator wrongly r... Closed
depends on SERVER-73388 RenameCollection can attempt to relea... Closed
depends on SERVER-42914 Implement random chunk selection poli... Closed
depends on SERVER-61759 Unsetting the AllowMigrations flag sh... Closed
depends on SERVER-61810 rename_sharded_collection.js workload... Closed
depends on SERVER-61811 Do not run indexed_insert_where.js on... Closed
depends on SERVER-61835 Fix how SBE plan cache deals with Sha... Closed
depends on SERVER-61840 create_index_background_partial_filte... Closed
Gantt Dependency
Problem/Incident
causes SERVER-74235 Create compound indexes before shardi... Closed
causes SERVER-81517 blacklist validate_db_metadata_comman... Closed
causes SERVER-73973 [test-only bug] Skip orphans checking... Closed
Related
related to SERVER-43107 Disable random chunk migration failpo... Closed
is related to SERVER-73434 Investigate failures on multi_stateme... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-12-16, Sharding 2019-12-30, Sharding 2020-01-13, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10, Sharding EMEA 2022-05-30, Sharding EMEA 2022-09-05, Sharding EMEA 2022-09-19, Sharding EMEA 2022-10-03, Sharding EMEA 2022-10-17, Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14, Sharding EMEA 2022-11-28, Sharding EMEA 2022-12-12, Sharding EMEA 2022-12-26, Sharding EMEA 2023-01-09, Sharding EMEA 2023-01-23, Sharding EMEA 2023-02-06
Participants:
Linked BF Score: 113

 Description   

The random migration failpoints have possibly exposed various failures and will be disabled until we investigate the BFs that were encountered. Once we have resolved these issues, the failpoint should be reenabled.



 Comments   
Comment by Githook User [ 07/Feb/23 ]

Author: Jordi Serra Torrens <jordi.serra-torrens@mongodb.com> (jordist)

Message: SERVER-43099 Reenable random chunk migration failpoint for concurrency with_balancer suites
Branch: master
https://github.com/mongodb/mongo/commit/9fb9d210409c00e69ec40179330fee0b28f62aec

Comment by Silvia Surroca [ 01/Jun/22 ]

Given that pre-splitting no longer happens and the balancer now takes data size into account, the only "random" migration that can still occur is one chunk moving back and forth in tests that work with little or no data.

Also, the random move chunk workloads already test the same use case.

For now, I am linking the ticket to PM-2652, because the "random chunk migration" test behaviour should be rethought after the future new balancer implementation.

Comment by Jordi Serra Torrens [ 13/Jan/22 ]

In addition to the failures mentioned above, there's still one test failure pending to be understood:

  • jstests/concurrency/fsm_workloads/server_status_with_time_out_cursors.js (in this patch)

Understood ^: The test leaves idle cursors open. This makes the stopBalancer/checkOrphans checks at the end of the fixture hang, because the rangeDeleter cannot progress due to the open cursors. SERVER-62710 tracks this.
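
The hang described above can be illustrated with a toy model. This is only a sketch and not MongoDB code: `ToyShard`, `run_range_deleter` and `check_orphans` are hypothetical names, the rule "any open cursor blocks every range deletion" is a deliberate simplification, and the bounded retry loop stands in for what would be an indefinite wait in the real fixture.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy stand-in for a shard (hypothetical, not MongoDB code)."""
    open_cursors: set = field(default_factory=set)
    pending_range_deletions: list = field(default_factory=list)

    def run_range_deleter(self) -> int:
        """Delete pending ranges only when no cursor is open.

        Returns the number of ranges deleted in this pass.
        """
        if self.open_cursors:
            # Assumed simplification: any open cursor blocks all deletions.
            return 0
        deleted = len(self.pending_range_deletions)
        self.pending_range_deletions.clear()
        return deleted


def check_orphans(shard: ToyShard, max_passes: int = 3) -> bool:
    """Return True once all range deletions drain, False if still blocked.

    The bounded loop models the end-of-fixture check; the real check
    would keep waiting instead of giving up.
    """
    for _ in range(max_passes):
        shard.run_range_deleter()
        if not shard.pending_range_deletions:
            return True
    return False


# A test that leaves an idle cursor open keeps the check from completing.
shard = ToyShard(open_cursors={"idle-cursor"}, pending_range_deletions=["range-1"])
blocked_result = check_orphans(shard)

# Once the cursor is closed, the pending deletion drains and the check passes.
shard.open_cursors.clear()
drained_result = check_orphans(shard)
```

In this model the first check never completes while the cursor stays open, which mirrors why the fixture teardown hangs until the cursors are closed or time out.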

Comment by Blake Oler [ 17/Oct/19 ]

Note that when we re-enable the random balancer policy, we may have to create a way for workloads to opt out of the randomness, such as the workload in SERVER-44062, which will assert that the balancer works properly.
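
One possible shape for such an opt-out, purely as a sketch: each workload carries a flag, and the suite enables the random policy only when every workload in the run tolerates it. The `Workload` type, the `allow_random_migrations` flag, the helper and the workload names are all hypothetical and are not part of the existing concurrency framework.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a concurrency workload."""
    name: str
    # Hypothetical flag: False means the workload needs the normal
    # balancer policy, e.g. because it asserts on balancer behaviour.
    allow_random_migrations: bool = True


def should_enable_random_policy(workloads: Iterable[Workload]) -> bool:
    """Enable the random policy only if every workload tolerates it."""
    return all(w.allow_random_migrations for w in workloads)


# Placeholder workload names, used only for illustration.
tolerant = [Workload("workload_a.js"), Workload("workload_b.js")]
mixed = tolerant + [Workload("asserts_on_balancer.js", allow_random_migrations=False)]

tolerant_decision = should_enable_random_policy(tolerant)
mixed_decision = should_enable_random_policy(mixed)
```

A per-run decision like this is the coarsest option; a finer-grained alternative would exclude only the opted-out workload's collections from random migrations, which this sketch does not attempt.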

Generated at Thu Feb 08 05:02:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.