[SERVER-63092] Return retryable errors to caller, and improve split testing infrastructure to wrap retryable calls in retry loops
Created: 28/Jan/22  Updated: 29/Oct/23  Resolved: 14/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Task
Priority: Major - P3
Reporter: Matt Broadstone
Assignee: Didier Nadeau
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Server Serverless 2022-04-18
Participants:

 Description   

Improve the ShardSplitDonorService to return retryable errors to the caller. Improve the testing infrastructure to use retry loops for shard split commands, and handle retryable errors. Add tests proving that retryable errors during commitShardSplit are indeed retryable.
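
The retry-loop idea above can be sketched in plain JavaScript as follows. This is a minimal illustration, not the code from the actual split test library: `runWithRetries`, `isRetryable`, and `makeFlakyRunner` are hypothetical names, the error-code set is only an illustrative subset of the server's retryable codes, and the `commitShardSplit` command body is a placeholder with its real fields omitted.

```javascript
// Illustrative subset of retryable error codes (assumption: the real test
// library would consult the server's full retryable-error classification).
const RETRYABLE_CODES = new Set([
    91,     // ShutdownInProgress
    189,    // PrimarySteppedDown
    10107,  // NotWritablePrimary
    11602,  // InterruptedDueToReplStateChange
]);

// A reply is retryable when the command failed with one of the codes above.
function isRetryable(res) {
    return res.ok === 0 && RETRYABLE_CODES.has(res.code);
}

// Hypothetical helper: calls runCommand(cmdObj) until it succeeds, fails with
// a non-retryable error, or maxAttempts is reached. Returns the last reply
// together with the number of attempts made.
function runWithRetries(runCommand, cmdObj, maxAttempts = 5) {
    let res;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        res = runCommand(cmdObj);
        if (!isRetryable(res)) {
            return {res, attempts: attempt};
        }
    }
    return {res, attempts: maxAttempts};
}

// Stub standing in for a donor primary: fails `failures` times with a
// retryable error, then reports success.
function makeFlakyRunner(failures) {
    let calls = 0;
    return function(cmdObj) {
        calls++;
        if (calls <= failures) {
            return {ok: 0, code: 11602, errmsg: "interrupted due to repl state change"};
        }
        return {ok: 1};
    };
}

// Placeholder command object; the real command takes additional fields.
const outcome = runWithRetries(makeFlakyRunner(2), {commitShardSplit: 1});
console.log(outcome.attempts, outcome.res.ok);
```

A non-retryable failure is returned to the caller on the first attempt, so a test can still assert on the exact error it expects.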



 Comments   
Comment by Githook User [ 12/Apr/22 ]

Author:

{'name': 'Didier Nadeau', 'email': 'didier.nadeau@mongodb.com', 'username': 'nadeaudi'}

Message: SERVER-63092 Improve split test library and add tests for write
Branch: master
https://github.com/mongodb/mongo/commit/cff8825855ae567667d3834861433f8f70303b84

Comment by Didier Nadeau [ 15/Mar/22 ]

List of current `tenant_migration_` jstests and whether they should be adapted for shard split (i.e. whether they are applicable to shard split). All `tenant_migration_recipient` tests are skipped because they don't apply. `1` means applicable, `0` means not applicable, and `?` means not sure; comments are added where needed.

 tenant_migration_abort_forget_retry.js,1 
 tenant_migration_aborted_buildindex.js,? index build completes after aborting migration, 
 tenant_migration_advance_stable_ts_after_clone.js,0 
 tenant_migration_blocking_state_timeout.js,1 copy 
 tenant_migration_buildindex.js, ? index 
 tenant_migration_buildindex_shard_merge.js, ? index 
 tenant_migration_causal_consistency_commit_optime_before_last_cloning_optime.js,0 recipient 
 tenant_migration_cloner_stats.js,0 
 tenant_migration_cloner_stats_with_failover.js,0 
 tenant_migration_clones_system_views.js,0 
 tenant_migration_cloning_uses_read_concern_majority.js,0 
 tenant_migration_cluster_time_keys_cloning.js,0 
 tenant_migration_collection_rename.js,1 ? -> migration aborts when a donor collection is renamed. Will this be prevented by access blockers ? 
 tenant_migration_collection_ttl.js, 0 don't disable ttl explicitly : ttl in tenant collections ? 
 tenant_migration_commit_transaction_retry.js, 0 no recipient 
 tenant_migration_concurrent_bulk_writes.js, 1 adapt 
 tenant_migration_concurrent_migrations.js, 0 concurrent shard split not supported -> write test for that 
 tenant_migration_concurrent_migrations_recipient.js, 0
 tenant_migration_concurrent_migrations_stress_test.js, 0
 tenant_migration_concurrent_reads_on_donor.js, 1
 tenant_migration_concurrent_reads_on_recipient.js, 0
 tenant_migration_concurrent_reconfig.js, 1
 tenant_migration_concurrent_state_doc_removal_and_stepdown.js, 1 what's the goal ?
 tenant_migration_concurrent_writes_on_donor.js, 1
 tenant_migration_concurrent_writes_on_recipient.js, 0
 tenant_migration_conflicting_donor_start_migration_cmds.js, N/A already unit-tested
 tenant_migration_conflicting_recipient_sync_data_cmds.js, 0
 tenant_migration_donor_abort_state_transition.js, ?
 tenant_migration_donor_current_op.js, 1
 tenant_migration_donor_initial_sync_recovery.js, 0 -> avoid by preventing reconfig
 tenant_migration_donor_interrupt_on_stepdown_and_shutdown.js, 1
 tenant_migration_donor_kill_op_retry.js, 0 -> unit test
 tenant_migration_donor_resume_on_stepup_and_restart.js, 1 (stepup is unit tested, test start ?)
 tenant_migration_donor_retry.js, 0 not retryable
 tenant_migration_donor_rollback_during_cloning.js, 0 (check recipients ops are rolled back)
 tenant_migration_donor_rollback_recovery.js, 0 no rollback
 tenant_migration_donor_shutdown_while_blocking_reads.js, 1
 tenant_migration_donor_startup_recovery.js, 1
 tenant_migration_donor_state_machine.js, 0 unit tested
 tenant_migration_donor_try_abort.js, 0 already tested
 tenant_migration_donor_unblock_reads_and_writes_on_completion.js, 1
 tenant_migration_donor_wont_retry_recipientsyncdata_on_non_retriable_interruption_errors.js, 0 no sync
 tenant_migration_drop_collection.js, 1
 tenant_migration_drop_state_doc_collection.js, 1
 tenant_migration_ensure_migration_outcome_visibility_for_blocked_writes.js, 1
 tenant_migration_external_cluster_validation.js, 0 -> no explicit cluster time sync
 tenant_migration_external_keys_ttl.js, 0
 tenant_migration_fetch_committed_transactions.js, 0 no explicit fetch -> test for transaction during split ?
 tenant_migration_fetch_committed_transactions_retry.js, 0 no explicit fetch
 tenant_migration_filters_tenant_id.js, 1
 tenant_migration_find_and_modify_retry.js, 0 no prefetcher
 tenant_migration_ignore_create_index_on_nonempty_collection.js, ?
 tenant_migration_index_oplog_entries.js, ?
 tenant_migration_invalid_inputs.js, 1
 tenant_migration_large_txn.js, 0, no apply oplog
 tenant_migration_logs.js, 0 no certificate
 tenant_migration_metrics_output.js, 1 ftdc/server status
 tenant_migration_multi_writes.js, 0 -> cannot do repeated split easily
 tenant_migration_multikey_index.js, ?
 tenant_migration_network_error_via_rollback.js, 0 -> network error handling during sync
 tenant_migration_no_failover.js, 0 -> already have a full split test
 tenant_migration_on_clustered_collection.js, 1
 tenant_migration_oplog_view.js, 0
 tenant_migration_read_your_own_writes.js, 1
 tenant_migration_resume_collection_cloner_after_recipient_failover.js, 0
 tenant_migration_resume_collection_cloner_after_recipient_failover_with_dropped_views.js, 0
 tenant_migration_resume_collection_cloner_after_rename.js, 0
 tenant_migration_resume_oplog_application.js, 0
 tenant_migration_retry_session_migration.js, 0
 tenant_migration_retryable_write_retry.js, 1
 tenant_migration_retryable_write_retry_on_recipient.js, 1
 tenant_migration_shard_merge_import_write_conflict_retry.js
 tenant_migration_ssl_configuration.js, 0
 tenant_migration_stepup_recovery_after_abort.js, 0 (unit tested)
 tenant_migration_sync_source_too_stale.js, 0
 tenant_migration_test_max_bson_limit.js, 1
 tenant_migration_timeseries_collections.js, 1
 tenant_migration_timeseries_retryable_write_oplog_cloning.js, 0
 tenant_migration_timeseries_retryable_write_retry_on_recipient.js, 1
 tenant_migration_transaction_boundary.js, 0
 tenant_migration_v1_id_index.js, 0
 tenant_migration_x509.js, 0

Generated at Thu Feb 08 05:56:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.