Details
-
Bug
-
Resolution: Works as Designed
-
Major - P3
-
Replication
-
ALL
-
Description
I wrote a stress test for https://jira.mongodb.org/browse/SERVER-53457, and it hits a problem: an aborted migration cannot be retried, because after the first cycle the recipient already contains the database that was migrated, so the retry fails. The integration test is attached.
The stress test runs two threads. The first repeatedly sends a non-idempotent multi-update to the donor. The second loops through repeated tenant migrations, aborting each one with the fail point `abortTenantMigrationAfterBlockingStarts`.
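To make the failure mode easier to reason about, here is a toy model of the retry loop in Python. It is only a sketch and not MongoDB code: `ToyRecipient`, `start_data_sync`, `forget_migration`, and `run_cycles` are hypothetical names, and the model assumes that forgetting a migration leaves the already-cloned tenant databases on the recipient.

```python
# Toy model only: all names here are hypothetical, not MongoDB APIs.


class NamespaceExists(Exception):
    """Stands in for the NamespaceExists (code 48) error from the recipient."""


class ToyRecipient:
    def __init__(self):
        self.databases = set()

    def start_data_sync(self, tenant_id, donor_dbs):
        # Refuse to clone if any database with the tenant prefix is
        # already present on the recipient.
        existing = {db for db in self.databases
                    if db.startswith(tenant_id + "_")}
        if existing:
            raise NamespaceExists(
                f"Tenant '{tenant_id}': databases already exist "
                "prior to data sync")
        self.databases.update(donor_dbs)

    def forget_migration(self):
        # Assumption for this sketch: forgetting only marks the migration
        # as garbage collectable; the cloned databases stay in place.
        pass


def run_cycles(cycles):
    """Run repeated migrate-then-abort cycles and record each outcome."""
    recipient = ToyRecipient()
    tenant_id = "testTenantId-multiWrites"
    donor_dbs = [tenant_id + "_0"]
    outcomes = []
    for _ in range(cycles):
        try:
            recipient.start_data_sync(tenant_id, donor_dbs)
            # The donor-side fail point aborts the migration after cloning.
            outcomes.append("aborted")
        except NamespaceExists:
            outcomes.append("NamespaceExists")
        recipient.forget_migration()
    return outcomes
```

Under that assumption, only the first cycle gets as far as the abort, and every later cycle fails at the pre-clone check.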
After the first migration is aborted, the donor sends the recipient the abort request (`recipientForgetMigration`):
[js_test:tenant_migration_multi_writes] 2021-01-27T21:43:42.353+0000 d20022| {"t":{"$date":"2021-01-27T21:43:42.353+00:00"},"s":"I", "c":"REPL", "id":4881400, "ctx":"conn15","msg":"Forgetting migration due to recipientForgetMigration command","attr":{"migrationId":{"uuid":{"$uuid":"172ff6b5-0611-4a63-bed5-0a887c5a9095"}},"tenantId":"testTenantId-multiWrites"}}
[js_test:tenant_migration_multi_writes] 2021-01-27T21:43:42.356+0000 d20022| {"t":{"$date":"2021-01-27T21:43:42.356+00:00"},"s":"I", "c":"REPL", "id":4881401, "ctx":"TenantMigrationRecipientService-0","msg":"Migration marked to be garbage collectable due to recipientForgetMigration command","attr":{"migrationId":{"uuid":{"$uuid":"172ff6b5-0611-4a63-bed5-0a887c5a9095"}},"tenantId":"testTenantId-multiWrites","expireAt":{"$date":"2021-01-29T21:43:42.353Z"}}}
On the next loop iteration, the donor tries again, but the recipient fails with:
[js_test:tenant_migration_multi_writes] 2021-01-27T21:43:42.898+0000 d20022| {"t":{"$date":"2021-01-27T21:43:42.898+00:00"},"s":"D2", "c":"TENANT_M", "id":5271500, "ctx":"TenantMigrationRecipientService-1","msg":"listExistingDatabases entry","attr":{"migrationId":{"uuid":{"$uuid":"9d949264-9b87-4f78-bf11-7952dde38a4f"}},"tenantId":"testTenantId-multiWrites","db":{"name":"testTenantId-multiWrites_0"}}}
[js_test:tenant_migration_multi_writes] 2021-01-27T21:43:42.898+0000 d20022| {"t":{"$date":"2021-01-27T21:43:42.898+00:00"},"s":"I", "c":"TENANT_M", "id":21077, "ctx":"TenantMigrationRecipientService-1","msg":"Non-retryable error occurred during cloner stage","attr":{"cloner":"TenantAllDatabaseCloner","stage":"listExistingDatabases","error":{"code":48,"codeName":"NamespaceExists","errmsg":"Tenant 'testTenantId-multiWrites': databases already exist prior to data sync"}}}
[js_test:tenant_migration_multi_writes] 2021-01-27T21:43:42.898+0000 d20022| {"t":{"$date":"2021-01-27T21:43:42.898+00:00"},"s":"I", "c":"REPL", "id":4878501, "ctx":"TenantMigrationRecipientService-1","msg":"Tenant migration recipient instance: Data sync completed.","attr":{"tenantId":"testTenantId-multiWrites","migrationId":{"uuid":{"$uuid":"9d949264-9b87-4f78-bf11-7952dde38a4f"}},"error":{"code":48,"codeName":"NamespaceExists","errmsg":"Tenant 'testTenantId-multiWrites': databases already exist prior to data sync"}}}
The donor receives the error, as shown by my custom log line:
[js_test:tenant_migration_multi_writes] 2021-01-27T21:43:42.898+0000 d20020| !!!! migration p4, error NamespaceExists Tenant migration recipient command failed :: caused by :: Tenant 'testTenantId-multiWrites': databases already exist prior to data sync, threads !!!!! idle threads 3 for pool TenantMigrationDonorServiceThreadPool
Unfortunately, the donor does not log this error properly by default; I will fix that in my CR.
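For triaging runs like this one, the structured log lines quoted above can be scanned for entries that carry an `error` attribute. The Python snippet below is a rough sketch, not part of the attached test: `parse_log_line` and `find_errors` are hypothetical helper names, and it assumes that each line consists of a test-runner prefix followed by a single JSON object, as in the excerpts above.

```python
import json


def parse_log_line(line):
    """Return the JSON payload of a prefixed log line, or None if absent."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        # raw_decode stops at the end of the first JSON object, so any
        # trailing characters after the payload are ignored.
        payload, _ = json.JSONDecoder().raw_decode(line[start:])
    except json.JSONDecodeError:
        return None
    return payload


def find_errors(lines):
    """Collect (log id, codeName, errmsg) for entries with an error attr."""
    results = []
    for line in lines:
        entry = parse_log_line(line)
        if not entry:
            continue
        error = entry.get("attr", {}).get("error")
        if error:
            results.append(
                (entry.get("id"), error.get("codeName"), error.get("errmsg")))
    return results
```

Lines without a JSON payload, such as the custom donor log line above, are skipped rather than reported.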