[SERVER-59923] Retry reshardCollection command from background thread in ReshardingTest fixture
Created: 13/Sep/21  Updated: 29/Oct/23  Resolved: 15/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc1

Type: Task
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: PM-234-M3, PM-234-T-fuzzer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test_resharding_fixture_test_shutdown_retry_needed.js    
Issue Links:
Backports
Depends
is depended on by SERVER-53351 Add resharding fuzzer task with step-... Closed
Problem/Incident
Related
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.1, v5.0
Sprint: Sharding 2021-09-20, Sharding 2021-10-04, Sharding 2021-10-18
Participants:
Linked BF Score: 162
Story Points: 1

 Description   

Mongos retries the _shardsvrReshardCollection command on the primary shard of the database only a finite number of times. If the current primary of that replica set is shut down (or steps down and goes into rollback) more times than mongos retries, the resulting command error response causes the test to fail.


[js_test:resharding_fuzzer-120e1-1630670495216-2] d20021| 2021-09-03T12:09:07.675+00:00 I  REPL     21358   [BackgroundSync] "Replica set state transition","attr":{"newState":"ROLLBACK","oldState":"SECONDARY"}
...
[js_test:resharding_fuzzer-120e1-1630670495216-2] s20032| 2021-09-03T12:09:07.679+00:00 D1 ASSERT   23074   [conn46] "User assertion","attr":{"error":"InterruptedDueToReplStateChange: operation was interrupted","file":"src/mongo/s/commands/cluster_reshard_collection_cmd.cpp","line":80}
...
[js_test:resharding_fuzzer-120e1-1630670495216-2] s20032| 2021-09-03T12:09:07.680+00:00 D1 ASSERT   23074   [conn46] "User assertion","attr":{"error":"InterruptedDueToReplStateChange: operation was interrupted","file":"src/mongo/util/future_impl.h","line":1087}
[js_test:resharding_fuzzer-120e1-1630670495216-2] s20032| 2021-09-03T12:09:07.680+00:00 D1 SHARDING 22772   [conn46] "Exception thrown while processing command","attr":{"db":"admin","headerId":5142,"error":"InterruptedDueToReplStateChange: operation was interrupted"}
[js_test:resharding_fuzzer-120e1-1630670495216-2] s20032| 2021-09-03T12:09:07.681+00:00 I  COMMAND  51803   [conn46] "Slow query","attr":{"type":"command","ns":"test_reshard.reshard_coll","appName":"MongoDB Shell","command":{"reshardCollection":"test_reshard.reshard_coll","key":{"recipient":1,"slot":1},"_presetReshardedChunks":[{"min":{"recipient":{"$minKey":1},"slot":{"$minKey":1}},"max":{"recipient":"recipient0","slot":{"$minKey":1}},"recipientShardId":"shard0"},{"min":{"recipient":"recipient0","slot":{"$minKey":1}},"max":{"recipient":"recipient0","slot":10},"recipientShardId":"shard0"},{"min":{"recipient":"recipient0","slot":10},"max":{"recipient":"recipient0","slot":20},"recipientShardId":"shard0"},{"min":{"recipient":"recipient0","slot":20},"max":{"recipient":"recipient0","slot":30},"recipientShardId":"shard0"},{"min":{"recipient":"recipient0","slot":30},"max":{"recipient":"recipient0","slot":40},"recipientShardId":"shard0"},{"min":{"recipient":"recipient0","slot":40},"max":{"recipient":"recipient1","slot":{"$minKey":1}},"recipientShardId":"shard0"},{"min":{"recipient":"recipient1","slot":{"$minKey":1}},"max":{"recipient":"recipient1","slot":10},"recipientShardId":"shard1"},{"min":{"recipient":"recipient1","slot":10},"max":{"recipient":"recipient1","slot":20},"recipientShardId":"shard1"},{"min":{"recipient":"recipient1","slot":20},"max":{"recipient":"recipient1","slot":30},"recipientShardId":"shard1"},{"min":{"recipient":"recipient1","slot":30},"max":{"recipient":"recipient1","slot":40},"recipientShardId":"shard1"},{"min":{"recipient":"recipient1","slot":40},"max":{"recipient":"recipient2","slot":{"$minKey":1}},"recipientShardId":"shard1"},{"min":{"recipient":"recipient2","slot":{"$minKey":1}},"max":{"recipient":"recipient2","slot":10},"recipientShardId":"shard2"},{"min":{"recipient":"recipient2","slot":10},"max":{"recipient":"recipient2","slot":20},"recipientShardId":"shard2"},{"min":{"recipient":"recipient2","slot":20},"max":{"recipient":"recipient2","slot":30},"recipientShardId":"shard2"},{"min":{"recipient":"recipient2","slot":30},"max":{"recipient":"recipient2","slot":40},"recipientShardId":"shard2"},{"min":{"recipient":"recipient2","slot":40},"max":{"recipient":"recipient3","slot":{"$minKey":1}},"recipientShardId":"shard2"},{"min":{"recipient":"recipient3","slot":{"$minKey":1}},"max":{"recipient":{"$maxKey":1},"slot":{"$maxKey":1}},"recipientShardId":"shard0"}],"lsid":{"id":{"$uuid":"702b58ee-193c-41f9-b0ae-342688d9eced"}},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"operation was interrupted","errName":"InterruptedDueToReplStateChange","errCode":11602,"reslen":241,"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"10.122.50.122:34602","protocol":"op_msg","durationMillis":34486}
...
[js_test:resharding_fuzzer-120e1-1630670495216-2] 	"errmsg" : "operation was interrupted",
[js_test:resharding_fuzzer-120e1-1630670495216-2] 	"codeName" : "HostUnreachable",

https://evergreen.mongodb.com/lobster/build/bd7546f0b83c3529ec1dda233b4d926a/test/61320ff1c2ab687bbc0802b6#bookmarks=0%2C65494%2C65582%2C65590%2C65591%2C65592%2C94518%2C94520%2C104193&f~=000~%22operation%20was%20interrupted%22&f~=000~%5C%5BResharding.%2AService&f~=000~%5C%5BReshardingCoordinatorService&f~=000~20032%5C%7C&f~=100~d2002%5B012%5D%5C%7C&l=1
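
The failure mode described above suggests a caller-side retry loop around the reshardCollection command. The following is only a minimal sketch of that idea in plain JavaScript, not the actual ReshardingTest fixture code. The helper names (retryReshardCollection, makeFlakyMongos), the attempt limit, and the choice of retryable code names are assumptions made for illustration. The code names themselves are taken from the log excerpt above, and mongos is replaced by a stub so the sketch runs outside the mongo shell.

```javascript
// Hypothetical helper (not part of the real fixture): calls `runCommand`
// until it succeeds, returns a non-retryable error, or `maxAttempts` is hit.
function retryReshardCollection(runCommand, retryableCodeNames, maxAttempts) {
    let res;
    for (let attempt = 1; attempt <= maxAttempts; ++attempt) {
        res = runCommand();
        if (res.ok === 1 || !retryableCodeNames.has(res.codeName)) {
            // Success, or an error that retrying is not expected to fix.
            return {res, attempts: attempt};
        }
        // Otherwise the primary shard was likely shut down or stepped down,
        // so issue the command again.
    }
    return {res, attempts: maxAttempts};
}

// Stand-in for mongos (an assumption for this sketch): fails once for each
// code name in `failures`, then succeeds.
function makeFlakyMongos(failures) {
    let calls = 0;
    return () => {
        if (calls < failures.length) {
            return {ok: 0, codeName: failures[calls++], errmsg: "operation was interrupted"};
        }
        return {ok: 1};
    };
}

// Illustrative set of retryable code names, both seen in the log excerpt.
const retryable = new Set(["InterruptedDueToReplStateChange", "HostUnreachable"]);

const outcome = retryReshardCollection(
    makeFlakyMongos(["HostUnreachable", "InterruptedDueToReplStateChange"]), retryable, 5);
console.log(outcome.attempts, outcome.res.ok);
```

In this sketch the caller, rather than mongos, decides how many interruptions to tolerate, so a test harness that deliberately restarts or steps down the primary can choose a limit that matches how often it does so.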



 Comments   
Comment by Githook User [ 15/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-59923 Retry on another error code in ReshardingTest fixture.

Changes the ReshardingTest fixture to additionally retry the
reshardCollection command on FailedToSatisfyReadPreference error
responses from mongos when elections are enabled.

Also removes the shouldSetMinVisibleToOldestOnStartup constructor
parameter to the ReshardingTest fixture now that the
setMinVisibleForAllCollectionsToOldestOnStartup failpoint has been
backported to the earlier branches.

(cherry picked from commit 257cf738d1d0fa3ec73446133dae8f6b5510b2c4)
Branch: v5.0
https://github.com/mongodb/mongo/commit/0d369af9f5be858fb06ed7c8013e05365288dcb0

Comment by Githook User [ 14/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-59923 Retry on another error code in ReshardingTest fixture.

Changes the ReshardingTest fixture to additionally retry the
reshardCollection command on FailedToSatisfyReadPreference error
responses from mongos when elections are enabled.

Also removes the shouldSetMinVisibleToOldestOnStartup constructor
parameter to the ReshardingTest fixture now that the
setMinVisibleForAllCollectionsToOldestOnStartup failpoint has been
backported to the earlier branches.

(cherry picked from commit 257cf738d1d0fa3ec73446133dae8f6b5510b2c4)
Branch: v5.1
https://github.com/mongodb/mongo/commit/64caafeb373ddfacccb4129c41a4e13bd22b7aa4

Comment by Githook User [ 14/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-59923 Retry on another error code in ReshardingTest fixture.

Changes the ReshardingTest fixture to additionally retry the
reshardCollection command on FailedToSatisfyReadPreference error
responses from mongos when elections are enabled.

Also removes the shouldSetMinVisibleToOldestOnStartup constructor
parameter to the ReshardingTest fixture now that the
setMinVisibleForAllCollectionsToOldestOnStartup failpoint has been
backported to the earlier branches.
Branch: master
https://github.com/mongodb/mongo/commit/257cf738d1d0fa3ec73446133dae8f6b5510b2c4

Comment by Githook User [ 13/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-59923 Retry reshardCollection command in ReshardingTest fixture.

Also adds a setMinVisibleForAllCollectionsToOldestOnStartup failpoint to
enable resharding tests to read from the cloneTimestamp after server
restarts without needing to wait for the creation timestamp of the
source sharded collection to advance past the oldest_timestamp.

(cherry picked from commit 0bde7934c623efd194747f65b2e711e188b7c108)
Branch: v5.0
https://github.com/mongodb/mongo/commit/016c23a524285df45da1dbffadf5a524885bb6fa

Comment by Githook User [ 13/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-59923 Retry reshardCollection command in ReshardingTest fixture.

Also adds a setMinVisibleForAllCollectionsToOldestOnStartup failpoint to
enable resharding tests to read from the cloneTimestamp after server
restarts without needing to wait for the creation timestamp of the
source sharded collection to advance past the oldest_timestamp.

(cherry picked from commit 0bde7934c623efd194747f65b2e711e188b7c108)
Branch: v5.1
https://github.com/mongodb/mongo/commit/9983b378a87226065da66195816b88843ae6bb78

Comment by Githook User [ 12/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-59923 Retry reshardCollection command in ReshardingTest fixture.

Also adds a setMinVisibleForAllCollectionsToOldestOnStartup failpoint to
enable resharding tests to read from the cloneTimestamp after server
restarts without needing to wait for the creation timestamp of the
source sharded collection to advance past the oldest_timestamp.
Branch: master
https://github.com/mongodb/mongo/commit/0bde7934c623efd194747f65b2e711e188b7c108
