[SERVER-66618] Ensure ReshardingCoordinator has aborted in resharding_coordinator_recovers_abort_decision.js
Created: 20/May/22  Updated: 29/Oct/23  Resolved: 23/May/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.0.0-rc7, 5.0.10, 5.3.3, 6.1.0-rc0

Type: Task
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Backport Requested: v6.0, v5.3, v5.0
Sprint: Sharding NYC 2022-05-30
Participants:
Linked BF Score: 35
Story Points: 1

Description

The reshardingPauseCoordinatorBeforeBlockingWrites failpoint managed by the ReshardingTest fixture has already been unset by the time the parallel shell runs the abortReshardCollection command. Because the test does not synchronize the two, the ReshardingCoordinator can commit the resharding operation before the abortReshardCollection command is processed. In the failure below, the coordinator transitions from "blocking-writes" to "committing", and the abortReshardCollection command then fails with NoSuchReshardCollection.

[js_test:resharding_coordinator_recovers_abort_decision] c20526| 2022-05-11T11:15:45.248+00:00 I  RESHARD  5343001 [ReshardingCoordinatorService-3] "Transitioned resharding coordinator state","attr":{"newState":"committing","oldState":"blocking-writes","namespace":"reshardingDb.coll","collectionUUID":{"uuid":{"$uuid":"0ddedc31-d4f6-4185-ac4e-b260b9d6b305"}},"reshardingUUID":{"uuid":{"$uuid":"a4584faa-5ef3-4e69-88b8-cb2d78d0f6d7"}}}
...
[js_test:resharding_coordinator_recovers_abort_decision] s20529| 2022-05-11T11:15:47.601+00:00 I  COMMAND  51803   [conn35] "Slow query","attr":{"type":"command","ns":"reshardingDb.coll","appName":"MongoDB Shell","command":{"abortReshardCollection":"reshardingDb.coll","lsid":{"id":{"$uuid":"d7a45c69-044f-4776-869b-033474386570"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1652267742,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Could not find resharding-related metadata that matches the given namespace","errName":"NoSuchReshardCollection","errCode":339,"reslen":299,"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:35324","protocol":"op_msg","durationMillis":22}

https://evergreen.mongodb.com/lobster/build/3597a2a8b5bbd1db8ad100a6302632b1/test/627b9a76be07c41d6705496a#bookmarks=0%2C5672%2C6107%2C16975&f~=000~%5C%5BResharding.%2AService&f~=100~abortReshardCollection&l=1
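
The ordering problem can be illustrated with a toy model in plain JavaScript. This is only a sketch under stated assumptions: ToyCoordinator, its fields, the "paused-before-blocking-writes" label, and the two run* helpers are hypothetical names invented for illustration. They are not part of the ReshardingTest fixture or the server, and the actual fix is in the commits linked in the comments. The model only captures the rule that the failpoint must stay set until the abort decision has been recorded.

```javascript
// Toy model of the race. Every name here is hypothetical and none of this
// is MongoDB code; it only mirrors the ordering described in the ticket.
class ToyCoordinator {
    constructor() {
        this.failpointEnabled = true;   // stands in for the pause failpoint
        this.abortPersisted = false;    // set once the abort decision is recorded
        this.state = "paused-before-blocking-writes";
    }

    // Advance the coordinator. While the failpoint is set, it stays paused.
    step() {
        if (this.failpointEnabled) {
            return this.state;
        }
        this.state = this.abortPersisted ? "aborting" : "committing";
        return this.state;
    }

    // Stand-in for the abort command: it cannot find the operation once the
    // coordinator has already decided to commit.
    abort() {
        if (this.state === "committing") {
            throw new Error("NoSuchReshardCollection");
        }
        this.abortPersisted = true;
    }
}

// Racy order: the failpoint is released before the abort is processed.
function runRacyOrder() {
    const coordinator = new ToyCoordinator();
    coordinator.failpointEnabled = false;
    coordinator.step();
    try {
        coordinator.abort();
    } catch (e) {
        return e.message;
    }
    return coordinator.state;
}

// Synchronized order: wait for the abort to be recorded, then release.
function runSynchronizedOrder() {
    const coordinator = new ToyCoordinator();
    coordinator.abort();
    coordinator.failpointEnabled = false;
    return coordinator.step();
}
```

In this model, runRacyOrder() ends with the abort failing because the coordinator has already committed, while runSynchronizedOrder() leaves the coordinator in the aborting state.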



Comments
Comment by Githook User [ 07/Jun/22 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-66618 Wait in test for resharding coordinator to persist abort.

Fixes an issue in the resharding_coordinator_recovers_abort_decision.js
test where the reshardingPauseCoordinatorBeforeBlockingWrites failpoint
is being released too early and allowing the resharding coordinator to
decide to commit the resharding operation instead.

(cherry picked from commit 18ec8376222bd7afe8485441af2c3aba3130ea2e)
Branch: v5.3
https://github.com/mongodb/mongo/commit/32762963363d127ae187965b8ab59eea8603f5dd

Comment by Liubov Molchanova [ 07/Jun/22 ]

Requesting a backport, as the failure reproduced on v5.3 in BFG-1195243 and BFG-1195241.

Comment by Githook User [ 23/May/22 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-66618 Wait in test for resharding coordinator to persist abort.

Fixes an issue in the resharding_coordinator_recovers_abort_decision.js
test where the reshardingPauseCoordinatorBeforeBlockingWrites failpoint
is being released too early and allowing the resharding coordinator to
decide to commit the resharding operation instead.

(cherry picked from commit 18ec8376222bd7afe8485441af2c3aba3130ea2e)
Branch: v6.0
https://github.com/mongodb/mongo/commit/71f9c400074406c2439dc654677de4dcc2612a82

Comment by Githook User [ 23/May/22 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-66618 Wait in test for resharding coordinator to persist abort.

Fixes an issue in the resharding_coordinator_recovers_abort_decision.js
test where the reshardingPauseCoordinatorBeforeBlockingWrites failpoint
is being released too early and allowing the resharding coordinator to
decide to commit the resharding operation instead.

(cherry picked from commit 18ec8376222bd7afe8485441af2c3aba3130ea2e)
Branch: v5.0
https://github.com/mongodb/mongo/commit/165751764bcb84855340920016d023c67a347499

Comment by Githook User [ 20/May/22 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-66618 Wait in test for resharding coordinator to persist abort.

Fixes an issue in the resharding_coordinator_recovers_abort_decision.js
test where the reshardingPauseCoordinatorBeforeBlockingWrites failpoint
is being released too early and allowing the resharding coordinator to
decide to commit the resharding operation instead.
Branch: master
https://github.com/mongodb/mongo/commit/18ec8376222bd7afe8485441af2c3aba3130ea2e
