[SERVER-53199] Transient transaction errors updating config collections on the coordinator aborts the resharding operation Created: 03/Dec/20  Updated: 29/Oct/23  Resolved: 04/Dec/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: PM-234-M2, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-51088 Create ReshardingFixture class for re... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2020-12-14
Participants:
Linked BF Score: 0
Story Points: 1

 Description   

[js_test:test_resharding_test_fixture] 2020-12-03T05:07:05.978+0000 c20274| {"t":{"$date":"2020-12-03T05:07:05.978+00:00"},"s":"D4", "c":"TXN",      "id":23984,   "ctx":"ReshardingCoordinatorService-1","msg":"New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"49a91bea-9729-4cc9-8816-3fb5d526cf72"}}}}
[js_test:test_resharding_test_fixture] 2020-12-03T05:07:05.986+0000 c20274| {"t":{"$date":"2020-12-03T05:07:05.986+00:00"},"s":"I",  "c":"TXN",      "id":51802,   "ctx":"ReshardingCoordinatorService-1","msg":"transaction","attr":{"parameters":{"lsid":{"id":{"$uuid":"49a91bea-9729-4cc9-8816-3fb5d526cf72"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":0,"autocommit":false,"readConcern":{"provenance":"implicitDefault"}},"readTimestamp":"Timestamp(0, 0)","keysExamined":0,"docsExamined":0,"terminationCause":"aborted","timeActiveMicros":4347,"timeInactiveMicros":2960,"numYields":0,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":4}},"Global":{"acquireCount":{"r":1,"w":2}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":3}},"Mutex":{"acquireCount":{"r":5}}},"storage":{},"wasPrepared":false,"durationMillis":7}}
...
[js_test:test_resharding_test_fixture] 2020-12-03T05:07:06.008+0000 c20274| {"t":{"$date":"2020-12-03T05:07:06.007+00:00"},"s":"I",  "c":"COMMAND",  "id":4956902, "ctx":"ReshardingCoordinatorService-0","msg":"Resharding failed","attr":{"namespace":"reshardingDb.coll","newShardKeyPattern":{"newKey":1.0},"error":{"code":112,"codeName":"WriteConflict","errmsg":"WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction."}}}

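ShardingCatalogManager::withTransaction() is used internally by resharding to run multi-document transactions across config.chunks, config.collections, and config.reshardingOperations. It does not currently retry on transient transaction errors, so a WriteConflict like the one in the log above fails the entire resharding operation. ShardingCatalogManager::withTransaction() should probably retry automatically when a transaction fails with an error labeled TransientTransactionError.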
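A minimal sketch of the proposed behavior, under stated assumptions. The hypothetical `runTxnWithTransientRetry()` helper re-runs the whole transaction body whenever the failure carries the `TransientTransactionError` label, and rethrows anything else. `TxnError` and the helper are illustrative stand-ins, not the server's actual `ShardingCatalogManager::withTransaction()` API or error types.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical error type standing in for a failed transaction attempt.
// It carries a server error code (e.g. 112 for WriteConflict) and its error labels.
struct TxnError : std::runtime_error {
    int code;
    std::set<std::string> labels;

    TxnError(int c, std::string msg, std::set<std::string> l)
        : std::runtime_error(std::move(msg)), code(c), labels(std::move(l)) {}

    bool hasLabel(const std::string& label) const { return labels.count(label) > 0; }
};

// Hypothetical helper: runs `txnBody` (begin, writes, commit) and re-runs it from the
// start whenever it fails with the "TransientTransactionError" label. Any other error,
// or a transient error on the final allowed attempt, propagates to the caller.
// Returns the number of attempts it took to succeed.
int runTxnWithTransientRetry(const std::function<void(int attempt)>& txnBody,
                             int maxAttempts) {
    assert(maxAttempts >= 1);
    for (int attempt = 1;; ++attempt) {
        try {
            txnBody(attempt);
            return attempt;
        } catch (const TxnError& e) {
            if (!e.hasLabel("TransientTransactionError") || attempt >= maxAttempts) {
                throw;  // non-transient failure, or out of attempts
            }
            // In a real implementation the failed transaction would be aborted here and
            // the next attempt would start a new transaction; that bookkeeping is omitted.
        }
    }
}
```

For example, a body that throws `TxnError(112, "WriteConflict", {"TransientTransactionError"})` on its first attempt and succeeds on its second makes the helper return 2. The whole body is retried, not just the failing statement, because a transaction that hits a WriteConflict is aborted and its earlier writes are discarded. The attempt bound is an assumption that keeps a persistent conflict from looping forever.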

This failure has become more likely now that recipient shards write concurrently to the same config.reshardingOperations document when reporting their state changes to the coordinator.
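To see why concurrent recipient writes raise the odds of a WriteConflict, here is a deliberately simplified model of snapshot-isolation conflict detection. It assumes a single document stamped with the logical time of its last committed write. `Store`, `Txn`, `beginTxn()`, and `writeAndCommit()` are hypothetical names, and this is not how the storage engine actually tracks conflicts.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model: one document (think of the resharding state
// document in config.reshardingOperations) that records when it was last written.
struct Doc {
    std::string state;
    long lastWriteTs = 0;
};

struct Store {
    Doc doc;
    long clock = 0;  // logical clock, advanced on every successful commit
};

// A transaction remembers the logical time of the snapshot it started from.
struct Txn {
    long snapshotTs;
};

Txn beginTxn(const Store& s) { return Txn{s.clock}; }

// Writes and commits in one step. Returns false (a write conflict) if another writer
// committed to the document after this transaction's snapshot was taken.
bool writeAndCommit(Store& s, const Txn& txn, const std::string& newState) {
    assert(txn.snapshotTs <= s.clock);
    if (s.doc.lastWriteTs > txn.snapshotTs) {
        return false;  // caller must abort and start a new transaction
    }
    s.doc.state = newState;
    s.doc.lastWriteTs = ++s.clock;
    return true;
}
```

If a recipient commits a state change between the coordinator's `beginTxn()` and its `writeAndCommit()`, the coordinator's write is rejected. Starting a fresh transaction from the newer snapshot then succeeds. Every additional writer to the document widens that window, which is why retrying on the transient error, rather than failing the operation, is the natural fix.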



 Comments   
Comment by Githook User [ 04/Dec/20 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-53199 Auto retry on TTE in ShardingCatalogManager::withTxn().

Also adds a "failLocalClients" option to the failCommand failpoint.
Branch: master
https://github.com/mongodb/mongo/commit/2851094f12d05f96fabe023abb9062f096557c02

Generated at Thu Feb 08 05:30:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.