  1. Core Server
  2. SERVER-53199

Transient transaction errors when updating config collections on the coordinator abort the resharding operation

    • Fully Compatible
    • ALL
    • Sharding 2020-12-14
    • 0
    • 1

      [js_test:test_resharding_test_fixture] 2020-12-03T05:07:05.978+0000 c20274| {"t":{"$date":"2020-12-03T05:07:05.978+00:00"},"s":"D4", "c":"TXN",      "id":23984,   "ctx":"ReshardingCoordinatorService-1","msg":"New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"49a91bea-9729-4cc9-8816-3fb5d526cf72"}}}}
      [js_test:test_resharding_test_fixture] 2020-12-03T05:07:05.986+0000 c20274| {"t":{"$date":"2020-12-03T05:07:05.986+00:00"},"s":"I",  "c":"TXN",      "id":51802,   "ctx":"ReshardingCoordinatorService-1","msg":"transaction","attr":{"parameters":{"lsid":{"id":{"$uuid":"49a91bea-9729-4cc9-8816-3fb5d526cf72"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":0,"autocommit":false,"readConcern":{"provenance":"implicitDefault"}},"readTimestamp":"Timestamp(0, 0)","keysExamined":0,"docsExamined":0,"terminationCause":"aborted","timeActiveMicros":4347,"timeInactiveMicros":2960,"numYields":0,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":4}},"Global":{"acquireCount":{"r":1,"w":2}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":3}},"Mutex":{"acquireCount":{"r":5}}},"storage":{},"wasPrepared":false,"durationMillis":7}}
      ...
      [js_test:test_resharding_test_fixture] 2020-12-03T05:07:06.008+0000 c20274| {"t":{"$date":"2020-12-03T05:07:06.007+00:00"},"s":"I",  "c":"COMMAND",  "id":4956902, "ctx":"ReshardingCoordinatorService-0","msg":"Resharding failed","attr":{"namespace":"reshardingDb.coll","newShardKeyPattern":{"newKey":1.0},"error":{"code":112,"codeName":"WriteConflict","errmsg":"WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction."}}}
      

      Resharding uses ShardingCatalogManager::withTransaction() internally to run multi-document transactions across config.chunks, config.collections, and config.reshardingOperations. As the WriteConflict in the log above shows, a transient transaction error currently fails the whole resharding operation. ShardingCatalogManager::withTransaction() should probably retry automatically on transient transaction errors.

      This issue has become more likely now that recipients are concurrent writers to the document in config.reshardingOperations when reporting their state changes to the coordinator.
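The retry behaviour proposed above could look roughly like the following. This is only a language-neutral sketch in Python, not the server's C++ implementation. The names `TransientTxnError`, `with_transaction_retry`, `max_attempts`, and `backoff_secs` are hypothetical. The only details taken from the log are the error code 112 and the code name WriteConflict.

```python
import time


class TransientTxnError(Exception):
    """Hypothetical stand-in for an error labelled as a transient transaction error."""

    def __init__(self, code, code_name):
        super().__init__(f"{code_name} ({code})")
        self.code = code
        self.code_name = code_name


def with_transaction_retry(txn_body, max_attempts=5, backoff_secs=0.0):
    """Run txn_body(), re-running the whole body when it fails transiently.

    The last transient error is re-raised once max_attempts is exhausted.
    Any other exception propagates immediately without a retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except TransientTxnError:
            if attempt == max_attempts:
                raise
            # Linear backoff between attempts (assumed policy, not from the ticket).
            time.sleep(backoff_secs * attempt)


# Simulate a coordinator update that hits WriteConflict twice before succeeding,
# as could happen when recipients write to the same document concurrently.
attempts = []


def update_coordinator_doc():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientTxnError(112, "WriteConflict")
    return "committed"


result = with_transaction_retry(update_coordinator_doc)
```

In this sketch the entire transaction body is re-run on each attempt, because an aborted multi-document transaction cannot be resumed partway through. A bounded attempt count keeps a persistent conflict from retrying forever.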

            Assignee:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
