Core Server / SERVER-53199

Transient transaction errors updating config collections on the coordinator abort the resharding operation

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Sharding 2020-12-14
    • Linked BF Score:
      0
    • Story Points:
      1

      Description

      [js_test:test_resharding_test_fixture] 2020-12-03T05:07:05.978+0000 c20274| {"t":{"$date":"2020-12-03T05:07:05.978+00:00"},"s":"D4", "c":"TXN",      "id":23984,   "ctx":"ReshardingCoordinatorService-1","msg":"New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"49a91bea-9729-4cc9-8816-3fb5d526cf72"}}}}
      [js_test:test_resharding_test_fixture] 2020-12-03T05:07:05.986+0000 c20274| {"t":{"$date":"2020-12-03T05:07:05.986+00:00"},"s":"I",  "c":"TXN",      "id":51802,   "ctx":"ReshardingCoordinatorService-1","msg":"transaction","attr":{"parameters":{"lsid":{"id":{"$uuid":"49a91bea-9729-4cc9-8816-3fb5d526cf72"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":0,"autocommit":false,"readConcern":{"provenance":"implicitDefault"}},"readTimestamp":"Timestamp(0, 0)","keysExamined":0,"docsExamined":0,"terminationCause":"aborted","timeActiveMicros":4347,"timeInactiveMicros":2960,"numYields":0,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":4}},"Global":{"acquireCount":{"r":1,"w":2}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":3}},"Mutex":{"acquireCount":{"r":5}}},"storage":{},"wasPrepared":false,"durationMillis":7}}
      ...
      [js_test:test_resharding_test_fixture] 2020-12-03T05:07:06.008+0000 c20274| {"t":{"$date":"2020-12-03T05:07:06.007+00:00"},"s":"I",  "c":"COMMAND",  "id":4956902, "ctx":"ReshardingCoordinatorService-0","msg":"Resharding failed","attr":{"namespace":"reshardingDb.coll","newShardKeyPattern":{"newKey":1.0},"error":{"code":112,"codeName":"WriteConflict","errmsg":"WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction."}}}
      

      ShardingCatalogManager::withTransaction() is used internally by resharding to run multi-document transactions across config.chunks, config.collections, and config.reshardingOperations. In the log above, the coordinator's transaction was aborted and the resharding operation then failed with WriteConflict (code 112). ShardingCatalogManager::withTransaction() should probably retry automatically on transient transaction errors.
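
      A minimal sketch of the retry behavior proposed above, not the server's actual code. TransientTransactionError and runWithTransientRetry are hypothetical names invented for illustration, and the attempt limit and the per-attempt transaction number are assumptions rather than anything ShardingCatalogManager is known to do.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for an error carrying the transient-transaction
// label (for example WriteConflict, code 112, as in the log above).
struct TransientTransactionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs txnBody as one transaction attempt and retries the whole body when it
// throws TransientTransactionError, up to maxAttempts attempts in total.
// The body receives a 0-based attempt index, which a caller could use as a
// fresh transaction number for each attempt (an assumption of this sketch).
// Returns the number of attempts used. Once the attempts are exhausted the
// last transient error is rethrown; any other exception propagates at once.
inline int runWithTransientRetry(const std::function<void(int)>& txnBody,
                                 int maxAttempts = 3) {
    for (int attempt = 1;; ++attempt) {
        try {
            txnBody(attempt - 1);
            return attempt;
        } catch (const TransientTransactionError&) {
            if (attempt >= maxAttempts) {
                throw;  // give up and surface the error to the caller
            }
            // Otherwise fall through and run the body again.
        }
    }
}
```

      With a wrapper of this shape, a single WriteConflict on the coordinator's document would trigger another attempt instead of failing the resharding operation. A real implementation would also need to decide on a backoff policy and on which error codes count as transient.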

      This issue has become more likely now that recipients write concurrently to the document in config.reshardingOperations when reporting their state changes to the coordinator.

              People

              Assignee:
              max.hirschhorn Max Hirschhorn
              Reporter:
              max.hirschhorn Max Hirschhorn