  Core Server / SERVER-54460

Resharding may delete the state document before fully completing

    • Fully Compatible
    • ALL
    • v4.9
    • Steps to reproduce:

      1. Change PrimaryOnlyServiceOpObserver::onDelete() in src/mongo/db/repl/primary_only_service_op_observer.cpp so that it releases the service instance with an error:

          service->releaseInstance(
              documentId,
              Status(ErrorCodes::Interrupted,
                     str::stream() << "State document " << documentId << " is dropped",
                     BSON("documentId" << documentId)));

      2. Run the test:

          buildscripts/resmoke.py run --suite=sharding --repeat 1 --mongodSetParameters="{featureFlagTenantMigrations: true}" jstests/sharding/api_params_nontransaction_sharded.js

      It fails with:

          Error: command failed: {
          ...
          "errmsg" : "State document { _id: UUID(\"2e1c206c-d618-4f8c-ba0f-247637bea29c\") } is dropped",
          ...
    • Sharding 2021-05-03, Sharding 2021-05-17
    • 2

      I do not claim that this issue causes actual production failures, but it was a real problem for me: it blocked me from fully implementing SERVER-53950.

      The idea I was trying to implement in SERVER-53950 is that we should always interrupt a primary-only service instance whenever we unregister it. One of the events that unregisters an instance is the deletion of its state document.

      However, if I make this change as discussed in that ticket, resharding fails at the moment the state document is deleted, because the deletion happens before the operation has fully completed. I don't see a simple fix myself.
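      A toy C++ model of the ordering problem may make it easier to see. It assumes, as described above, that the resharding workflow deletes its state document before it reports completion, and it models the proposed onDelete() change as an interrupt that fixes the instance's outcome to an error. Every name here (ToyInstance, runToyWorkflow, and the simplified Status and Code types) is hypothetical; this is a sketch of the race, not MongoDB's PrimaryOnlyService code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical toy model; none of these names are MongoDB APIs.
enum class Code { OK, Interrupted };

struct Status {
    Code code = Code::OK;
    std::string reason;
    bool isOK() const { return code == Code::OK; }
};

// Stand-in for a service instance whose outcome is set exactly once, either
// by finishing normally or by being released with an error.
class ToyInstance {
public:
    // Models releasing the instance with an error, as the proposed onDelete() would.
    void interrupt(Status s) {
        if (!completion_) completion_ = std::move(s);  // first outcome wins
    }
    // Models the instance reporting success after its final step.
    void complete() {
        if (!completion_) completion_ = Status{};
    }
    Status result() const {
        assert(completion_.has_value() && "instance has not finished yet");
        return *completion_;
    }

private:
    std::optional<Status> completion_;
};

// Toy workflow: the cleanup step deletes the state document, and only after
// that does the instance report completion (the ordering assumed above).
Status runToyWorkflow(bool interruptOnDelete) {
    ToyInstance instance;
    // Cleanup step: the state document is deleted. With the proposed change,
    // the op observer releases the instance with an Interrupted error here.
    if (interruptOnDelete) {
        instance.interrupt(Status{Code::Interrupted, "State document is dropped"});
    }
    // Final step: report success. This is a no-op if the instance was already
    // interrupted, so the caller observes the error instead.
    instance.complete();
    return instance.result();
}
```

      Without the proposed change, runToyWorkflow(false) yields an OK status; with it, runToyWorkflow(true) yields Interrupted, mirroring the "State document ... is dropped" failure from the reproduction steps.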

            Assignee: Cheahuychou Mao (cheahuychou.mao@mongodb.com)
            Reporter: Andrew Shuvalov (Inactive) (andrew.shuvalov@mongodb.com)
            Votes: 0
            Watchers: 5

              Created:
              Updated:
              Resolved: