Core Server / SERVER-32554

Source shard stepdown while entering critical section can trigger cloner invariant

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.3, 3.7.1
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.6
    • Sprint:
      Sharding 2018-01-15
    • Linked BF Score:
      0

      Description

      A stepdown during MigrationSourceManager::enterCriticalSection can trigger the cleanupOnError scope guard, which eventually calls MigrationSourceManager::_cleanup. That function std::moves the manager's clone driver into a local variable so that the driver is destroyed when the function exits. However, it calls two other functions before it calls cancelClone on the clone driver, which is the call that puts the driver into state kDone. If either of those functions throws (ShardServerCatalogCacheLoader::waitForCollectionFlush can, if the node's replication role changes), the local clone driver is destroyed while it is still in state kCloning, and the invariant in its destructor fails.

      I think the fix would be to either move the cancelClone call earlier in _cleanup, or put it in a scope guard declared after _cloneDriver is extracted into a local variable.
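
      To illustrate the ordering problem and the scope-guard variant of the fix, here is a minimal standalone sketch. It is not the MongoDB source: CloneDriver, ScopeGuard, waitForFlush, cleanupBuggy, cleanupGuarded and violatesInvariant are hypothetical stand-ins, and the destructor records a violation in a flag instead of aborting the way a real invariant would.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the clone driver: it must be in kDone when destroyed.
class CloneDriver {
public:
    enum class State { kCloning, kDone };

    explicit CloneDriver(bool* violated) : _violated(violated) {}

    ~CloneDriver() {
        // A real invariant would abort the process; this sketch only records it.
        if (_state != State::kDone) {
            *_violated = true;
        }
    }

    void cancelClone() { _state = State::kDone; }

private:
    State _state = State::kCloning;
    bool* _violated;
};

// Minimal scope guard (assumption: simplified, always runs its callback on exit).
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> f) : _f(std::move(f)) {}
    ~ScopeGuard() { _f(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _f;
};

// Stand-in for a cleanup step that throws when the node steps down.
void waitForFlush(bool stepdown) {
    if (stepdown) {
        throw std::runtime_error("replication role changed");
    }
}

// Ordering described in the report: a throwing call precedes cancelClone.
void cleanupBuggy(std::unique_ptr<CloneDriver>& member, bool stepdown) {
    auto driver = std::move(member);
    waitForFlush(stepdown);  // may throw, skipping cancelClone below
    driver->cancelClone();
}

// Proposed fix: the guard is declared after the driver is extracted, so it is
// destroyed first and cancels the clone before the driver's destructor runs.
void cleanupGuarded(std::unique_ptr<CloneDriver>& member, bool stepdown) {
    auto driver = std::move(member);
    ScopeGuard guard([&driver] { driver->cancelClone(); });
    waitForFlush(stepdown);  // may throw; the guard still runs during unwinding
}

// Runs a cleanup function under a simulated stepdown and reports whether the
// driver was destroyed while still in kCloning.
bool violatesInvariant(void (*cleanup)(std::unique_ptr<CloneDriver>&, bool)) {
    bool violated = false;
    auto member = std::make_unique<CloneDriver>(&violated);
    try {
        cleanup(member, true);
    } catch (const std::runtime_error&) {
        // The stepdown error itself is expected here.
    }
    return violated;
}
```

      In this sketch, violatesInvariant(cleanupBuggy) reports a violation and violatesInvariant(cleanupGuarded) does not. Moving the cancelClone call ahead of the throwing calls would have the same effect; the guard variant also covers any throwing call added to the cleanup path later.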

            People

            Assignee:
            jack.mulrow Jack Mulrow
            Reporter:
            jack.mulrow Jack Mulrow