  Core Server / SERVER-89730

All commands that take more than one collection lock must acquire locks in same order

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2025-01-20

      We have several commands that operate on more than one collection and therefore have to hold more than one collection lock at the same time. It's important that these commands take collection locks in the same order; otherwise the server can deadlock.

      For example, cloneCollectionAsCapped takes collection locks in the order "fromNs" and then "toNs":

      AutoGetCollection autoColl(opCtx, fromNs, MODE_X);
      Lock::CollectionLock collLock(opCtx, toNs, MODE_X);
      

      This means that if a concurrent cloneCollectionAsCapped command runs with the arguments in the reverse order, the two commands can deadlock with each other, because each holds the lock the other is waiting for (I verified this deadlock locally).
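
To make the cycle concrete, here is a minimal sketch that models only the order in which locks are requested; it takes no real locks. `LockOrder`, `argumentOrder` and `canDeadlock` are hypothetical names for illustration, not server code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: the order in which a command requests its collection locks.
using LockOrder = std::vector<std::string>;

// Mirrors the snippet above: lock the source namespace first, then the target.
LockOrder argumentOrder(const std::string& fromNs, const std::string& toNs) {
    return {fromNs, toNs};
}

// Two commands that each take two locks can deadlock when they request the
// same pair of distinct locks in opposite orders.
bool canDeadlock(const LockOrder& a, const LockOrder& b) {
    assert(a.size() == 2 && b.size() == 2);
    return a[0] != a[1] && a[0] == b[1] && a[1] == b[0];
}
```

With this model, `canDeadlock(argumentOrder("db.a", "db.b"), argumentOrder("db.b", "db.a"))` is true, while two invocations with the same argument order cannot form a cycle.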

      Typically we sort collection locks to make sure that they're always acquired in the same order. Unfortunately, it doesn't seem like we always sort on the same property. For example, renameCollection sorts the collections on ResourceID before acquiring locks, to prevent deadlocks with itself. But this means it may still deadlock with cloneCollectionAsCapped, which locks in argument order (I haven't verified this).

      This ordering also differs from the one the ShardingDDLCoordinator uses, which sorts on the namespace string rather than ResourceID.
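
The sketch below illustrates why sorting on two different keys does not help. It assumes, purely for illustration, that a collection's ResourceID is not ordered the same way as its namespace string; the `Coll` struct and the numeric ids are made up, not taken from the server.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical collection descriptor; resourceId values are invented.
struct Coll {
    std::string ns;
    std::uint64_t resourceId;
};

// Acquisition order when sorting on the numeric id.
std::vector<std::string> orderByResourceId(std::vector<Coll> colls) {
    std::sort(colls.begin(), colls.end(),
              [](const Coll& l, const Coll& r) { return l.resourceId < r.resourceId; });
    std::vector<std::string> order;
    for (const auto& c : colls) order.push_back(c.ns);
    return order;
}

// Acquisition order when sorting on the namespace string.
std::vector<std::string> orderByNamespace(std::vector<Coll> colls) {
    std::sort(colls.begin(), colls.end(),
              [](const Coll& l, const Coll& r) { return l.ns < r.ns; });
    std::vector<std::string> order;
    for (const auto& c : colls) order.push_back(c.ns);
    return order;
}

// For a pair of collections, the two sort keys conflict when they produce
// different (that is, opposite) acquisition orders.
bool ordersConflict(const std::vector<Coll>& colls) {
    assert(colls.size() == 2);
    return orderByResourceId(colls) != orderByNamespace(colls);
}
```

For example, with `{"db.a", 42}` and `{"db.b", 7}`, the id-based sort locks `db.b` first while the name-based sort locks `db.a` first, so two commands that each sort "correctly" by their own key can still request the locks in opposite orders.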

      There may be other sorting styles being used as well. We should ensure that we sort the same way, maybe by providing a Storage Execution API for grabbing multiple collection locks.
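
One possible shape for such an API, sketched with plain `std::mutex` rather than the server's lock manager. Everything here is an assumption for illustration: `CollectionLockTable`, `canonicalOrder` and `MultiCollectionLock` are hypothetical names, lock modes are ignored, and sorting on the namespace string is an arbitrary choice of key (the ticket leaves the key open).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lock manager: one mutex per namespace.
class CollectionLockTable {
public:
    std::mutex& mutexFor(const std::string& ns) {
        std::lock_guard<std::mutex> guard(_tableMutex);
        return _locks[ns];  // std::map keeps element references stable
    }

private:
    std::mutex _tableMutex;
    std::map<std::string, std::mutex> _locks;
};

// The single ordering every caller shares: sorted by namespace string,
// with duplicates removed so no lock is requested twice.
std::vector<std::string> canonicalOrder(std::vector<std::string> namespaces) {
    std::sort(namespaces.begin(), namespaces.end());
    namespaces.erase(std::unique(namespaces.begin(), namespaces.end()), namespaces.end());
    return namespaces;
}

// RAII guard that acquires all requested locks in canonical order,
// regardless of the order in which the caller listed them.
class MultiCollectionLock {
public:
    MultiCollectionLock(CollectionLockTable& table, std::vector<std::string> namespaces)
        : _order(canonicalOrder(std::move(namespaces))) {
        assert(!_order.empty());
        for (const auto& ns : _order) {
            _held.emplace_back(table.mutexFor(ns));  // blocks until acquired
        }
    }

    // Release in reverse acquisition order.
    ~MultiCollectionLock() {
        while (!_held.empty()) _held.pop_back();
    }

    const std::vector<std::string>& order() const { return _order; }

private:
    std::vector<std::string> _order;
    std::vector<std::unique_lock<std::mutex>> _held;
};
```

A caller would write `MultiCollectionLock lock(table, {toNs, fromNs});` and get the same acquisition order as a caller passing `{fromNs, toNs}`. Because the ordering lives in one place, commands cannot drift apart on the sort key.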

            Assignee:
            josef.ahmad@mongodb.com Josef Ahmad
            Reporter:
            vishnu.kaushik@mongodb.com Vishnu Kaushik
            Votes:
            0
            Watchers:
            9

              Created:
              Updated:
              Resolved: