Core Server / SERVER-45191

Data race in CollectionShardingState::getCriticalSectionSignal


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Gone away
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sharding
    • Operating System: ALL
    • Sprint: Sharding EMEA 2021-07-26, Sharding EMEA 2021-08-09

      Description

      CollectionShardingState::getCriticalSectionSignal neither takes nor enforces any locks, and ShardingMigrationCriticalSection::getSignal performs no synchronization either.
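      One way to make getSignal safe independently of the caller's collection lock is to guard the shared_ptr inside the critical-section object itself. The sketch below assumes a plain std::mutex is acceptable on this path; the class CriticalSectionSignalHolder and the struct Signal are hypothetical stand-ins, not the server's actual types.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for the object that waiters block on.
struct Signal {};

// Hypothetical holder: every read and write of the shared_ptr goes through
// the same mutex, so enter/exit/getSignal are synchronized no matter which
// collection lock mode (IS, IX, X) the caller happens to hold.
class CriticalSectionSignalHolder {
public:
    void enter() {
        std::lock_guard<std::mutex> lk(_mutex);
        _signal = std::make_shared<Signal>();
    }

    void exit() {
        std::lock_guard<std::mutex> lk(_mutex);
        _signal.reset();  // the write that races in the unsynchronized version
    }

    // Returns a copy; the copy remains valid after exit() resets the member,
    // because its reference count keeps the Signal alive.
    std::shared_ptr<Signal> getSignal() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _signal;
    }

private:
    mutable std::mutex _mutex;
    std::shared_ptr<Signal> _signal;
};
```

      Returning the shared_ptr by value matters here: a caller that obtained the signal can keep waiting on it even if the critical section is exited immediately afterwards.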

      When a migration enters the critical section it holds an X lock on the collection, but when it exits (which calls .reset() on the signal) it holds only an IX lock.

      Furthermore, setShardVersion calls getCriticalSectionSignal while holding only an IS lock on the collection. IS does not conflict with the IX lock held in exitCriticalSection, so nothing synchronizes the read of the critical-section signal's shared_ptr with the reset that writes it. The same is true in _flushRoutingTableCacheUpdates.
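      The gap comes from intent-lock compatibility. The sketch below encodes the textbook multi-granularity compatibility table for IS, IX, S and X; it assumes the server's collection lock modes follow that standard table, and the names LockMode and compatible are illustrative only.

```cpp
#include <array>
#include <cstddef>

// Standard multi-granularity lock modes (intent shared/exclusive, shared, exclusive).
enum class LockMode { IS, IX, S, X };

// kCompatible[held][requested] is true when two different operations can
// hold these modes on the same resource at the same time.
constexpr std::array<std::array<bool, 4>, 4> kCompatible = {{
    //          IS     IX     S      X
    /* IS */ {{ true,  true,  true,  false }},
    /* IX */ {{ true,  true,  false, false }},
    /* S  */ {{ true,  false, true,  false }},
    /* X  */ {{ false, false, false, false }},
}};

constexpr bool compatible(LockMode held, LockMode requested) {
    return kCompatible[static_cast<std::size_t>(held)]
                      [static_cast<std::size_t>(requested)];
}
```

      Under this table compatible(LockMode::IX, LockMode::IS) is true: a reader holding IS in setShardVersion and the migration holding IX in exitCriticalSection can run concurrently, while the X lock taken on entry does exclude IS readers. Only the exit path is therefore unprotected.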

      Fortunately, the normal path for inspecting the critical section takes a shared lock on the CSR, which conflicts with the exclusive lock taken on the CSR when exiting the critical section.
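      The safe "normal path" is the reader/writer pattern: readers take the lock shared, and exiting the critical section takes it exclusive, so a reset can never overlap a read. A minimal sketch using std::shared_mutex (C++17) follows; CollectionRuntimeState and Signal are hypothetical names modelling the CSR, not its real interface.

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the critical-section signal.
struct Signal {};

// Hypothetical model of per-collection runtime state: one reader/writer lock
// guards the signal. Readers share it; enter/exit take it exclusively.
class CollectionRuntimeState {
public:
    void enterCriticalSection() {
        std::unique_lock<std::shared_mutex> lk(_stateMutex);
        _signal = std::make_shared<Signal>();
    }

    void exitCriticalSection() {
        // Exclusive: conflicts with every shared reader below.
        std::unique_lock<std::shared_mutex> lk(_stateMutex);
        _signal.reset();
    }

    std::shared_ptr<Signal> getCriticalSectionSignal() const {
        // Shared: many readers may proceed together, but never during exit.
        std::shared_lock<std::shared_mutex> lk(_stateMutex);
        return _signal;
    }

    bool inCriticalSection() const {
        std::shared_lock<std::shared_mutex> lk(_stateMutex);
        return _signal != nullptr;
    }

private:
    mutable std::shared_mutex _stateMutex;
    std::shared_ptr<Signal> _signal;
};
```

      Callers such as setShardVersion would be safe under this scheme only if they read the signal through the shared lock rather than relying on their IS collection lock.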


            People

            Assignee: Backlog - Sharding EMEA (backlog-server-sharding-emea)
            Reporter: Matthew Saltz (matthew.saltz)
            Votes: 0
            Watchers: 6
