Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-77057

Investigate slow down caused by OpObservers in updateMany code path

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Sharding NYC
    • ALL

      While investigating https://jira.mongodb.org/browse/PERF-3818 have identified a regression under mongo::OpObserverRegistry::onUpdate (mongo::OpObserverRegistry::onUpdate: 10.17% in 7.0 but 5.11% in 6.0 while 6.0 containing fewer samples). Partly it's due to adding a new observer (QueryAnalysisOpObserver), partly because more work done in ClusterServerParameterOpObserver and OpObserverImpl, and partly due to an increased overhead across all observers.

      Should look into:

      • what is the shared overhead? (could it be due to replacing cached nss with virtual coll->ns() call?)
      • could any of the observers be trimmed or skipped in a more efficient way for updateMany?


      I've added local std::chrono-based logging in the loop of OpObserverRegistry::onUpdate and saw the following:

      v6.0 Observer v7.0 Observer Comment
      100.64 AuthOpObserver 173.72 AuthOpObserver Calls 'AuthorizationManagerImpl::logOp' and 'audit::logUpdateOperation'
      53.1 ClusterServerParameterOpObserver 105.66 ClusterServerParameterOpObserver Moved updateParameter to recoveryUnit
      53.86 FcvOpObserver 64.9 FcvOpObserver No changes between releases. Does little.
      447.16 OpObserverImpl 618.46 OpObserverImpl Multiple changes
      55.2 PrimaryOnlyServiceOpObserver 47.24 PrimaryOnlyServiceOpObserver Does nothing on update in both releases
      56.4 ShardSplitDonorOpObserver 111.78 QueryAnalysisOpObserver New in 7.0. The design seems wrong for updateMulti
      59.38 TenantMigrationDonorOpObserver 70.24 TenantMigrationDonorOpObserver Checks whether nss is the one it cares about and does nothing if not
      58.66 TenantMigrationRecipientOpObserver 56.66 TenantMigrationRecipientOpObserver Checks whether nss is the one it cares about and does nothing if not
      93.78 UserWriteBlockModeOpObserver 79.92 UserWriteBlockModeOpObserver Calls '_checkWriteAllowed()' then checks whether nss is the one it cases about and does nothing if not
      978.18   1328.58    

      The numbers are nanoseconds per updated doc, averaged across updateMany that touched 50K docs. The last row is the sum across all observers. In my local tests the sum for 6.0 fluctuated between 900 and 1000. Notice, that even the observers that haven't changed and do very little in the scenario of this benchmark (essentially, a guard check or two before bailing out), such as TenantMigrationDonorOpObserver still are generally a little slower in 7.0. More contention on the instruction cache, etc.? 

            kshitij.gupta@mongodb.com Kshitij Gupta
            irina.yatsenko@mongodb.com Irina Yatsenko (Inactive)
            0 Vote for this issue
            18 Start watching this issue