[SERVER-77057] Investigate slow down caused by OpObservers in updateMany code path Created: 11/May/23  Updated: 27/Jul/23  Resolved: 27/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Irina Yatsenko (Inactive) Assignee: Kshitij Gupta
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-77758 Break down query analysis op observer... Closed
depends on SERVER-77364 Speed up OpObservers with a filter fr... Closed
Related
related to SERVER-77373 Use OpStateAccumulator's to cache com... Backlog
related to SERVER-77364 Speed up OpObservers with a filter fr... Closed
Assigned Teams:
Sharding NYC
Operating System: ALL
Participants:

 Description   

While investigating https://jira.mongodb.org/browse/PERF-3818 have identified a regression under mongo::OpObserverRegistry::onUpdate (mongo::OpObserverRegistry::onUpdate: 10.17% in 7.0 but 5.11% in 6.0 while 6.0 containing fewer samples). Partly it's due to adding a new observer (QueryAnalysisOpObserver), partly because more work done in ClusterServerParameterOpObserver and OpObserverImpl, and partly due to an increased overhead across all observers.

Should look into:

  • what is the shared overhead? (could it be due to replacing cached nss with virtual coll->ns() call?)
  • could any of the observers be trimmed or skipped in a more efficient way for updateMany?

 

I've added local std::chrono-based logging in the loop of OpObserverRegistry::onUpdate and saw the following:

v6.0 Observer v7.0 Observer Comment
100.64 AuthOpObserver 173.72 AuthOpObserver Calls 'AuthorizationManagerImpl::logOp' and 'audit::logUpdateOperation'
53.1 ClusterServerParameterOpObserver 105.66 ClusterServerParameterOpObserver Moved updateParameter to recoveryUnit
53.86 FcvOpObserver 64.9 FcvOpObserver No changes between releases. Does little.
447.16 OpObserverImpl 618.46 OpObserverImpl Multiple changes
55.2 PrimaryOnlyServiceOpObserver 47.24 PrimaryOnlyServiceOpObserver Does nothing on update in both releases
56.4 ShardSplitDonorOpObserver 111.78 QueryAnalysisOpObserver New in 7.0. The design seems wrong for updateMulti
59.38 TenantMigrationDonorOpObserver 70.24 TenantMigrationDonorOpObserver Checks whether nss is the one it cares about and does nothing if not
58.66 TenantMigrationRecipientOpObserver 56.66 TenantMigrationRecipientOpObserver Checks whether nss is the one it cares about and does nothing if not
93.78 UserWriteBlockModeOpObserver 79.92 UserWriteBlockModeOpObserver Calls '_checkWriteAllowed()' then checks whether nss is the one it cases about and does nothing if not
978.18   1328.58    

The numbers are nanoseconds per updated doc, averaged across updateMany that touched 50K docs. The last row is the sum across all observers. In my local tests the sum for 6.0 fluctuated between 900 and 1000. Notice, that even the observers that haven't changed and do very little in the scenario of this benchmark (essentially, a guard check or two before bailing out), such as TenantMigrationDonorOpObserver still are generally a little slower in 7.0. More contention on the instruction cache, etc.? 



 Comments   
Comment by Kshitij Gupta [ 02/Jun/23 ]

I used google benchmark to evaluate the performance of QueryAnalysisOpObserver, and my changes cut down the time spent in this OpObserver by ~50%.

 
------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------
With changes:
BM_QueryAnalysisOpObserver/1 14.5 ns 14.5 ns 48061568
Without changes:
BM_QueryAnalysisOpObserver/1 28.5 ns 28.5 ns 24595548
 
SERVER-77758 will be tracking this work.

Comment by Matt Kneiser [ 01/Jun/23 ]

The prior code review for SERVER-77057 should stack with SERVER-77364. I hesitate to dive too deep on further investigation before these tickets can both be measured against the regression because they should yield the easiest improvements.

Comment by Matt Kneiser [ 01/Jun/23 ]

I am linking this ticket to SERVER-77364 which should provide some speedup in one of the hottest functions in the linked ticket: OpObserverRegistry::onUpdate. It should also impact onInserts and potentially onDelete. The PR is in review and I am profiling the change at the moment to see what improvement in can provide.

Comment by Irina Yatsenko (Inactive) [ 15/May/23 ]

judah.schvimer@mongodb.com, there was a lot of churn in op_observer_impl.cpp between 6.0 and 7.0. At the function level, ideally, we should have micro-benchmarks to see the impact of individual changes, from a code review I would suspect a few that might have contributed to the longer running time (but keep in mind that stuff gets inlined, etc.):

  1. args.nss became rgs.coll->ns() - the latter is a virtual call and it's done multiple times
  2. the addition of inBatchedWrite block (this benchmark executes the final "else" clause but it has to compute the flag)
  3.  it looks like replLogUpdate became slightly heavier

I tried caching nss on OplogUpdateEntryArgs to remove the virtual calls. It saved ~100 nanoseconds per call (about half of the regression). I have no idea whether caching like this would be correct.

Generated at Thu Feb 08 06:34:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.