[SERVER-77057] Investigate slow down caused by OpObservers in updateMany code path Created: 11/May/23 Updated: 27/Jul/23 Resolved: 27/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Irina Yatsenko (Inactive) | Assignee: | Kshitij Gupta |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Sharding NYC |
| Operating System: | ALL |
| Participants: | |
| Description |
|
While investigating https://jira.mongodb.org/browse/PERF-3818, I identified a regression under mongo::OpObserverRegistry::onUpdate: 10.17% of profile samples in 7.0 vs. 5.11% in 6.0, even though the 6.0 profile contained fewer samples overall. It is partly due to the addition of a new observer (QueryAnalysisOpObserver), partly due to more work being done in ClusterServerParameterOpObserver and OpObserverImpl, and partly due to increased overhead across all observers. We should look into where this extra time is going.
I've added local std::chrono-based timing to the observer loop in OpObserverRegistry::onUpdate and measured the per-observer cost (see the sketch of this kind of instrumentation below).
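For illustration, here is a minimal, self-contained sketch of that kind of per-observer timing. The types below are simplified stand-ins, not the actual server classes (the real OpObserverRegistry and OpObserver live in src/mongo/db/op_observer/ and take server-internal argument types):

```cpp
#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-ins for the server types.
struct UpdateArgs {};

class OpObserver {
public:
    virtual ~OpObserver() = default;
    virtual void onUpdate(const UpdateArgs& args) = 0;
    virtual std::string name() const = 0;
};

class NoopObserver : public OpObserver {
public:
    explicit NoopObserver(std::string n) : _name(std::move(n)) {}
    void onUpdate(const UpdateArgs&) override {}  // guard check, then bail out
    std::string name() const override { return _name; }

private:
    std::string _name;
};

int main() {
    std::vector<std::unique_ptr<OpObserver>> observers;
    observers.push_back(std::make_unique<NoopObserver>("ObserverA"));
    observers.push_back(std::make_unique<NoopObserver>("ObserverB"));

    UpdateArgs args;
    // The ad-hoc instrumentation: time each observer's onUpdate individually,
    // mirroring the loop in OpObserverRegistry::onUpdate.
    for (auto& obs : observers) {
        auto start = std::chrono::steady_clock::now();
        obs->onUpdate(args);
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      std::chrono::steady_clock::now() - start)
                      .count();
        std::cout << obs->name() << ": " << ns << " ns\n";
    }
    return 0;
}
```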
The numbers are nanoseconds per updated doc, averaged across an updateMany that touched 50K docs; the last row is the sum across all observers. In my local tests the sum for 6.0 fluctuated between 900 and 1000. Notice that even observers that haven't changed and do very little in this benchmark's scenario (essentially a guard check or two before bailing out), such as TenantMigrationDonorOpObserver, are still generally a little slower in 7.0. More contention on the instruction cache, etc.? |
| Comments |
| Comment by Kshitij Gupta [ 02/Jun/23 ] |
|
I used Google Benchmark to evaluate the performance of QueryAnalysisOpObserver, and my changes cut down the time spent in this OpObserver by ~50% (a sketch of such a harness is below). |
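For reference, a Google Benchmark harness for a single observer callback typically has this shape. The measured function below is a hypothetical stand-in, not the actual QueryAnalysisOpObserver code, which takes server-internal types:

```cpp
#include <benchmark/benchmark.h>

#include <atomic>

// Hypothetical stand-in for the work the observer does in the common
// "nothing to sample" case.
static std::atomic<long long> gCounter{0};

static void queryAnalysisOnUpdateStandIn() {
    gCounter.fetch_add(1, std::memory_order_relaxed);
}

static void BM_QueryAnalysisOnUpdate(benchmark::State& state) {
    // Google Benchmark runs this loop enough times to get a stable
    // per-iteration estimate, reported in ns by default.
    for (auto _ : state) {
        queryAnalysisOnUpdateStandIn();
    }
}
BENCHMARK(BM_QueryAnalysisOnUpdate);

BENCHMARK_MAIN();
```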
| Comment by Matt Kneiser [ 01/Jun/23 ] |
|
The prior code review for |
| Comment by Matt Kneiser [ 01/Jun/23 ] |
|
I am linking this ticket to |
| Comment by Irina Yatsenko (Inactive) [ 15/May/23 ] |
|
judah.schvimer@mongodb.com, there was a lot of churn in op_observer_impl.cpp between 6.0 and 7.0. Ideally, we should have micro-benchmarks at the function level to see the impact of individual changes. From a code review, I would suspect a few changes that might have contributed to the longer running time (but keep in mind that stuff gets inlined, etc.).
I tried caching nss on OplogUpdateEntryArgs to remove the virtual calls; it saved ~100 nanoseconds per call (about half of the regression). I have no idea whether caching like this would be correct (a sketch of the idea is below). |
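To illustrate the idea, here is a hedged sketch of that kind of caching with hypothetical, simplified types. The real OplogUpdateEntryArgs and Collection are more involved, and, as noted above, the correctness of caching the namespace (e.g. under concurrent renames) is unverified:

```cpp
#include <iostream>
#include <string>

// All names here are hypothetical simplifications. The point is the trade:
// each observer currently reaches the namespace through a virtual call on
// every invocation; caching a copy on the args struct pays that cost once.
struct NamespaceString {
    std::string db;
    std::string coll;
};

class Collection {
public:
    virtual ~Collection() = default;
    virtual const NamespaceString& ns() const = 0;  // one virtual call per access
};

class CollectionImpl : public Collection {
public:
    CollectionImpl(std::string db, std::string coll)
        : _nss{std::move(db), std::move(coll)} {}
    const NamespaceString& ns() const override { return _nss; }

private:
    NamespaceString _nss;
};

struct OplogUpdateEntryArgs {
    // Pay the virtual call once at construction instead of once per observer.
    explicit OplogUpdateEntryArgs(const Collection& coll) : cachedNss(coll.ns()) {}

    NamespaceString cachedNss;  // hypothetical cached copy; may go stale if
                                // the collection's namespace changes later
};

int main() {
    CollectionImpl coll("test", "c");
    OplogUpdateEntryArgs args(coll);
    // Observers would read args.cachedNss instead of calling coll.ns().
    std::cout << args.cachedNss.db << "." << args.cachedNss.coll << "\n";
    return 0;
}
```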