[SERVER-71213] High variance in PriorityTicketHolder microbenchmark runs Created: 09/Nov/22  Updated: 27/Oct/23  Resolved: 11/Nov/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Haley Connelly Assignee: Jordi Olivares Provencio
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-71476 Disparity between SemaphoreTicketHold... Closed
Sprint: Execution Team 2022-11-14
Participants:

 Description   

Currently, we are seeing a case where the ticketholder_bm benchmark performance of the PriorityTicketHolder degrades significantly on outlier runs. In general, the PriorityTicketHolder microbenchmarks perform comparably to the SemaphoreTicketHolder's, and consistently surpass the SemaphoreTicketHolder's performance in benchmark runs with 1024 threads.

After performing some analysis, we discovered that outlier runs with poor performance were spending a significant amount of time (~29% compared to ~2% in standard runs) in _pthread_rwlock_rdlock. 

We suspect this could be due to the use of the shared_mutex and sub-optimal waits for shared readers.
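For illustration, here is a minimal, self-contained sketch (not the actual PriorityTicketHolder code; all names are made up) of the kind of pattern we suspect: every ticket acquisition takes a std::shared_lock on a std::shared_mutex, and an occasional std::unique_lock writer forces new readers to wait. On Linux, std::shared_mutex is typically backed by a pthread rwlock, which is why reader waits would show up under _pthread_rwlock_rdlock in profiles.

// Hypothetical illustration of the suspected contention pattern; this is not
// the real PriorityTicketHolder implementation.
#include <atomic>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

std::shared_mutex queuesMutex;         // stand-in for the lock guarding the queues
std::atomic<int> ticketsAvailable{128};

// Reader path: every acquire takes the lock in shared mode. With libstdc++ this
// maps onto pthread_rwlock_rdlock, where outlier runs spend ~29% of their time.
bool tryAcquireTicket() {
    std::shared_lock lk(queuesMutex);
    int expected = ticketsAvailable.load();
    while (expected > 0) {
        if (ticketsAvailable.compare_exchange_weak(expected, expected - 1))
            return true;
    }
    return false;
}

// Writer path: occasional queue maintenance takes the lock exclusively. Once a
// writer is waiting, incoming readers block inside the rwlock.
void maintainQueues() {
    std::unique_lock lk(queuesMutex);
    std::this_thread::sleep_for(std::chrono::microseconds(50));
}

int main() {
    std::vector<std::thread> readers;
    for (int i = 0; i < 64; ++i)
        readers.emplace_back([] {
            for (int j = 0; j < 100'000; ++j)
                if (tryAcquireTicket())
                    ticketsAvailable.fetch_add(1);  // release immediately
        });
    std::thread writer([] {
        for (int j = 0; j < 1'000; ++j)
            maintainQueues();
    });
    for (auto& t : readers) t.join();
    writer.join();
}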



 Comments   
Comment by Haley Connelly [ 16/Nov/22 ]

daniel.gomezferro@mongodb.com you make a great point, and thanks for meeting offline to investigate this further.

Conclusion
The microbenchmarks are not indicative of the actual PriorityTicketHolder performance in a highly contended workload.

Recommendation
We should try to fix the microbenchmark to be more representative or re-evaluate the impact of how frequently operations "skip the line" in the PriorityTicketHolder.

Analysis
We compared the number of normal operations "addedToQueue" versus "startedProcessing" to get an idea of how many operations queue in the 1024-thread benchmark workloads (run locally on an m6i.2xlarge instance); a sketch of the ratio calculation follows the list below.
Average queued:processed ratios in benchmarks

  • 1 : 2 in the SemaphoreTicketHolder benchmark
  • 1 : 500 in the PriorityTicketHolder benchmark
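For clarity, a minimal sketch of how the queued:processed ratio is derived from the two counters; the values below are placeholders, not the measured numbers.

#include <cstdint>
#include <iostream>

int main() {
    // Placeholder values standing in for the "addedToQueue" and
    // "startedProcessing" ticketholder statistics.
    const int64_t addedToQueue = 1'000;
    const int64_t startedProcessing = 500'000;

    // queued:processed expressed as 1 : N, e.g. 1 : 500 for the
    // PriorityTicketHolder benchmark above.
    const double n = static_cast<double>(startedProcessing) / addedToQueue;
    std::cout << "queued:processed = 1 : " << n << "\n";
}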

Additionally, we had performance data from the background_index_construction genny workload, which aims to test index build performance when there is write contention. To make the comparison accurate, we compared the SemaphoreTicketHolder's performance to the PriorityTicketHolder's performance with index builds set to normal priority by default (only normal priority operations were run in the test).
Average queued:processed ratios in contentious performance workload

  • 1 : 2 in the SemaphoreTicketHolder run
  • 1 : ~3.5 in the PriorityTicketHolder run

Theory
We are already aware that incoming operations may grab the next available ticket and "skip the line" despite there being operations queued (SERVER-70391). We believe the microbenchmarks today increase the likelihood of "line skipping" since threads that release a ticket immediately go back to reacquire one.
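To make the theory concrete, below is a hedged sketch contrasting the two access patterns; the TicketPool class is a stand-in, not the real PriorityTicketHolder or ticketholder_bm code.

// Hypothetical sketch: a tight acquire/release loop versus a workload that does
// real work between admissions.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class TicketPool {
public:
    explicit TicketPool(int tickets) : _available(tickets) {}

    void acquire() {
        std::unique_lock lk(_mutex);
        _cv.wait(lk, [&] { return _available > 0; });
        --_available;
    }

    void release() {
        {
            std::lock_guard lk(_mutex);
            ++_available;
        }
        _cv.notify_one();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _available;
};

TicketPool pool(128);

// Microbenchmark pattern: release is immediately followed by reacquire, so a
// running thread usually grabs the freed ticket before a queued waiter is
// woken up -- i.e. it "skips the line" almost every iteration.
void benchmarkLoop(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        pool.acquire();
        pool.release();
    }
}

// Representative pattern: the operation does real work while holding the ticket
// and pauses between admissions, giving queued waiters a chance to win the
// freed ticket.
void workloadLoop(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        pool.acquire();
        std::this_thread::sleep_for(std::chrono::microseconds(200));  // "work"
        pool.release();
        std::this_thread::sleep_for(std::chrono::microseconds(200));  // think time
    }
}

int main() {
    std::thread a(benchmarkLoop, 100'000), b(workloadLoop, 1'000);
    a.join();
    b.join();
}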

Comment by Jordi Olivares Provencio [ 11/Nov/22 ]

That very well might be the case. Having a bit of contention with the locks might explain it. It would align with the time spent in the pthread reader lock jumping dramatically.

Comment by Daniel Gomez Ferro [ 11/Nov/22 ]

Maybe you don't even need to get enqueued; it could just be a case of taking the lock almost always uncontended vs. having some contention, and it appears there's pretty low contention overall given the low number of enqueues.

Comment by Jordi Olivares Provencio [ 10/Nov/22 ]

This is interesting, but doesn't seem to be the case unfortunately:

BM_acquireAndRelease<PriorityTicketHolder>/threads:16          45.8 ns          662 ns      1038752 Acquired=1.51068M/s AcquiredPerThread=94.4176k/s Enqueued=0
BM_acquireAndRelease<PriorityTicketHolder>/threads:128         6.53 ns          632 ns      1128832 Acquired=1.58339M/s AcquiredPerThread=12.3703k/s Enqueued=0
BM_acquireAndRelease<PriorityTicketHolder>/threads:1024        1646 ns        32513 ns       102400 Acquired=30.7574k/s AcquiredPerThread=30.0365/s Enqueued=210 
 
-------
 
BM_acquireAndRelease<PriorityTicketHolder>/threads:16          46.3 ns          672 ns      1036112 Acquired=1.48738M/s AcquiredPerThread=92.9611k/s Enqueued=0
BM_acquireAndRelease<PriorityTicketHolder>/threads:128         5.89 ns          625 ns      1106560 Acquired=1.59926M/s AcquiredPerThread=12.4942k/s Enqueued=0
BM_acquireAndRelease<PriorityTicketHolder>/threads:1024         183 ns          990 ns       535552 Acquired=1009.64k/s AcquiredPerThread=985.979/s Enqueued=307

I obtained the Enqueued number from the ticketholder statistics for the normal queue, which is where all operations are going. It raises a very interesting question though: why are we enqueueing so little in the benchmark?
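For context, counters such as Acquired and Enqueued in the output above are Google Benchmark user counters. Below is a hedged sketch of how such a counter might be wired up; the ticket acquisition itself is elided and this is not the real ticketholder_bm code.

// Sketch of surfacing an "Enqueued" user counter from a Google Benchmark.
#include <benchmark/benchmark.h>

static void BM_acquireAndRelease(benchmark::State& state) {
    long localEnqueued = 0;
    for (auto _ : state) {
        // Acquire and release a ticket here; increment localEnqueued whenever
        // the acquisition had to go through the queue.
    }
    // User counters from each thread are summed in the report by default.
    state.counters["Enqueued"] =
        benchmark::Counter(static_cast<double>(localEnqueued));
    state.counters["Acquired"] =
        benchmark::Counter(static_cast<double>(state.iterations()),
                           benchmark::Counter::kIsRate);
}
BENCHMARK(BM_acquireAndRelease)->Threads(16)->Threads(128)->Threads(1024);

BENCHMARK_MAIN();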

Comment by Daniel Gomez Ferro [ 10/Nov/22 ]

Are you monitoring how many times you ended up in the queues? A problem you might be running into is that in fast cases you are (almost) never queueing (a thread takes the ticket, sleeps, releases, and reacquires without contention) vs. slow cases where you end up queueing and dequeuing a lot.
