[SERVER-74545] PriorityTicketHolder doesn't track operations that requeue after 500millis Created: 02/Mar/23 Updated: 18/Apr/23 Resolved: 14/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Haley Connelly | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Storage Execution |
| Participants: | |||||||||
| Description |
|
It could be interesting to track the number of operations that time out after 500 milliseconds in a queue, wake up, and requeue for a ticket.

Motivation: This could provide insight into the conditions that cause operations to get stuck in the queue, and into the side effects on latency and throughput when operations must wake up to requeue.

Example: Suppose the 50th percentile latency is ~500 milliseconds. Do we see higher tail latencies than expected? Should we reconsider the 500 millisecond timeout?

Right now, we measure the cumulative number of operations queued in the PriorityTicketHolder at the TicketHolderWithQueueingStats level. This means the count does not take into account the number of operations that must requeue. |
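For illustration only, here is a minimal Python sketch of the kind of counter being proposed. It is not the server's C++ implementation: the class name `RequeueCountingTicketHolder`, the fields `total_queued` and `total_requeued`, and the `demo` helper are all hypothetical, and the toy uses a plain condition variable instead of the real priority queues. It only shows how a timed wait that expires without a ticket could be counted separately from the first time an operation queues.

```python
import threading
import time


class RequeueCountingTicketHolder:
    """Toy ticket holder. All names are hypothetical, not MongoDB's."""

    def __init__(self, num_tickets, requeue_timeout_secs=0.5):
        self._available = num_tickets
        self._timeout = requeue_timeout_secs  # 500 ms in the ticket
        self._cv = threading.Condition()
        self.total_queued = 0    # operations that had to wait at least once
        self.total_requeued = 0  # timed-out waits that went back to waiting

    def acquire(self):
        with self._cv:
            if self._available == 0:
                self.total_queued += 1
            while self._available == 0:
                # Condition.wait() returns False when the timeout expires.
                timed_out = not self._cv.wait(timeout=self._timeout)
                if timed_out and self._available == 0:
                    # Woke up on the timeout, still no ticket: requeue.
                    self.total_requeued += 1
            self._available -= 1

    def release(self):
        with self._cv:
            self._available += 1
            self._cv.notify()


def demo():
    """Hold the only ticket long enough for a waiter to time out and requeue."""
    holder = RequeueCountingTicketHolder(num_tickets=1, requeue_timeout_secs=0.05)
    holder.acquire()  # main thread takes the only ticket

    worker = threading.Thread(target=holder.acquire)
    worker.start()

    # Wait until the worker has actually entered the queue.
    while holder.total_queued == 0:
        time.sleep(0.005)

    time.sleep(0.2)   # several timeout periods pass while the worker waits
    holder.release()  # the worker can now obtain the ticket
    worker.join()
    return holder.total_queued, holder.total_requeued


if __name__ == "__main__":
    queued, requeued = demo()
    print("queued:", queued, "requeued:", requeued)
```

In this sketch the cumulative queued count stays at one for the waiting operation no matter how long it waits, while the requeue count grows with every expired timeout. That gap is what the existing queue statistic would not reveal.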
| Comments |
| Comment by Haley Connelly [ 03/Mar/23 ] |
|
I doubt it accounts for much, but given that we know the PriorityTicketHolder is slightly slower due to extra concurrency synchronisation, I was wondering whether 500 milliseconds is too short to give operations a chance when queueing is high. |
| Comment by Louis Williams [ 03/Mar/23 ] |
|
The 500ms timeout + requeue also happens in the semaphore ticketholder, so I would be interested to know whether this is the cause and, if so, why it is more expensive. |