[SERVER-81784] Pass MSG_WAITALL to send/recv when doing sync networking Created: 03/Oct/23 Updated: 24/Jan/24 Resolved: 20/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Erin McNulty |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | perf-8.0, perf-tiger, perf-tiger-handoff, perf-tiger-poc, perf-tiger-q4, perf-tiger-triaged | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Service Arch
|
||||
| Sprint: | Service Arch 2023-11-13, Service Arch 2023-11-27 | ||||
| Participants: | |||||
| Description |
|
This tells the kernel not to bother waking user-space until the whole message has been sent/received. Without it, the kernel wakes us each time a send/recv partially completes, just so that we can call send/recv again with an advanced buffer slice. With the flag set, findOne latency for a 1MB document went from ~750μs to ~550μs, so that is a lot of wasted work. Unfortunately asio doesn't know to set it, so we will need to bypass asio and do it ourselves.

While we are at it, we should make sourceMessage more efficient for small messages as well. Currently we do a recv of 16 bytes to read the header, then do another recv to read the rest of the message. Instead we should allocate a buffer on the stack (maybe 1, 4, or 16KB?) and do a recv into that, looping only until we have the size (in general we won't loop at all). If we got lucky and got a full message on our first try, we can just copy that into a Message and move on without doing a second syscall. If we didn't get a full message, then we should do a recv(MSG_WAITALL) for the remainder.

sinkMessage is simpler: we can just unconditionally use MSG_WAITALL when sync, because we always have a full message ready to go.

Note that this will only show an improvement for large messages, with the impact proportional to the size of the message, so be sure to run benchmarks that involve very large messages. |
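The receive path proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual sourceMessage code: `readMessage`, `recvRetry`, `kHeaderSize`, and `kStackBufSize` are hypothetical names, the socket is assumed to be blocking and non-TLS, the first 4 bytes of the 16-byte header are assumed to hold the total message length in host byte order, and at most one message is assumed to be in flight (the sketch throws if bytes past the current message arrive, which is the kind of case exhaust commands would have to handle).

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical constants: 16-byte header as in the ticket; 4KB is one of the
// stack buffer sizes the ticket floats (1, 4, or 16KB).
constexpr std::size_t kHeaderSize = 16;
constexpr std::size_t kStackBufSize = 4096;

// recv() that retries when interrupted by a signal.
ssize_t recvRetry(int fd, void* buf, std::size_t len, int flags) {
    for (;;) {
        ssize_t n = ::recv(fd, buf, len, flags);
        if (n >= 0 || errno != EINTR) return n;
    }
}

// Reads one length-prefixed message from a blocking socket.
// Assumes the first 4 header bytes are the total length (header included).
std::vector<char> readMessage(int fd) {
    char stackBuf[kStackBufSize];
    std::size_t have = 0;

    // Loop only until the header is in hand; usually this is a single recv.
    while (have < kHeaderSize) {
        ssize_t n = recvRetry(fd, stackBuf + have, sizeof(stackBuf) - have, 0);
        if (n == 0) throw std::runtime_error("peer closed connection");
        if (n < 0) throw std::runtime_error(std::strerror(errno));
        have += static_cast<std::size_t>(n);
    }

    std::int32_t len = 0;
    std::memcpy(&len, stackBuf, sizeof(len));
    if (len < static_cast<std::int32_t>(kHeaderSize))
        throw std::runtime_error("invalid message length");
    const std::size_t total = static_cast<std::size_t>(len);
    if (have > total)
        throw std::runtime_error("read past end of message (not handled in this sketch)");

    // Copy whatever the first recv delivered; for small messages this is everything.
    std::vector<char> msg(total);
    std::memcpy(msg.data(), stackBuf, have);

    // Ask the kernel to wake us only once the remainder has arrived.
    // MSG_WAITALL can still return short (signal, error, disconnect), so loop.
    while (have < total) {
        ssize_t n = recvRetry(fd, msg.data() + have, total - have, MSG_WAITALL);
        if (n == 0) throw std::runtime_error("peer closed connection");
        if (n < 0) throw std::runtime_error(std::strerror(errno));
        have += static_cast<std::size_t>(n);
    }
    assert(have == total);
    return msg;
}
```

For a message that fits in the stack buffer, this costs one recv syscall instead of two; for larger messages, the remainder is requested in a single recv(MSG_WAITALL) rather than a series of partial reads.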
| Comments |
| Comment by Erin McNulty [ 20/Nov/23 ] |
|
Discussed with mathias@mongodb.com: MSG_WAITALL is not actually checked when passed to `send`, and the tasks of investigating it with `recv` and investigating the effects of MSG_ZEROCOPY have been split into their own tickets. We are still curious why his POC showed an improvement (my patches showed varying results, mostly unchanged perf on a non-TLS variant from implementing this), but we are closing this because it would be a non-TLS-only improvement anyway, so we decided it wasn't worth further investigation beyond what has already been done. |
| Comment by Erin McNulty [ 15/Nov/23 ] |
|
Filed SERVER-83303 to address sourceMessage because of additional complications discovered for exhaust commands, and SERVER-83304 to investigate MSG_ZEROCOPY. |
| Comment by Mathias Stearn [ 03/Oct/23 ] |
|
Could also investigate using MSG_ZEROCOPY with large messages to keep the Message object alive until the kernel knows it won't need it while allowing the kernel to avoid doing its own copies. They claim it is only beneficial for messages >10KB. (Let me know if I should file a separate ticket for this or if you want to look into it as part of this work) |