[SERVER-83303] Attempt to read whole message in one go in sourceMessage Created: 15/Nov/23  Updated: 05/Feb/24

Status: In Code Review
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Erin McNulty Assignee: Erin McNulty
Resolution: Unresolved Votes: 0
Labels: perf-8.0, perf-tiger, perf-tiger-handoff
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Sprint: Service Arch 2024-01-08, Service Arch 2024-01-22, Service Arch 2024-02-05, Service Arch 2024-02-19
Participants:

 Description   

In SERVER-81784, Mathias recommended that we optimize the sync networking path on linux by passing MSG_WAITALL at all times for sinkMessage. He also recommended this improvement for sourceMessage, which I split into this ticket because of its additional complexity:

While we are at it, we should make the sourceMessage a bit more optimal for small messages as well. Currently we do a recv of 16 bytes to read the header, then do another recv to read the rest of the message. Instead we should allocate a buffer on the stack (maybe 1, 4, or 16KB?) and do a recv into that, only looping until we have the size (in general we won't loop at all). If we got lucky and got a full message on our first try, we can just copy that into a Message and move on without doing a second syscall. If we didn't get a full message then we should to a recv(MSG_WAITALL) for the remainder.

As noted in a discussion on slack : "I think this is a 'strict win' for us - we either do a 'big read' and avoid doing the second syscall, or we have the 'normal behavior' where we read the header (and then some) in one syscall, and then do a second to get the rest."

After discussing with the team, we noted that for exhaust commands, we may run into a situation where this initial read reads in multiple messages at once. In this case, we would need to parse the contents of the buffer to return one message from sourceMessage, and stash the remaining messages to be returned on future calls to sourceMessage.



 Comments   
Comment by Mathias Stearn [ 28/Nov/23 ]

Re updated psuedo-code, you also need to handle the case where you don't have enough of the message ready yet to know the full message size. Probably can just add a loop around the initial recv until you have enough bytes. Just make sure to update the ptr and length.

Comment by Erin McNulty [ 27/Nov/23 ]

Essentially what we need to write is something like this in sourceMessage:
if sync:
  recv into buffer of ~1024 bytes
  if messageSize < 1024:
    check if there is more than one message in the buffer-- if there is and we have one or more additional complete messages, stash those messages because they are part of an exhaust query or a fire and forget command
  else:
    recv(...., MSG_WAITALL) // get the rest of the message
 
along with creating a mechanism for stashing the overflow messages and deciding on what behavior we should use if we have read, for example, 1.5 messages (although the simplest behavior is probably to just do a recv(MSG_WAITALL) on the last message). 

Comment by Mathias Stearn [ 16/Nov/23 ]

Note that it isn't just for exhaust replies but also fire-and-forget requests. At the OP_MSG layer both are indicated with the kMoreToCome bit being set, and I think both go through sourceMessage(), but we should ensure that if there is any asymmetry in how they are handled, both cases work correctly.

Generated at Thu Feb 08 06:51:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.