[SERVER-24113] OplogFetcher getMore callback QueryResponseStatus should include metadata on error
Created: 09/May/16  Updated: 06/Dec/22  Resolved: 18/Apr/22

Status: Closed
Project: Core Server
Component/s: Networking, Replication
Affects Version/s: None
Fix Version/s: Needs Further Definition

Type: Improvement
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Backlog - Replication Team
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-24088 oplog fetcher should retry on getMore... Closed
Related
is related to SERVER-23222 Replset metadata is not attached to c... Closed
is related to SERVER-24067 TaskExecutor RemoteCommandCallbackArg... Closed
Assigned Teams:
Replication
Participants:

 Description   

On ExceededTimeLimit errors in the getMore callback of the OplogFetcher, we may still want to apply the metadata even though the command failed. Currently QueryResponseStatus is a StatusWith, which cannot carry both a non-OK status and a value that includes the metadata.
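To make the limitation concrete, here is a minimal C++ sketch. `Status`, `StatusWith`, `ReplMetadata`, `QueryResponse`, and `QueryResponseWithMetadata` are hypothetical, simplified stand-ins, not the server's real classes; the sketch only shows that an either/or type drops the metadata on error, while a type that holds the status alongside the metadata keeps it.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; not MongoDB's real Status/StatusWith.
enum class ErrorCode { OK, ExceededTimeLimit };

struct Status {
    ErrorCode code = ErrorCode::OK;
    std::string reason;
    bool isOK() const { return code == ErrorCode::OK; }
};

// Holds either a non-OK Status or a value, never both.
template <typename T>
class StatusWith {
public:
    StatusWith(Status status) : _status(std::move(status)) { assert(!_status.isOK()); }
    StatusWith(T value) : _value(std::move(value)) {}

    bool isOK() const { return _status.isOK(); }
    const Status& getStatus() const { return _status; }
    const T& getValue() const {
        assert(isOK());
        return *_value;
    }

private:
    Status _status;  // defaults to OK when constructed from a value
    std::optional<T> _value;
};

// Hypothetical replication metadata (illustrative fields only).
struct ReplMetadata {
    long long lastOpCommitted = 0;
    long long term = 0;
};

struct QueryResponse {
    int numDocuments = 0;
    ReplMetadata metadata;
};

// The shape described in the ticket: on error, only the Status survives.
using QueryResponseStatus = StatusWith<QueryResponse>;

// One possible alternative: status and metadata travel together.
struct QueryResponseWithMetadata {
    Status status;
    std::optional<ReplMetadata> metadata;   // set whenever the reply carried metadata
    std::optional<QueryResponse> response;  // set only on success
};

// Both functions simulate a getMore that hit maxTimeMS but whose reply
// still carried metadata {lastOpCommitted: 42, term: 3}.
QueryResponseStatus timedOutGetMore() {
    // The metadata has nowhere to go in a StatusWith holding an error.
    return Status{ErrorCode::ExceededTimeLimit, "getMore exceeded time limit"};
}

QueryResponseWithMetadata timedOutGetMoreKeepingMetadata() {
    return {Status{ErrorCode::ExceededTimeLimit, "getMore exceeded time limit"},
            ReplMetadata{42, 3},
            std::nullopt};
}
```

With the either/or type, a caller that sees ExceededTimeLimit has no metadata to apply; with the combined type it can apply `metadata` and still treat the batch as failed.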



 Comments   
Comment by Lingzhi Deng [ 19/Apr/22 ]

We decided to close this as "Won't Do" because it isn't clear what the benefit would be. SERVER-24088 suggests that this would help with commit point propagation, but we already have logic for the sync source to send an empty batch to expedite commit point propagation.

Comment by Lingzhi Deng [ 11/Apr/22 ]

This wasn't solved as part of the Exhaust Cursors for Oplog Fetching project. We do use the rpc::ReplyMetadataReader callback in DBClientConnection, but we only cache the metadata there and do not process it until _onSuccessfulBatch is called. To do this ticket, we could process the cached metadata on certain errors in the error path.
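A minimal C++ sketch of that idea, assuming a hypothetical `OplogFetcherSketch` class: metadata from each reply is cached by a callback (standing in for the rpc::ReplyMetadataReader hook), processed on success as _onSuccessfulBatch does today, and also processed in the error path for a chosen set of error codes. Which codes qualify is an assumption here (only ExceededTimeLimit); the comment does not specify them, and "processing" is reduced to recording the metadata.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class ErrorCode { OK, ExceededTimeLimit, HostUnreachable };

// Hypothetical replication metadata (illustrative fields only).
struct ReplMetadata {
    long long lastOpCommitted = 0;
    long long term = 0;
};

// Hypothetical fetcher: not the real OplogFetcher interface.
class OplogFetcherSketch {
public:
    // Stand-in for the reply-metadata callback: it only caches.
    void onReplyMetadata(const ReplMetadata& metadata) { _cached = metadata; }

    // Existing behavior: cached metadata is processed on a successful batch.
    void onSuccessfulBatch() { _processCached(); }

    // Proposed behavior: also process cached metadata for selected errors.
    void onError(ErrorCode code) {
        assert(code != ErrorCode::OK);
        if (_processMetadataOnError(code)) {
            _processCached();
        } else {
            _cached.reset();  // assumption: other errors drop the cached metadata
        }
    }

    const std::vector<ReplMetadata>& processed() const { return _processed; }

private:
    // Assumption: a maxTimeMS expiry still came from a responsive sync source.
    static bool _processMetadataOnError(ErrorCode code) {
        return code == ErrorCode::ExceededTimeLimit;
    }

    void _processCached() {
        if (_cached) {
            _processed.push_back(*_cached);
            _cached.reset();
        }
    }

    std::optional<ReplMetadata> _cached;
    std::vector<ReplMetadata> _processed;
};
```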

Comment by Lingzhi Deng [ 07/Jan/20 ]

The Exhaust Cursors for Oplog Fetching project is planning to use the rpc::ReplyMetadataReader callback in DBClientConnection to handle metadata. Based on this, I think this might come for free, because the callback is called on every response (even one with a failure status). Maybe we can make this ticket depend on the project so that we remember to revisit it and see whether there is anything else to do afterwards.
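The property this comment relies on, a metadata hook that runs for every reply before the caller inspects the status, can be sketched in C++ as follows. `ConnectionSketch`, `Reply`, `Metadata`, and `setReplyMetadataReader` are hypothetical names for illustration, not DBClientConnection's actual interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical metadata bag: field name -> value.
using Metadata = std::map<std::string, long long>;

// Hypothetical reply: a success flag, an error message, and metadata.
struct Reply {
    bool ok = true;
    std::string errmsg;
    Metadata metadata;
};

// Hypothetical connection with a per-reply metadata hook.
class ConnectionSketch {
public:
    using MetadataReader = std::function<void(const Metadata&)>;

    void setReplyMetadataReader(MetadataReader reader) { _reader = std::move(reader); }

    // Handles one reply; returns true if the command succeeded.
    bool receive(const Reply& reply) {
        assert(reply.ok || !reply.errmsg.empty());  // failed replies carry a message
        if (_reader) {
            _reader(reply.metadata);  // runs on success and on failure alike
        }
        return reply.ok;
    }

private:
    MetadataReader _reader;
};
```

Because the hook fires before `receive` reports failure, a failed reply's metadata is still observed; whether it is then processed is a separate decision.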

Comment by Judah Schvimer [ 06/Jan/20 ]

I'm changing this to an improvement, since it doesn't seem to cause any incorrect behavior, just less-than-ideal behavior.

ldeng, is this still a desired improvement after Exhaust Cursors for Oplog Fetching? Do you know what more we'd want to do here after that project?

Comment by Mira Carey [ 23/May/16 ]

Per discussion with scotthernandez, I've expanded SERVER-24067 to include metadata, and I'm passing this ticket back to Replication.

Comment by Mira Carey [ 23/May/16 ]

scotthernandez,

That makes sense and I think I've got a good grasp of what you're looking for in SERVER-24067. What I'm wondering is what else we need for this ticket. Is there somewhere else you're looking to have us thread elapsedMS after the task executor work is done?

Comment by Scott Hernandez (Inactive) [ 20/May/16 ]

To be more precise: the executor (network_interface) returns a StatusWith<RemoteCommandResponse>, where the RemoteCommandResponse holds the metadata and elapsedMS (see SERVER-24067). We need those values when there is an error too, but in that case there is no RemoteCommandResponse.

We can add more context if that isn't enough.

Comment by Scott Hernandez (Inactive) [ 09/May/16 ]

This may be a problem for replication because of how oplog tailing works: metadata is not returned on ExceededTimeLimit errors caused by maxTimeMS command processing, so liveness information is either delayed (since a new find query must be issued) or lost on this return path.

Also, we should address SERVER-24067 at the same time, since it concerns peer data returned along the same code paths.

Generated at Thu Feb 08 04:05:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.