[SERVER-44519] Label ExceededTimeLimit (262) errors with RetryableWriteError label Created: 08/Nov/19  Updated: 29/Oct/23  Resolved: 20/Nov/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.2

Type: New Feature Priority: Major - P3
Reporter: Emily Giurleo (Inactive) Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by DRIVERS-772 Investigate changes in SERVER-44519: ... Closed
Documented
is documented by DOCS-13237 Investigate changes in SERVER-44519: ... Closed
Related
is related to SERVER-40493 Make Interrupted a retryable writes e... Closed
is related to DRIVERS-651 Make ExceededTimeLimit retryable writ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-12-02
Participants:

 Description   

The Drivers team has intended to add ExceededTimeLimit as a retryable error code (see SPEC-1296). Previously, this would have required only Drivers work, but it now also requires that we add the RetryableWriteError label to ExceededTimeLimit errors.
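To illustrate what the label means for a client, here is a minimal Python sketch of how a driver-side check might look. It assumes the server reply is available as a plain dictionary with an `errorLabels` array. The helper name `has_retryable_write_label` and the `sample_reply` document are hypothetical and are not taken from any driver or from the commit for this ticket.

```python
from typing import Any, Mapping

EXCEEDED_TIME_LIMIT = 262  # error code named in this ticket's title
RETRYABLE_WRITE_ERROR_LABEL = "RetryableWriteError"


def has_retryable_write_label(reply: Mapping[str, Any]) -> bool:
    """Return True if the reply carries the RetryableWriteError label.

    Hypothetical helper: the decision is based on the label alone,
    not on a client-side list of retryable error codes.
    """
    labels = reply.get("errorLabels", [])
    return RETRYABLE_WRITE_ERROR_LABEL in labels


# Assumed shape of a failed write reply once the label is applied.
sample_reply = {
    "ok": 0,
    "code": EXCEEDED_TIME_LIMIT,
    "codeName": "ExceededTimeLimit",
    "errorLabels": [RETRYABLE_WRITE_ERROR_LABEL],
}

if has_retryable_write_label(sample_reply):
    print("write may be retried")
```

With this approach the client does not need to know that code 262 is retryable; it only inspects the label the server attached.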



 Comments   
Comment by Githook User [ 20/Nov/19 ]

Author: Lingzhi Deng <lingzhi.deng@mongodb.com> (username: ldennis)

Message: SERVER-44519: Label ExceededTimeLimit errors with RetryableWriteError label
Branch: master
https://github.com/mongodb/mongo/commit/b077212a48653b1668cee67f6888cd65b5d123f6

Comment by Emily Giurleo (Inactive) [ 12/Nov/19 ]

jmikola Done

Comment by Jeremy Mikola [ 12/Nov/19 ]

emily.giurleo: Noting the comments above, can you revise this ticket title/description to remove mention of ClientDisconnect and just refer to ExceededTimeLimit?

Comment by Mira Carey [ 12/Nov/19 ]

I'm happy to only add ExceededTimeLimit. It does seem like mongod would never return ClientDisconnect, and mongos would retry on ClientDisconnect itself and not want to add the error label and have drivers retry again (currently mongos is never returning RetryableWriteError error labels).

I think that'd be my preferred way forward.

Comment by Judah Schvimer [ 12/Nov/19 ]

I'm happy to only add ExceededTimeLimit. It does seem like mongod would never return ClientDisconnect, and mongos would retry on ClientDisconnect itself and not want to add the error label and have drivers retry again (currently mongos is never returning RetryableWriteError error labels).

Comment by Mira Carey [ 11/Nov/19 ]

In SERVER-40493, for ClientDisconnect, Randolph Tan said:

It means that the remote connection was severed. This makes more sense when there are multiple network hops, for example, driver sends write to mongos, then mongos sending write commands to shards. So, it is possible to get this error during retryable write.

That's interesting; I wouldn't have expected that to be possible. From my memory and inspection, I only see this getting manufactured after your ingress socket has disconnected. So the only way I can see this showing up to an end user is if we persisted errors somewhere and handed them back later to a different client than the one that disconnected from us. I definitely didn't intend this to be an error you'd see outside a single mongod when I did that project.

Comment by Judah Schvimer [ 11/Nov/19 ]

In SERVER-40493, for ClientDisconnect, renctan said:

It means that the remote connection was severed. This makes more sense when there are multiple network hops, for example, driver sends write to mongos, then mongos sending write commands to shards. So, it is possible to get this error during retryable write.

Comment by Mira Carey [ 11/Nov/19 ]

Just to drop a little more info in, I'm not actually sure how you'd ever get a ClientDisconnect error.

That error is sent (via markKilled) if your opctx is marked as killOnClientDisconnect and your client socket is disconnected. That bubbles up to the top and gets logged, but unless something very weird has happened, there shouldn't be a channel to propagate that error. Can't send an error response on a closed socket, and all that.

Comment by Rathi Gnanasekaran [ 09/Nov/19 ]

judah.schvimer you're right; according to the last comment in the spec ticket, only two of those are retryable. I have updated the spec ticket to reflect the discussion.

Comment by Emily Giurleo (Inactive) [ 08/Nov/19 ]

Thanks for pointing that out! I'll follow up with Rathi about it.

Comment by Judah Schvimer [ 08/Nov/19 ]

I think we decided in SPEC-1296 that LockTimeout would not be a retryable error.

Comment by Emily Giurleo (Inactive) [ 08/Nov/19 ]

rathi.gnanasekaran jmikola judah.schvimer ldeng
