[SERVER-50735] Mongos 4.4.0 can return the topologyVersion of a shard in state change errors Created: 02/Sep/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shane Harvey | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | sa-remove-fv-backlog-22 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Service Arch |
| Operating System: | ALL |
| Sprint: | Repl 2020-10-05 |
| Participants: | |
| Description |
|
Mongos can return the wrong topologyVersion in state change error responses to the client. Instead of returning its own topologyVersion, mongos can return the topologyVersion of a shard member in some error responses. I've attached a reproduction script that uses failCommand (reproMongosWrongTopologyVersion.js).
The attached repro script's output (note the two different topologyVersion fields, both reported by mongos):
It's possible this could be fixed by |
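The reproduction script itself is not preserved in this export. As a hedged illustration of the failCommand mechanism it relies on, the sketch below uses PyMongo to enable the failCommand fail point on one node so that a chosen command fails with a state change error. Which node the original script targeted, which command it failed, and the use of error code 10107 (NotMaster, later renamed NotWritablePrimary) are assumptions, as is the helper name enable_fail_command.

```python
from typing import Any, Dict, List


def enable_fail_command(admin_db: Any, commands: List[str],
                        error_code: int = 10107, times: int = 1) -> Dict[str, Any]:
    """Enable the failCommand fail point on the node that admin_db talks to.

    admin_db is expected to be a PyMongo Database for "admin" (e.g. client.admin).
    The server only accepts configureFailPoint when started with enableTestCommands=1.
    """
    data = {
        "failCommands": commands,  # names of the commands that should fail
        "errorCode": error_code,   # 10107 is a state change ("not master") error
    }
    # PyMongo turns this call into the command document
    # {configureFailPoint: "failCommand", mode: {times: N}, data: {...}}.
    return admin_db.command("configureFailPoint", "failCommand",
                            mode={"times": times}, data=data)
```

With a test deployment, one could call enable_fail_command(shard_client.admin, ["find"]), run a find through mongos, and check whether the returned error document (PyMongo exposes it as OperationFailure.details) carries a topologyVersion, and whose. Sending the same command with mode="off" disables the fail point again.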
| Comments |
| Comment by Ratika Gandhi [ 09/Feb/21 ] |
|
We will revisit the ticket after |
| Comment by Tess Avitabile (Inactive) [ 22/Sep/20 ] |
|
I agree it's a problem for mongos to forward state change errors from shards to drivers. However, I don't think appending the mongos's own topologyVersion is the right way to solve it. An important aspect of the streamable hello protocol is that a node only attaches its topologyVersion to state change errors that change its topologyVersion. If mongos attached its topologyVersion to shutdown errors in 4.4, for example, drivers would incorrectly ignore those shutdown errors, even though the mongos is shutting down. I would rather solve |
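To make the shutdown-error concern concrete, here is a minimal Python sketch of the driver-side staleness check, modeled on (not copied from) the topologyVersion comparison in the Server Discovery and Monitoring (SDAM) specification. An error is ignored as stale when the driver already knows a topologyVersion at least as new as the one in the error. A missing topologyVersion or a different processId never counts as stale. The function names and the string processId are illustrative assumptions.

```python
from typing import Optional, TypedDict


class TopologyVersion(TypedDict):
    processId: str  # an ObjectId in real responses; a string keeps the sketch dependency-free
    counter: int


def compare_topology_version(a: Optional[TopologyVersion],
                             b: Optional[TopologyVersion]) -> int:
    """Return -1, 0 or 1. A missing value or a different processId compares
    as -1, i.e. b is assumed to be newer than a."""
    if a is None or b is None or a["processId"] != b["processId"]:
        return -1
    if a["counter"] == b["counter"]:
        return 0
    return -1 if a["counter"] < b["counter"] else 1


def is_stale_error(current: Optional[TopologyVersion],
                   error_tv: Optional[TopologyVersion]) -> bool:
    """An error is stale (ignored) if the topologyVersion the driver already
    knows is equal to or newer than the one attached to the error."""
    return compare_topology_version(current, error_tv) >= 0
```

Under this model, a shutdown error to which mongos appended its current, unchanged topologyVersion compares equal to what the driver already has, so is_stale_error returns True and the error is dropped, which is exactly the failure mode described in the comment. The same error without a topologyVersion would be processed.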
| Comment by Divjot Arora (Inactive) [ 22/Sep/20 ] |
|
I have some concerns about mongos never returning topologyVersion for command errors. In drivers, an omitted topologyVersion means that the response is never considered stale, so it will always be processed. Because the errors propagated from mongods can be state change errors, this would cause drivers to mark the mongos Unknown due to a state change in one of the shards. This could happen multiple times for the same state change if multiple concurrent operations are running on the shard. This concern will probably be alleviated by |
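A small simulation shows the effect described here: if mongos forwards shard state change errors with no topologyVersion, every one of them passes the staleness check, so several concurrent failures caused by one shard state change each mark the mongos Unknown. The staleness rule is a simplified reading of the SDAM specification's topologyVersion comparison, and recording the error's topologyVersion on the new Unknown description is an assumption; count_mark_unknown is a hypothetical name.

```python
from typing import List, Optional

TopologyVersion = Optional[dict]  # {"processId": ..., "counter": ...}, or None if omitted


def count_mark_unknown(known_tv: TopologyVersion,
                       error_tvs: List[TopologyVersion]) -> int:
    """Return how many of the given state change errors a driver would act on."""
    marked = 0
    for tv in error_tvs:
        stale = (known_tv is not None and tv is not None
                 and known_tv["processId"] == tv["processId"]
                 and tv["counter"] <= known_tv["counter"])
        if stale:
            continue  # an equal or newer topologyVersion is already known
        marked += 1  # the driver marks the mongos Unknown
        # Assumption: the new Unknown description records the error's topologyVersion.
        known_tv = tv
    return marked
```

For example, starting from {"processId": "mongos-1", "counter": 0}, four errors without a topologyVersion give 4, while four errors that all carry the same newly incremented mongos topologyVersion give 1. The second case is the deduplication the protocol is meant to provide.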
| Comment by Tess Avitabile (Inactive) [ 22/Sep/20 ] |
|
Thank you! |
| Comment by Pavithra Vetriselvan [ 22/Sep/20 ] |
|
The changes should be relatively straightforward, so I can do it this sprint! |
| Comment by Tess Avitabile (Inactive) [ 22/Sep/20 ] |
|
Cool, sounds good! I think we should fix the issue this sprint if we can. Would it work for you to implement the fix this sprint? If it would be tough to fit in with your project plans this sprint, I can ask someone else. |
| Comment by Pavithra Vetriselvan [ 22/Sep/20 ] |
|
Ah, yes, that makes sense to me. Good call, Tess! In that case, I think we would have to check here and here to see whether the response contains a topologyVersion and, if so, remove it. This logic should probably be implemented in a helper function, perhaps called removeTopologyVersionFromResponse. |
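A minimal Python sketch of the proposed helper, with the shard's response represented as a dict. The real change would live in mongos's C++ code at the call sites referenced in the comment (those links are not preserved here). remove_topology_version_from_response is a hypothetical analogue of the proposed removeTopologyVersionFromResponse, and only the top-level topologyVersion field is handled; whether nested error documents also need stripping is not covered.

```python
from typing import Any, Dict


def remove_topology_version_from_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a shard's command response without its topologyVersion.

    Hypothetical sketch: mongos must not forward a shard's topologyVersion,
    and it does not append its own in its place.
    """
    cleaned = dict(response)  # shallow copy; the original shard response is left untouched
    cleaned.pop("topologyVersion", None)  # no-op when the field is absent
    return cleaned
```

Returning a copy rather than mutating in place is just a choice for the sketch. In line with the guidance elsewhere in this thread, the helper only strips the shard's topologyVersion and never adds the mongos's own.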
| Comment by Tess Avitabile (Inactive) [ 22/Sep/20 ] |
|
Thanks, pavithra.vetriselvan and shane.harvey! shane.harvey, when the driver receives a topologyVersion as part of a command error, does it store that topologyVersion and use it in its next isMaster command to the mongos? If so, the next isMaster command to the mongos will return immediately, since the processId will be wrong. I don't think that's terribly harmful, but it's unexpected behavior. pavithra.vetriselvan, I think that in 4.4, mongos should never return its own topologyVersion as part of a command error. A node should only return its topologyVersion as part of a command error if that error is associated with a change in that node's topologyVersion. Otherwise, the driver will incorrectly ignore the command error. Post-4.4, the mongos should only return its own topologyVersion when in quiesce mode. So I think that the mongos should strip out the topologyVersion of the mongod from the response, but it should not append its own. Do you know the right place for us to strip out the topologyVersion of the mongod? |
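The isMaster side effect mentioned here can be sketched as a simplified model of the awaitable isMaster/hello wait rule: the server holds the request only while the client's topologyVersion matches its own processId and counter, so a shard's topologyVersion echoed back to mongos gets an immediate reply. This is an illustrative Python model under those assumptions, not server code, and hello_returns_immediately is a hypothetical name.

```python
from typing import Optional


def hello_returns_immediately(client_tv: Optional[dict], server_tv: dict) -> bool:
    """Simplified model: does an awaitable isMaster/hello return at once?

    client_tv is the topologyVersion the driver sends with the request
    (alongside maxAwaitTimeMS); server_tv is the node's current one.
    """
    if client_tv is None:
        return True  # no topologyVersion sent: an ordinary, non-awaitable request
    if client_tv["processId"] != server_tv["processId"]:
        return True  # e.g. a shard's topologyVersion sent to mongos
    # Same process: an older counter means the client missed a change, so reply now;
    # an equal counter means the server waits for a change or for maxAwaitTimeMS.
    # (A client counter higher than the server's is not modeled here.)
    return client_tv["counter"] < server_tv["counter"]
```

After that immediate reply the driver holds the mongos's real topologyVersion, so its next awaitable request waits normally, which fits the "not terribly harmful, but unexpected" assessment.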
| Comment by Pavithra Vetriselvan [ 21/Sep/20 ] |
|
Got it, thanks for the explanation Shane! |
| Comment by Shane Harvey [ 21/Sep/20 ] |
|
Thanks, pavithra.vetriselvan. I believe the only impact of returning the wrong topologyVersion is that clients that receive multiple errors at the same time may mark the server Unknown and clear the connection pool multiple times. In other words, it negates some of the benefit of DRIVERS-1187. |
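The repeated-processing effect can be illustrated by replaying a timeline of errors and monitor checks for one mongos. When errors carry a shard's topologyVersion, each monitor check restores the mongos's own topologyVersion, so the next shard error looks new again. When errors genuinely reflect a mongos state change and carry its incremented topologyVersion, only the first one is acted on. This Python sketch simplifies the SDAM rules: recording the error's topologyVersion on the Unknown description is an assumption, replay is a hypothetical name, and it does not attempt to model what DRIVERS-1187 specifically changed.

```python
from typing import List, Optional, Tuple

TopologyVersion = Optional[dict]  # {"processId": ..., "counter": ...}, or None if omitted


def replay(known_tv: TopologyVersion,
           events: List[Tuple[str, TopologyVersion]]) -> int:
    """Replay events for one mongos and return how many errors the driver acts on.

    ("check", tv): a monitor isMaster/hello response from the mongos itself.
    ("error", tv): a state change error carrying tv.
    """
    processed = 0
    for kind, tv in events:
        if kind == "check":
            known_tv = tv  # the monitor refreshes the mongos's own topologyVersion
            continue
        if (known_tv is not None and tv is not None
                and known_tv["processId"] == tv["processId"]
                and tv["counter"] <= known_tv["counter"]):
            continue  # stale error: ignored
        processed += 1  # mark Unknown (and possibly clear the pool)
        known_tv = tv  # assumption: the Unknown description keeps the error's topologyVersion
    return processed
```

With three shard errors interleaved with two monitor checks, all three are processed; with the mongos's own incremented topologyVersion in the same pattern, only the first is.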