[SERVER-49220] failCommand with appName does not fail connection handshake command Created: 01/Jul/20  Updated: 20/Oct/20  Resolved: 20/Oct/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.0-rc12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Shane Harvey Assignee: Benjamin Caimano (Inactive)
Resolution: Duplicate Votes: 0
Labels: sa-groomed
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-49336 Set client metadata if it is missing ... Closed
Duplicate
duplicates SERVER-49336 Set client metadata if it is missing ... Closed
is duplicated by SERVER-48932 ReplicaSetMonitor hitting failpoint s... Closed
Problem/Incident
is caused by SERVER-48985 Add logging for failCommand Closed
Operating System: ALL
Participants:

 Description   

It looks like the behavior of failCommand changed slightly after the changes in SERVER-48985 (4.4.0-rc11-9-gee10647). It's no longer possible to fail the initial connection handshake while specifying an appName filter. For example, on 4.4.0-rc11 the following failpoint triggers on the connection handshake; now it triggers only on subsequent isMaster commands:

{
    'configureFailPoint': 'failCommand',
    'mode': {'times': 2},
    'data': {
        'failCommands': ['isMaster'],
        'closeConnection': False,
        'errorCode': 91,
        'appName': 'failHandshakeTest',
    },
}
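
For reproduction purposes, the failpoint document above can be built programmatically. The following is a minimal sketch only: the helper names `build_fail_command` and `build_disable_command` are hypothetical and not part of any driver or server API. The field values are copied from the document above, and `'mode': 'off'` is the usual way to turn a failpoint off again.

```python
def build_fail_command(app_name, times=2, error_code=91):
    """Build the configureFailPoint document from this report.

    Hypothetical helper for illustration. The command name must be the
    first key; Python dicts preserve insertion order, so it is.
    """
    return {
        "configureFailPoint": "failCommand",
        "mode": {"times": times},
        "data": {
            "failCommands": ["isMaster"],
            "closeConnection": False,
            "errorCode": error_code,
            # Only commands from clients with this appName should be failed.
            "appName": app_name,
        },
    }


def build_disable_command():
    """Build the document that switches the failCommand failpoint off."""
    return {"configureFailPoint": "failCommand", "mode": "off"}


if __name__ == "__main__":
    import pprint

    pprint.pprint(build_fail_command("failHandshakeTest"))
    pprint.pprint(build_disable_command())
```

With pymongo, a document like this would typically be sent from a separate admin client via `client.admin.command(...)`, while the client under test is created with `appname='failHandshakeTest'`. This assumes a server started with test commands enabled.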

Is this the same issue described in SERVER-48985 and SERVER-49157?

CC: ben.caimano



 Comments   
Comment by Benjamin Caimano (Inactive) [ 20/Oct/20 ]

I believe we have addressed the issue via SERVER-49336.

Comment by Benjamin Caimano (Inactive) [ 02/Jul/20 ]

Potentially, yes. It depends on how intrusive the bug fix is. SERVER-48985 was only a failure in our testing code. My suspicion is that this behavior is about how we process metadata for OP_QUERY in general.

Comment by Shane Harvey [ 02/Jul/20 ]

Is this something that can be fixed for 4.4?

Comment by Shane Harvey [ 01/Jul/20 ]

divjot.arora has pointed out that the following spec tests no longer work as designed (because the server does not fail the initial handshake):

It would be great to get this bug fixed so that we can regain test coverage for failed handshake commands.

Comment by Shane Harvey [ 01/Jul/20 ]

I believe this only affects a few Python-specific tests and not all drivers, so I don't think this is a blocker. I can work around this issue in pymongo's test suite for the time being by removing the appName filter. Still, it would be great to get a fix sooner rather than later.

Comment by Benjamin Caimano (Inactive) [ 01/Jul/20 ]

shane.harvey, that is interesting. My first guess is that we are not properly reading the appName from OP_QUERY requests. Before, we would have failed the command because the appName was considered empty. Now it doesn't match the filter, so it isn't failed.

How heavily is this affecting the drivers team?
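
The guess in the comment above can be illustrated with a toy model. This is a sketch of the hypothesis only, not the server's actual matching code: the functions `should_fail_before` and `should_fail_after` are hypothetical, and `None` stands in for an appName that was never read from an OP_QUERY handshake.

```python
def should_fail_before(filter_app_name, client_app_name):
    """Hypothesized old behavior: an empty or missing client appName
    bypasses the filter, so the command is failed anyway."""
    if not client_app_name:
        return True
    return client_app_name == filter_app_name


def should_fail_after(filter_app_name, client_app_name):
    """Hypothesized new behavior: the client appName must match the
    filter, so a missing appName never matches."""
    return client_app_name == filter_app_name


if __name__ == "__main__":
    filter_name = "failHandshakeTest"
    # Handshake over OP_QUERY: appName assumed not to have been read yet.
    print(should_fail_before(filter_name, None))
    print(should_fail_after(filter_name, None))
    # Subsequent isMaster: appName is known and matches the filter.
    print(should_fail_after(filter_name, "failHandshakeTest"))
```

Under this model the handshake is failed only by the old logic, while subsequent isMaster commands from a matching client are failed by both, which is consistent with the behavior described in the report.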

Generated at Thu Feb 08 05:19:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.