-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 5.1 Required
-
Component/s: Internal Code
-
Fully Compatible
-
ALL
-
v5.0, v4.4
-
Service Arch 2021-07-12
-
20
-
2
The test uses cancelCommand to cancel a command locally. To ensure the command is not completed and blocked before calling into cancelCommand, this test relies on networkInterfaceHangCommandsAfterAcquireConn, a failpoint that blocks execution after a connection to the target host is acquired (and before the command is executed). However, there is no synchronization after this failpoint, allowing the command to continue running after cancelCommand. This translates to running the command even after the test is successfully completed.
The following shows logs for a successful execution of the test, where the execution of the command is finished (and not leaked) before returning from CancelLocally and before execution of CancelRemotely:
[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.693+00:00 I TEST 23059 [main] "Running","attr":{"test":"CancelLocally","rep":1,"reps":1} [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.693+00:00 W CONTROL 23830 [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":1,"data":{},"timesEntered":0}} [j1:prim] | 2021-06-10T19:37:37.694+00:00 I NETWORK 22943 [listener] "Connection accepted","attr":{"remote":"127.0.0.1:50474","uuid":"8c1d0e6e-6805-4e59-80d3-1132910fdd12","connectionId":50,"connectionCount":6} [j1:prim] | 2021-06-10T19:37:37.694+00:00 D2 COMMAND 21965 [conn50] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}} [j1:prim] | 2021-06-10T19:37:37.694+00:00 I NETWORK 51800 [conn50] "client metadata","attr":{"remote":"127.0.0.1:50474","client":"conn50","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}} [j1:prim] | 2021-06-10T19:37:37.694+00:00 I COMMAND 51803 [conn50] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:50474","protocol":"op_query","durationMillis":0} [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.695+00:00 W CONTROL 23831 [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":0,"data":{},"timesEntered":1}} [j1:prim] | 2021-06-10T19:37:37.695+00:00 D2 COMMAND 21965 [conn48] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","internalClient":{"minWireVersion":0,"maxWireVersion":13},"hangUpOnStepDown":false,"$db":"admin"}} [j1:prim] | 2021-06-10T19:37:37.695+00:00 I NETWORK 51800 [conn48] "client metadata","attr":{"remote":"127.0.0.1:50468","client":"conn48","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}} [j1:prim] | 2021-06-10T19:37:37.695+00:00 I COMMAND 51803 [conn48] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","internalClient":{"minWireVersion":0,"maxWireVersion":13},"hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:50468","protocol":"op_query","durationMillis":0} [j1:prim] | 2021-06-10T19:37:37.696+00:00 W EXECUTOR 4910400 [conn48] "Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}} [j1:prim] | 2021-06-10T19:37:37.696+00:00 I NETWORK 5127900 [conn48] "Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}} [j1:prim] | 2021-06-10T19:37:37.696+00:00 I NETWORK 22944 [conn48] "Connection ended","attr":{"remote":"127.0.0.1:50468","uuid":"4fc98b31-a491-4918-968b-dfe80335e648","connectionId":48,"connectionCount":5} [j1:prim] | 2021-06-10T19:37:37.795+00:00 D2 COMMAND 21965 [conn50] "About to run the command","attr":{"db":"admin","commandArgs":{"echo":1,"foo":"bar","$db":"admin"}} [j1:prim] | 2021-06-10T19:37:37.795+00:00 D2 COMMAND 21955 [conn50] "Applying default readConcern on command","attr":{"readConcernDefault":{"readConcern":{"level":"local"}},"command":"echo"} [j1:prim] | 2021-06-10T19:37:37.795+00:00 I COMMAND 51803 [conn50] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"echo":1,"foo":"bar","$db":"admin"},"numYields":0,"reslen":212,"locks":{},"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:50474","protocol":"op_msg","durationMillis":0} [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.795+00:00 I ASIO 4820501 [NetworkInterfaceIntegrationFixture] "Command failed","attr":{"error":{"code":90,"codeName":"CallbackCanceled","errmsg":"Command canceled; original request was: RemoteCommand 7 -- target:[localhost:20250] db:admin cmd:{ echo: 1, foo: \"bar\" }"}} [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I ASIO 22582 [NetworkInterfaceIntegrationFixture] "Killing all outstanding egress activity." [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I ASIO 22582 [main] "Killing all outstanding egress activity." [j1:prim] | 2021-06-10T19:37:37.797+00:00 W EXECUTOR 4910400 [conn50] "Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}} [j1:prim] | 2021-06-10T19:37:37.797+00:00 I NETWORK 5127900 [conn50] "Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}} [j1:prim] | 2021-06-10T19:37:37.797+00:00 I NETWORK 22944 [conn50] "Connection ended","attr":{"remote":"127.0.0.1:50474","uuid":"8c1d0e6e-6805-4e59-80d3-1132910fdd12","connectionId":50,"connectionCount":4} [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I TEST 23059 [main] "Running","attr":{"test":"CancelRemotely","rep":1,"reps":1}
However, the execution of the command may be postponed to some time after the next test begins, causing failures due to the leaked state:
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.259+00:00 I TEST 23059 [main] "Running","attr":{"test":"CancelLocally","rep":1,"reps":1} [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.259+00:00 W CONTROL 23830 [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":1,"data":{},"timesEntered":0}} [j3:prim] | 2021-06-20T16:13:04.259+00:00 I NETWORK 22943 [listener] "Connection accepted","attr":{"remote":"127.0.0.1:36842","uuid":"ff6ea7b9-0607-4813-a3bf-537753d6355c","connectionId":69,"connectionCount":5} [j3:prim] | 2021-06-20T16:13:04.260+00:00 D2 COMMAND 21965 [conn69] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}} [j3:prim] | 2021-06-20T16:13:04.260+00:00 I NETWORK 51800 [conn69] "client metadata","attr":{"remote":"127.0.0.1:36842","client":"conn69","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}} [j3:prim] | 2021-06-20T16:13:04.260+00:00 I COMMAND 51803 [conn69] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:36842","protocol":"op_query","durationMillis":0} [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.360+00:00 W CONTROL 23831 [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":0,"data":{},"timesEntered":1}} [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I ASIO 4820501 [NetworkInterfaceIntegrationFixture] "Command failed","attr":{"error":{"code":90,"codeName":"CallbackCanceled","errmsg":"Command canceled; original request was: RemoteCommand 7 -- target:[localhost:20750] db:admin cmd:{ echo: 1, foo: \"bar\" }"}} [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I ASIO 22582 [NetworkInterfaceIntegrationFixture] "Killing all outstanding egress activity." [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I ASIO 22582 [main] "Killing all outstanding egress activity." [j3:prim] | 2021-06-20T16:13:04.465+00:00 I NETWORK 22943 [listener] "Connection accepted","attr":{"remote":"127.0.0.1:36854","uuid":"55ee9dc0-be09-4c89-bf73-69fdc2fa14d0","connectionId":70,"connectionCount":6} [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I TEST 23059 [main] "Running","attr":{"test":"CancelRemotely","rep":1,"reps":1} [j3:prim] | 2021-06-20T16:13:04.466+00:00 D2 COMMAND 21965 [conn70] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}} [j3:prim] | 2021-06-20T16:13:04.466+00:00 I NETWORK 51800 [conn70] "client metadata","attr":{"remote":"127.0.0.1:36854","client":"conn70","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}} [j3:prim] | 2021-06-20T16:13:04.466+00:00 I COMMAND 51803 [conn70] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:36854","protocol":"op_query","durationMillis":0} [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.467+00:00 I ASIO 4820500 [NetworkInterfaceIntegrationFixture] "Got command result","attr":{"response":"RemoteResponse -- cmd: { count: 0, ok: 1.0, $clusterTime: { clusterTime: Timestamp(1624205573, 10), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1624205573, 10) } status: OK elapsed: 524 μs moreToCome: false"} [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND 21965 [conn70] "About to run the command","attr":{"db":"admin","commandArgs":{"configureFailPoint":"failCommand","mode":"alwaysOn","data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"$db":"admin"}} [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND 21961 [conn70] "Admin only command","attr":{"command":"configureFailPoint"} [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND 21955 [conn70] "Applying default readConcern on command","attr":{"readConcernDefault":{"readConcern":{"level":"local"}},"command":"configureFailPoint"} [j3:prim] | 2021-06-20T16:13:04.467+00:00 W CONTROL 23829 [conn70] "Set failpoint","attr":{"failPointName":"failCommand","failPoint":{"mode":1,"data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"timesEntered":0}} [j3:prim] | 2021-06-20T16:13:04.467+00:00 I COMMAND 51803 [conn70] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"configureFailPoint":"failCommand","mode":"alwaysOn","data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"$db":"admin"},"numYields":0,"reslen":174,"locks":{},"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:36854","protocol":"op_msg","durationMillis":0} [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND 21965 [conn69] "About to run the command","attr":{"db":"admin","commandArgs":{"echo":1,"foo":"bar","$db":"admin"}} [j3:prim] | 2021-06-20T16:13:04.467+00:00 I COMMAND 4898500 [conn69] "Activating 'failCommand' failpoint","attr":{"data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"threadName":"conn69","appName":"","namespace":"admin","isInternalClient":false,"command":"echo"} [j3:prim] | 2021-06-20T16:13:04.467+00:00 I COMMAND 20432 [conn69] "Blocking command via 'failCommand' failpoint","attr":{"command":"echo","blockTimeMillis":1000000000}
Acceptance criteria: ensure any command started by CancelLocally completes before returning from the test.
- related to
-
SERVER-47602 _killOperations for hedged reads can reach remote before original operation
- Closed
-
SERVER-58209 Make NetworkInterfaceIntegrationFixture wait for in progress requests
- Closed