[SERVER-58139] Avoid leaking state from NetworkInterfaceTest::CancelLocally Created: 28/Jun/21  Updated: 29/Oct/23  Resolved: 06/Jul/21

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 5.1 Required
Fix Version/s: 4.4.10, 5.0.4, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Amirsaman Memaripour
Resolution: Fixed Votes: 0
Labels: servicearch-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-47602 _killOperations for hedged reads can ... Closed
related to SERVER-58209 Make NetworkInterfaceIntegrationFixtu... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.4
Sprint: Service Arch 2021-07-12
Participants:
Linked BF Score: 20
Story Points: 2

 Description   

The test uses cancelCommand to cancel a command locally. To ensure the command is not completed and blocked before calling into cancelCommand, this test relies on networkInterfaceHangCommandsAfterAcquireConn, a failpoint that blocks execution after a connection to the target host is acquired (and before the command is executed). However, there is no synchronization after this failpoint, allowing the command to continue running after cancelCommand. This translates to running the command even after the test is successfully completed.

The following shows logs for a successful execution of the test, where the execution of the command is finished (and not leaked) before returning from CancelLocally and before execution of CancelRemotely:

[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.693+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelLocally","rep":1,"reps":1}
[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.693+00:00 W  CONTROL  23830   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":1,"data":{},"timesEntered":0}}
[j1:prim] | 2021-06-10T19:37:37.694+00:00 I  NETWORK  22943   [listener] "Connection accepted","attr":{"remote":"127.0.0.1:50474","uuid":"8c1d0e6e-6805-4e59-80d3-1132910fdd12","connectionId":50,"connectionCount":6}
[j1:prim] | 2021-06-10T19:37:37.694+00:00 D2 COMMAND  21965   [conn50] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}}
[j1:prim] | 2021-06-10T19:37:37.694+00:00 I  NETWORK  51800   [conn50] "client metadata","attr":{"remote":"127.0.0.1:50474","client":"conn50","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
[j1:prim] | 2021-06-10T19:37:37.694+00:00 I  COMMAND  51803   [conn50] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:50474","protocol":"op_query","durationMillis":0}
[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.695+00:00 W  CONTROL  23831   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":0,"data":{},"timesEntered":1}}
[j1:prim] | 2021-06-10T19:37:37.695+00:00 D2 COMMAND  21965   [conn48] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","internalClient":{"minWireVersion":0,"maxWireVersion":13},"hangUpOnStepDown":false,"$db":"admin"}}
[j1:prim] | 2021-06-10T19:37:37.695+00:00 I  NETWORK  51800   [conn48] "client metadata","attr":{"remote":"127.0.0.1:50468","client":"conn48","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
[j1:prim] | 2021-06-10T19:37:37.695+00:00 I  COMMAND  51803   [conn48] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","internalClient":{"minWireVersion":0,"maxWireVersion":13},"hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:50468","protocol":"op_query","durationMillis":0}
[j1:prim] | 2021-06-10T19:37:37.696+00:00 W  EXECUTOR 4910400 [conn48] "Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
[j1:prim] | 2021-06-10T19:37:37.696+00:00 I  NETWORK  5127900 [conn48] "Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
[j1:prim] | 2021-06-10T19:37:37.696+00:00 I  NETWORK  22944   [conn48] "Connection ended","attr":{"remote":"127.0.0.1:50468","uuid":"4fc98b31-a491-4918-968b-dfe80335e648","connectionId":48,"connectionCount":5}
[j1:prim] | 2021-06-10T19:37:37.795+00:00 D2 COMMAND  21965   [conn50] "About to run the command","attr":{"db":"admin","commandArgs":{"echo":1,"foo":"bar","$db":"admin"}}
[j1:prim] | 2021-06-10T19:37:37.795+00:00 D2 COMMAND  21955   [conn50] "Applying default readConcern on command","attr":{"readConcernDefault":{"readConcern":{"level":"local"}},"command":"echo"}
[j1:prim] | 2021-06-10T19:37:37.795+00:00 I  COMMAND  51803   [conn50] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"echo":1,"foo":"bar","$db":"admin"},"numYields":0,"reslen":212,"locks":{},"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:50474","protocol":"op_msg","durationMillis":0}
[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.795+00:00 I  ASIO     4820501 [NetworkInterfaceIntegrationFixture] "Command failed","attr":{"error":{"code":90,"codeName":"CallbackCanceled","errmsg":"Command canceled; original request was: RemoteCommand 7 -- target:[localhost:20250] db:admin cmd:{ echo: 1, foo: \"bar\" }"}}
[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I  ASIO     22582   [NetworkInterfaceIntegrationFixture] "Killing all outstanding egress activity."
[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I  ASIO     22582   [main] "Killing all outstanding egress activity."
[j1:prim] | 2021-06-10T19:37:37.797+00:00 W  EXECUTOR 4910400 [conn50] "Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
[j1:prim] | 2021-06-10T19:37:37.797+00:00 I  NETWORK  5127900 [conn50] "Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
[j1:prim] | 2021-06-10T19:37:37.797+00:00 I  NETWORK  22944   [conn50] "Connection ended","attr":{"remote":"127.0.0.1:50474","uuid":"8c1d0e6e-6805-4e59-80d3-1132910fdd12","connectionId":50,"connectionCount":4}
[cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelRemotely","rep":1,"reps":1}

However, the execution of the command may be postponed to some time after the next test begins, causing failures due to the leaked state:

[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.259+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelLocally","rep":1,"reps":1}
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.259+00:00 W  CONTROL  23830   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":1,"data":{},"timesEntered":0}}
[j3:prim] | 2021-06-20T16:13:04.259+00:00 I  NETWORK  22943   [listener] "Connection accepted","attr":{"remote":"127.0.0.1:36842","uuid":"ff6ea7b9-0607-4813-a3bf-537753d6355c","connectionId":69,"connectionCount":5}
[j3:prim] | 2021-06-20T16:13:04.260+00:00 D2 COMMAND  21965   [conn69] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}}
[j3:prim] | 2021-06-20T16:13:04.260+00:00 I  NETWORK  51800   [conn69] "client metadata","attr":{"remote":"127.0.0.1:36842","client":"conn69","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
[j3:prim] | 2021-06-20T16:13:04.260+00:00 I  COMMAND  51803   [conn69] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:36842","protocol":"op_query","durationMillis":0}
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.360+00:00 W  CONTROL  23831   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":0,"data":{},"timesEntered":1}}
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  ASIO     4820501 [NetworkInterfaceIntegrationFixture] "Command failed","attr":{"error":{"code":90,"codeName":"CallbackCanceled","errmsg":"Command canceled; original request was: RemoteCommand 7 -- target:[localhost:20750] db:admin cmd:{ echo: 1, foo: \"bar\" }"}}
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  ASIO     22582   [NetworkInterfaceIntegrationFixture] "Killing all outstanding egress activity."
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  ASIO     22582   [main] "Killing all outstanding egress activity."
[j3:prim] | 2021-06-20T16:13:04.465+00:00 I  NETWORK  22943   [listener] "Connection accepted","attr":{"remote":"127.0.0.1:36854","uuid":"55ee9dc0-be09-4c89-bf73-69fdc2fa14d0","connectionId":70,"connectionCount":6}
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelRemotely","rep":1,"reps":1}
[j3:prim] | 2021-06-20T16:13:04.466+00:00 D2 COMMAND  21965   [conn70] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}}
[j3:prim] | 2021-06-20T16:13:04.466+00:00 I  NETWORK  51800   [conn70] "client metadata","attr":{"remote":"127.0.0.1:36854","client":"conn70","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
[j3:prim] | 2021-06-20T16:13:04.466+00:00 I  COMMAND  51803   [conn70] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:36854","protocol":"op_query","durationMillis":0}
[cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.467+00:00 I  ASIO     4820500 [NetworkInterfaceIntegrationFixture] "Got command result","attr":{"response":"RemoteResponse -- cmd: { count: 0, ok: 1.0, $clusterTime: { clusterTime: Timestamp(1624205573, 10), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1624205573, 10) } status: OK elapsed: 524 μs moreToCome: false"}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21965   [conn70] "About to run the command","attr":{"db":"admin","commandArgs":{"configureFailPoint":"failCommand","mode":"alwaysOn","data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"$db":"admin"}}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21961   [conn70] "Admin only command","attr":{"command":"configureFailPoint"}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21955   [conn70] "Applying default readConcern on command","attr":{"readConcernDefault":{"readConcern":{"level":"local"}},"command":"configureFailPoint"}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 W  CONTROL  23829   [conn70] "Set failpoint","attr":{"failPointName":"failCommand","failPoint":{"mode":1,"data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"timesEntered":0}}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 I  COMMAND  51803   [conn70] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"configureFailPoint":"failCommand","mode":"alwaysOn","data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"$db":"admin"},"numYields":0,"reslen":174,"locks":{},"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:36854","protocol":"op_msg","durationMillis":0}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21965   [conn69] "About to run the command","attr":{"db":"admin","commandArgs":{"echo":1,"foo":"bar","$db":"admin"}}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 I  COMMAND  4898500 [conn69] "Activating 'failCommand' failpoint","attr":{"data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"threadName":"conn69","appName":"","namespace":"admin","isInternalClient":false,"command":"echo"}
[j3:prim] | 2021-06-20T16:13:04.467+00:00 I  COMMAND  20432   [conn69] "Blocking command via 'failCommand' failpoint","attr":{"command":"echo","blockTimeMillis":1000000000}

Acceptance criteria: ensure any command started by CancelLocally completes before returning from the test.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 20/Sep/21 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-58139 Check for cancelation before sending remote commands

(cherry picked from commit d0d4084cffcf71ea6dc56705a413f16051452447)
Branch: v5.0
https://github.com/mongodb/mongo/commit/dc48936ed11ed690b380e01800c2bf13d3eda747

Comment by Githook User [ 17/Sep/21 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-58139 Check for cancelation before sending remote commands

(cherry picked from commit d0d4084cffcf71ea6dc56705a413f16051452447)
Branch: v4.4
https://github.com/mongodb/mongo/commit/dfa66888df295b9c241ee300024121471a175818

Comment by Githook User [ 06/Jul/21 ]

Author:

{'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}

Message: SERVER-58139 Check for cancelation before sending remote commands
Branch: master
https://github.com/mongodb/mongo/commit/d0d4084cffcf71ea6dc56705a413f16051452447

Generated at Thu Feb 08 05:43:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.