Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-58139

Avoid leaking state from NetworkInterfaceTest::CancelLocally

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v5.0, v4.4
    • Sprint:
      Service Arch 2021-07-12
    • Linked BF Score:
      20
    • Story Points:
      2

      Description

      The test uses cancelCommand to cancel a command locally. To ensure the command is not completed and blocked before calling into cancelCommand, this test relies on networkInterfaceHangCommandsAfterAcquireConn, a failpoint that blocks execution after a connection to the target host is acquired (and before the command is executed). However, there is no synchronization after this failpoint, allowing the command to continue running after cancelCommand. This translates to running the command even after the test is successfully completed.

      The following shows logs for a successful execution of the test, where the execution of the command is finished (and not leaked) before returning from CancelLocally and before execution of CancelRemotely:

      [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.693+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelLocally","rep":1,"reps":1}
      [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.693+00:00 W  CONTROL  23830   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":1,"data":{},"timesEntered":0}}
      [j1:prim] | 2021-06-10T19:37:37.694+00:00 I  NETWORK  22943   [listener] "Connection accepted","attr":{"remote":"127.0.0.1:50474","uuid":"8c1d0e6e-6805-4e59-80d3-1132910fdd12","connectionId":50,"connectionCount":6}
      [j1:prim] | 2021-06-10T19:37:37.694+00:00 D2 COMMAND  21965   [conn50] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}}
      [j1:prim] | 2021-06-10T19:37:37.694+00:00 I  NETWORK  51800   [conn50] "client metadata","attr":{"remote":"127.0.0.1:50474","client":"conn50","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
      [j1:prim] | 2021-06-10T19:37:37.694+00:00 I  COMMAND  51803   [conn50] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:50474","protocol":"op_query","durationMillis":0}
      [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.695+00:00 W  CONTROL  23831   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":0,"data":{},"timesEntered":1}}
      [j1:prim] | 2021-06-10T19:37:37.695+00:00 D2 COMMAND  21965   [conn48] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","internalClient":{"minWireVersion":0,"maxWireVersion":13},"hangUpOnStepDown":false,"$db":"admin"}}
      [j1:prim] | 2021-06-10T19:37:37.695+00:00 I  NETWORK  51800   [conn48] "client metadata","attr":{"remote":"127.0.0.1:50468","client":"conn48","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
      [j1:prim] | 2021-06-10T19:37:37.695+00:00 I  COMMAND  51803   [conn48] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-rc1-16-g06168e4"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-62-85.ec2.internal:27017","internalClient":{"minWireVersion":0,"maxWireVersion":13},"hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:50468","protocol":"op_query","durationMillis":0}
      [j1:prim] | 2021-06-10T19:37:37.696+00:00 W  EXECUTOR 4910400 [conn48] "Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
      [j1:prim] | 2021-06-10T19:37:37.696+00:00 I  NETWORK  5127900 [conn48] "Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
      [j1:prim] | 2021-06-10T19:37:37.696+00:00 I  NETWORK  22944   [conn48] "Connection ended","attr":{"remote":"127.0.0.1:50468","uuid":"4fc98b31-a491-4918-968b-dfe80335e648","connectionId":48,"connectionCount":5}
      [j1:prim] | 2021-06-10T19:37:37.795+00:00 D2 COMMAND  21965   [conn50] "About to run the command","attr":{"db":"admin","commandArgs":{"echo":1,"foo":"bar","$db":"admin"}}
      [j1:prim] | 2021-06-10T19:37:37.795+00:00 D2 COMMAND  21955   [conn50] "Applying default readConcern on command","attr":{"readConcernDefault":{"readConcern":{"level":"local"}},"command":"echo"}
      [j1:prim] | 2021-06-10T19:37:37.795+00:00 I  COMMAND  51803   [conn50] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"echo":1,"foo":"bar","$db":"admin"},"numYields":0,"reslen":212,"locks":{},"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:50474","protocol":"op_msg","durationMillis":0}
      [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.795+00:00 I  ASIO     4820501 [NetworkInterfaceIntegrationFixture] "Command failed","attr":{"error":{"code":90,"codeName":"CallbackCanceled","errmsg":"Command canceled; original request was: RemoteCommand 7 -- target:[localhost:20250] db:admin cmd:{ echo: 1, foo: \"bar\" }"}}
      [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I  ASIO     22582   [NetworkInterfaceIntegrationFixture] "Killing all outstanding egress activity."
      [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I  ASIO     22582   [main] "Killing all outstanding egress activity."
      [j1:prim] | 2021-06-10T19:37:37.797+00:00 W  EXECUTOR 4910400 [conn50] "Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
      [j1:prim] | 2021-06-10T19:37:37.797+00:00 I  NETWORK  5127900 [conn50] "Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
      [j1:prim] | 2021-06-10T19:37:37.797+00:00 I  NETWORK  22944   [conn50] "Connection ended","attr":{"remote":"127.0.0.1:50474","uuid":"8c1d0e6e-6805-4e59-80d3-1132910fdd12","connectionId":50,"connectionCount":4}
      [cpp_integration_test:executor_integration_test] | 2021-06-10T19:37:37.796+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelRemotely","rep":1,"reps":1}
      

      However, the execution of the command may be postponed to some time after the next test begins, causing failures due to the leaked state:

      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.259+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelLocally","rep":1,"reps":1}
      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.259+00:00 W  CONTROL  23830   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":1,"data":{},"timesEntered":0}}
      [j3:prim] | 2021-06-20T16:13:04.259+00:00 I  NETWORK  22943   [listener] "Connection accepted","attr":{"remote":"127.0.0.1:36842","uuid":"ff6ea7b9-0607-4813-a3bf-537753d6355c","connectionId":69,"connectionCount":5}
      [j3:prim] | 2021-06-20T16:13:04.260+00:00 D2 COMMAND  21965   [conn69] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}}
      [j3:prim] | 2021-06-20T16:13:04.260+00:00 I  NETWORK  51800   [conn69] "client metadata","attr":{"remote":"127.0.0.1:36842","client":"conn69","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
      [j3:prim] | 2021-06-20T16:13:04.260+00:00 I  COMMAND  51803   [conn69] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:36842","protocol":"op_query","durationMillis":0}
      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.360+00:00 W  CONTROL  23831   [main] "Set failpoint","attr":{"failPointName":"networkInterfaceHangCommandsAfterAcquireConn","failPoint":{"mode":0,"data":{},"timesEntered":1}}
      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  ASIO     4820501 [NetworkInterfaceIntegrationFixture] "Command failed","attr":{"error":{"code":90,"codeName":"CallbackCanceled","errmsg":"Command canceled; original request was: RemoteCommand 7 -- target:[localhost:20750] db:admin cmd:{ echo: 1, foo: \"bar\" }"}}
      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  ASIO     22582   [NetworkInterfaceIntegrationFixture] "Killing all outstanding egress activity."
      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  ASIO     22582   [main] "Killing all outstanding egress activity."
      [j3:prim] | 2021-06-20T16:13:04.465+00:00 I  NETWORK  22943   [listener] "Connection accepted","attr":{"remote":"127.0.0.1:36854","uuid":"55ee9dc0-be09-4c89-bf73-69fdc2fa14d0","connectionId":70,"connectionCount":6}
      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.464+00:00 I  TEST     23059   [main] "Running","attr":{"test":"CancelRemotely","rep":1,"reps":1}
      [j3:prim] | 2021-06-20T16:13:04.466+00:00 D2 COMMAND  21965   [conn70] "About to run the command","attr":{"db":"admin","commandArgs":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"}}
      [j3:prim] | 2021-06-20T16:13:04.466+00:00 I  NETWORK  51800   [conn70] "client metadata","attr":{"remote":"127.0.0.1:36854","client":"conn70","doc":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}}}
      [j3:prim] | 2021-06-20T16:13:04.466+00:00 I  COMMAND  51803   [conn70] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"isMaster":1,"client":{"driver":{"name":"NetworkInterfaceTL","version":"5.0.0-alpha0-1181-gc3e6aa7"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux release 8.0 (Ootpa)","architecture":"x86_64","version":"Kernel 4.18.0-80.1.2.el8_0.x86_64"}},"hostInfo":"ip-10-122-67-44.ec2.internal:27017","hangUpOnStepDown":false,"$db":"admin"},"numYields":0,"reslen":784,"locks":{},"remote":"127.0.0.1:36854","protocol":"op_query","durationMillis":0}
      [cpp_integration_test:executor_integration_test] | 2021-06-20T16:13:04.467+00:00 I  ASIO     4820500 [NetworkInterfaceIntegrationFixture] "Got command result","attr":{"response":"RemoteResponse -- cmd: { count: 0, ok: 1.0, $clusterTime: { clusterTime: Timestamp(1624205573, 10), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1624205573, 10) } status: OK elapsed: 524 μs moreToCome: false"}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21965   [conn70] "About to run the command","attr":{"db":"admin","commandArgs":{"configureFailPoint":"failCommand","mode":"alwaysOn","data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"$db":"admin"}}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21961   [conn70] "Admin only command","attr":{"command":"configureFailPoint"}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21955   [conn70] "Applying default readConcern on command","attr":{"readConcernDefault":{"readConcern":{"level":"local"}},"command":"configureFailPoint"}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 W  CONTROL  23829   [conn70] "Set failpoint","attr":{"failPointName":"failCommand","failPoint":{"mode":1,"data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"timesEntered":0}}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 I  COMMAND  51803   [conn70] "Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"configureFailPoint":"failCommand","mode":"alwaysOn","data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"$db":"admin"},"numYields":0,"reslen":174,"locks":{},"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"127.0.0.1:36854","protocol":"op_msg","durationMillis":0}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 D2 COMMAND  21965   [conn69] "About to run the command","attr":{"db":"admin","commandArgs":{"echo":1,"foo":"bar","$db":"admin"}}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 I  COMMAND  4898500 [conn69] "Activating 'failCommand' failpoint","attr":{"data":{"blockConnection":true,"blockTimeMS":1000000000,"failCommands":["echo"]},"threadName":"conn69","appName":"","namespace":"admin","isInternalClient":false,"command":"echo"}
      [j3:prim] | 2021-06-20T16:13:04.467+00:00 I  COMMAND  20432   [conn69] "Blocking command via 'failCommand' failpoint","attr":{"command":"echo","blockTimeMillis":1000000000}
      

      Acceptance criteria: ensure any command started by CancelLocally completes before returning from the test.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              amirsaman.memaripour Amirsaman Memaripour
              Reporter:
              amirsaman.memaripour Amirsaman Memaripour
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: