Core Server / SERVER-89711

proxy_protocol.js should handle errors when running proxyprotocol.server

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0, 8.0.0-rc5, 7.3.4
    • Affects Version/s: None
    • Component/s: None
    • None
    • Service Arch
    • Fully Compatible
    • ALL
    • v8.0, v7.3
    • Service Arch 2024-04-29, Service Arch 2024-05-13
    • 0

      The first issue is that proxy_protocol.js asserts on the return value of checkProgram directly, without looking at the "alive" field of the returned BSON object. checkProgram always returns a BSON object, and an object is always truthy, so this assertion always passes.

      In jstests/sharding/lib/proxy_protocol.js:

              this.pid = _startMongoProgram({args: args});
      
      -       assert(checkProgram(this.pid)); 
      +       assert(checkProgram(this.pid)["alive"]);
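
      To illustrate why the original assertion cannot fail, here is a minimal sketch that runs in any JavaScript engine. fakeCheckProgram is a hypothetical stand-in for the shell's checkProgram, and the object it returns contains only the "alive" field mentioned above; the real return value may carry more fields.

```javascript
// Hypothetical stand-in for the shell's checkProgram: it always returns an
// object, whether or not the process is still alive.
function fakeCheckProgram(alive) {
    return {alive: alive};
}

// Simulate a child process that has already exited.
const deadResult = fakeCheckProgram(false);

// Any non-null object is truthy in JavaScript, so asserting on the object
// itself passes even for a dead process.
const oldCheckPasses = Boolean(deadResult);

// Reading the "alive" field gives the actual liveness.
const newCheckPasses = Boolean(deadResult["alive"]);
```

      Here oldCheckPasses is true and newCheckPasses is false, which is the difference between the two lines of the diff.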

       

      Some context for a deeper issue: jstests/sharding/lib/proxy_protocol.js forks a process, and the child process calls execve on a server defined in the Python library proxyprotocol.

      checkProgram calls waitpid and essentially checks whether a program exists by its PID. The test uses it to assert that the proxyprotocol server is up and running, but a live PID only shows that the process exists, not that the server is listening.

      The checkProgram call can race with the child process (the proxyprotocol server) failing. For example, suppose the parent process forks a child and then calls checkProgram on the child PID before the child has done anything. checkProgram reports that the child is alive, so we conclude that the proxyprotocol server is up. That conclusion can be wrong: the child can still fail afterwards and exit with an "address already in use" error when it calls execve on the proxyprotocol server.
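
      The ordering problem can be modeled without real processes. The sketch below is a deterministic simulation, not the jstest code: makeChild, runExec, and the listening flag are all hypothetical names used only to show the sequence of events.

```javascript
// Hypothetical model of the child: it exists as soon as it is forked, and only
// later either binds its port or exits with an error.
function makeChild(portAlreadyInUse) {
    const child = {alive: true, listening: false};
    child.runExec = function() {
        if (portAlreadyInUse) {
            // "address already in use": the child exits.
            child.alive = false;
        } else {
            child.listening = true;
        }
    };
    return child;
}

const racedChild = makeChild(true);

// The parent checks immediately after the fork, before the child has run.
const aliveAtCheck = racedChild.alive;

// The child is scheduled afterwards and fails to bind its port.
racedChild.runExec();

const aliveAfterExec = racedChild.alive;
const everListening = racedChild.listening;
```

      The early check sees aliveAtCheck as true even though the server never starts listening, which is the false positive described above.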

       

      Fixing this requires the parent process (our jstest) to wait for an indication that the child process (the proxyprotocol server) is up and running. Because the proxyprotocol server is defined in the Python library, I think the best option is a form of IPC: open sockets to the listening and egress ports to see whether they are up. We can use assert.soon to wait until these sockets are created successfully before we proceed, and also check that the PID using these ports is the expected one. If the assert.soon fails, we can print a message listing the programs that currently use the intended listening and egress ports.
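
      The waiting step proposed above can be sketched as a small polling loop. This is an illustration under assumptions, not the actual fix: in a jstest, assert.soon would do the retrying and the probe would attempt real connections to both ports. Here waitForReady, portsAcceptConnections, and the injected sleepFn are hypothetical, and the probe only pretends that the ports come up on the third attempt.

```javascript
// Poll `probe` until it returns true or `maxAttempts` is exhausted.
// `sleepFn` is injected so the sketch also runs outside the mongo shell.
function waitForReady(probe, maxAttempts, sleepFn) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        if (probe()) {
            return {ready: true, attempts: attempt};
        }
        sleepFn();
    }
    return {ready: false, attempts: maxAttempts};
}

// Hypothetical probe: pretend both ports accept connections on the third try.
let probeCalls = 0;
function portsAcceptConnections() {
    probeCalls++;
    return probeCalls >= 3;
}

const noSleep = function() {};
const outcome = waitForReady(portsAcceptConnections, 10, noSleep);

// A server that never comes up exhausts the attempts instead of passing silently.
const failed = waitForReady(function() { return false; }, 5, noSleep);
```

      Unlike a fixed sleep, the loop returns as soon as the probe succeeds and reports a definite failure when it never does, which is the point at which a diagnostic about the processes holding the ports could be printed.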

      A simpler but flaky fix is to sleep before calling `checkProgram`. This assumes that the server creation attempt happens before `checkProgram` runs.

      In jstests/sharding/lib/proxy_protocol.js:

              this.pid = _startMongoProgram({args: args});
      
      +       sleep(5000); // Assumes create_server is called within 5 seconds.
      +
      -       assert(checkProgram(this.pid)); 
      +       assert(checkProgram(this.pid)["alive"]);

            Assignee: Alex Li (alex.li@mongodb.com)
            Reporter: Alex Li (alex.li@mongodb.com)
            Votes: 0
            Watchers: 2
