Core Server / SERVER-20838

Exit code of MongoDB processes is unchecked in jstests starting their own clusters


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.1.9
    • Fix Version/s: 3.3.1
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Build C (11/20/15), Build D (12/11/15), Build E (01/08/16), Build F (01/29/16)

    Description

      _stopMongoProgram() returns the exit code of the process, and MongoRunner.stopMongod() and MongoRunner.stopMongos() propagate it to their callers. However, none of the tests or the testing infrastructure actually check the return value. It'd be nice to assert that the exit code is zero (in situations where the MongoDB processes aren't expected to crash), but doing so is complicated by the fact that a process that takes longer than about a minute to shut down is terminated with SIGKILL (see killDb() below), in which case its exit code reflects the forced kill rather than a clean shutdown.
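A minimal POSIX sketch of the idea, under stated assumptions: it uses a plain fork()/waitpid() child instead of the shell's program registry, and polls every 100 ms instead of every second. The stop routine records whether it had to escalate to SIGKILL alongside the decoded exit status, so the caller can hold the process to a zero exit code only when it shut down on its own. StopResult, stopWithEscalation(), cleanShutdown() and spawnDemoChild() are hypothetical names for illustration, not MongoDB functions.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Hypothetical result type: how the process ended and whether it had to be forced.
struct StopResult {
    bool reaped = false;       // waitpid() collected the process
    bool sentSigkill = false;  // the grace period expired and SIGKILL was sent
    int exitCode = -1;         // meaningful only if the process exited normally
    int termSignal = 0;        // non-zero if the process was terminated by a signal
};

static void sleepMillis(long ms) {
    timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, nullptr);
}

// Send `sig`, poll every 100 ms, escalate to SIGKILL after graceMs, give up after limitMs.
StopResult stopWithEscalation(pid_t pid, int sig, long graceMs, long limitMs) {
    StopResult r;
    if (pid <= 0) return r;  // never signal a process group or every process by accident
    kill(pid, sig);
    for (long waited = 0;; waited += 100) {
        int status = 0;
        if (waitpid(pid, &status, WNOHANG) == pid) {
            r.reaped = true;
            if (WIFEXITED(status)) r.exitCode = WEXITSTATUS(status);
            if (WIFSIGNALED(status)) r.termSignal = WTERMSIG(status);
            break;
        }
        if (waited >= limitMs) break;  // still running: the caller must treat this as a failure
        if (!r.sentSigkill && waited >= graceMs) {
            kill(pid, SIGKILL);
            r.sentSigkill = true;
        }
        sleepMillis(100);
    }
    return r;
}

// Only a process that shut down on its own can be held to a zero exit code.
bool cleanShutdown(const StopResult& r) {
    return r.reaped && !r.sentSigkill && r.exitCode == 0;
}

// Demo child for the sketch: exits with `code` on SIGTERM, or ignores SIGTERM entirely.
static volatile sig_atomic_t gExitCodeOnTerm = 0;
static void onTerm(int) { _exit(gExitCodeOnTerm); }

pid_t spawnDemoChild(bool ignoreTerm, int code) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t pid = fork();
    assert(pid >= 0 && "fork failed");
    if (pid == 0) {
        gExitCodeOnTerm = code;
        signal(SIGTERM, ignoreTerm ? SIG_IGN : onTerm);
        char ready = 1;
        if (write(fds[1], &ready, 1) != 1) _exit(127);  // handler installed; tell the parent
        for (;;) pause();
    }
    close(fds[1]);
    char buf;
    if (read(fds[0], &buf, 1) != 1) buf = 0;  // block until the child can handle SIGTERM
    close(fds[0]);
    return pid;
}
```

For example, stopWithEscalation(spawnDemoChild(false, 0), SIGTERM, 1000, 3000) yields a clean shutdown, whereas a child that ignores SIGTERM ends with sentSigkill set and termSignal equal to SIGKILL. Keeping the escalation as an explicit flag lets a test separate "exited non-zero on its own" (a genuine failure) from "had to be killed" (a timeout that needs its own handling).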

      Tests that start their own MongoDB deployments currently do not fail when LeakSanitizer reports memory leaks: LeakSanitizer signals leaks through a non-zero exit code, and that exit code is never checked.
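The LeakSanitizer case goes unnoticed for the same reason. As we understand it, LeakSanitizer runs its leak check when the process exits and reports leaks by overriding the exit status with a non-zero value, so main() can return 0 while the process as a whole fails. The sketch below imitates that with an atexit() handler; fakeLeakCheckAtExit(), the exit code 42, verdict() and runLeakyChild() are hypothetical stand-ins, not sanitizer or MongoDB APIs.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for a leak checker: runs after main() returns and overrides the exit status.
// The value 42 is arbitrary; a real sanitizer uses its own configured exit code.
static void fakeLeakCheckAtExit() {
    std::fputs("fake leak checker: 1 leak detected\n", stderr);
    _exit(42);
}

// Child "program" whose main() succeeds (returns 0) but whose exit status does not.
static int leakyChildMain() {
    std::atexit(fakeLeakCheckAtExit);
    return 0;
}

// Decode a waitpid() status into a verdict; only a normal exit with code 0 passes.
std::string verdict(int status) {
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        return code == 0 ? std::string("ok") : "failed: exit code " + std::to_string(code);
    }
    if (WIFSIGNALED(status)) {
        return "failed: killed by signal " + std::to_string(WTERMSIG(status));
    }
    return "failed: unknown status";
}

// Run leakyChildMain() in a child process and return its raw wait status.
int runLeakyChild() {
    pid_t pid = fork();
    assert(pid >= 0 && "fork failed");
    if (pid == 0) std::exit(leakyChildMain());  // exit() runs the atexit handlers
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}
```

Here runLeakyChild() yields "failed: exit code 42" even though leakyChildMain() returned 0; a caller that discards the status, as the jstests currently do with the value returned by MongoRunner.stopMongod(), never sees the failure.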

      int killDb(int port, ProcessId _pid, int signal, const BSONObj& opt) {
          ProcessId pid;
          int exitCode = 0;
          if (port > 0) {
              if (!registry.isPortRegistered(port)) {
                  log() << "No db started on port: " << port << endl;
                  return 0;
              }
              pid = registry.pidForPort(port);
          } else {
              pid = _pid;
          }
       
          kill_wrapper(pid, signal, port, opt);
       
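          // Poll once per second for the process to exit. If it is still running after 60 polls
          // (about a minute), escalate to SIGKILL; give up after 130 polls in total.
          int i = 0;
          for (; i < 130; ++i) {
              if (i == 60) {
                  log() << "process on port " << port << ", with pid " << pid
                        << " not terminated, sending sigkill" << endl;
                  kill_wrapper(pid, SIGKILL, port, opt);
              }
              // Non-blocking check: returns true once the process has exited and stores its
              // exit code in exitCode.
              if (wait_for_pid(pid, false, &exitCode))
                  break;
              sleepmillis(1000);
          }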
          if (i == 130) {
              log() << "failed to terminate process on port " << port << ", with pid " << pid << endl;
              verify("Failed to terminate process" == 0);
          }
       
          registry.deleteProgram(pid);
          // FIXME I think the intention here is to do an extra sleep only when SIGKILL is sent to the
          // child process. We may want to change the 4 below to 59, since values of i greater than that
          // indicate we sent a SIGKILL (it is sent when i == 60).
          if (i > 4 || signal == SIGKILL) {
              sleepmillis(4000);  // allow operating system to reclaim resources
          }
       
          return exitCode;
      }
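One possible way to act on the FIXME, sketched under assumptions and not necessarily what the eventual fix did: make the loop record whether the SIGKILL branch ran and base the extra sleep on that flag. pollForExit(), PollOutcome and needsReclaimSleep() are hypothetical; the callbacks stand in for wait_for_pid() and kill_wrapper(), and the loop bounds default to the 60 and 130 used in killDb().

```cpp
#include <cassert>
#include <csignal>
#include <functional>

// Hypothetical summary of how killDb()'s polling loop ended.
struct PollOutcome {
    int iterations = 0;        // value of i when the loop ended
    bool exited = false;       // the process was reaped before the loop gave up
    bool sentSigkill = false;  // the escalation branch actually ran
};

// Same control flow as the loop in killDb(): check for exit once per tick, escalate at
// killAt, give up at giveUpAt. hasExited and sendSigkill stand in for wait_for_pid() and
// kill_wrapper(pid, SIGKILL, ...); the one-second sleep between ticks is left out.
PollOutcome pollForExit(const std::function<bool(int)>& hasExited,
                        const std::function<void()>& sendSigkill,
                        int killAt = 60, int giveUpAt = 130) {
    assert(killAt < giveUpAt);
    PollOutcome out;
    int i = 0;
    for (; i < giveUpAt; ++i) {
        if (i == killAt) {
            sendSigkill();
            out.sentSigkill = true;
        }
        if (hasExited(i)) {
            out.exited = true;
            break;
        }
    }
    out.iterations = i;
    return out;
}

// Decide on the extra "let the OS reclaim resources" sleep from the recorded flag,
// rather than from a threshold on i that has to be kept in sync with killAt.
bool needsReclaimSleep(const PollOutcome& out, int initialSignal) {
    return out.sentSigkill || initialSignal == SIGKILL;
}
```

In this model an exit first observed at i == 60 is the earliest one that follows SIGKILL, so a threshold on i would have to be i > 59. A process that exits on its own at i == 10 gets no extra sleep here, whereas the current i > 4 check would sleep for it.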
      


    People

    • Votes: 0
    • Watchers: 6
