Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-42823

Output between print() function and ProgramOutputMultiplexer::appendLine() can be interleaved, leading to a deadlock on Windows DEBUG

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Shell
    • Labels:
      None
    • ALL
    • Hide
      python buildscripts/resmoke.py --suite=no_server repro_interleaved_output.js | sed 's/x\+/xxx/g'
      
      repro_interleaved_output.js
      (function() {
          'use strict';
      
          // _INTERNAL_BUFSIZ=4096 is the default buffer size for streams on Windows.
          TestData.largeStr = 'x'.repeat(4096);
      
          const awaitShell = startParallelShell(function() {
              while (true) {
                  print(`inside parallel shell: ${TestData.largeStr}`);
              }
          }, undefined, true);
      
          while (true) {
              print(`main shell: ${TestData.largeStr}`);
          }
      })();
      
      Show
      python buildscripts/resmoke.py --suite=no_server repro_interleaved_output.js | sed 's/x\+/xxx/g' repro_interleaved_output.js ( function () { 'use strict' ; // _INTERNAL_BUFSIZ=4096 is the default buffer size for streams on Windows. TestData.largeStr = 'x' .repeat(4096); const awaitShell = startParallelShell( function () { while ( true ) { print( `inside parallel shell: ${TestData.largeStr}` ); } }, undefined, true ); while ( true ) { print( `main shell: ${TestData.largeStr}` ); } })();
    • Dev Tools 2019-08-26, Dev Tools 2019-09-09, Dev Tools 2020-03-23

      The print() function acquires the ShellConsoleAppender::mx mutex when writing to std::cout as part of the LogstreamBuilder going out of scope. The ProgramOutputMultiplexer::appendLine() function is holding the mongoProgramOutputMutex mutex when writing directly to std::cout. This allows the main thread in the mongo shell to be writing text to std::cout in the print() function while a ProgramRunner reader thread is writing subprocess output to std::cout.

      Interleaving the writes to std::cout from these two different threads corrupts the underlying output stream. It causes the following assertion to fire on --dbg=on builds on Windows.

      fwrite.cpp
      // Write blocks of data from the buffer until there is no more data left:
      size_t remaining_bytes = total_bytes;
      while (remaining_bytes != 0)
      {
          // If the buffer is big and is not full, copy data into the buffer:
          if (stream.has_big_buffer() && stream->_cnt != 0)
          {
              if (stream->_cnt < 0)
              {
                  _ASSERTE(("Inconsistent Stream Count. Flush between consecutive read and write", stream->_cnt >= 0));
                  stream.set_flags(_IOERROR);
                  return (total_bytes - remaining_bytes) / element_size;
              }
      

      Since the crtDebugCallback() function also calls log(), we actually end up hanging because the main thread calling print() is holding the ShellConsoleAppender::mx mutex and waiting for the critical section on std::cout, but the ProgramRunner read thread which triggered the callback is holding the critical section on std::cout and is blocked trying to acquire the ShellConsoleAppender::mx mutex.


      [js_test:repro_interleaved_output] 2019-08-14T20:56:14.922+0100 main shell: xxx
      [js_test:repro_interleaved_output] 2019-08-14T20:56:14.922+0100 sh2676| inside parallel shell: xxxmain shell: xxx
      [js_test:repro_interleaved_output] 2019-08-14T20:56:14.923+0100 xxx
      [js_test:repro_interleaved_output] 2019-08-14T20:56:14.923+0100 main shell: xxx
      [js_test:repro_interleaved_output] 2019-08-14T20:56:14.923+0100 main shell: xxx
      

            Assignee:
            gabriel.russell@mongodb.com Gabriel Russell (Inactive)
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            1 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: