- 
    Type:Improvement 
- 
    Resolution: Fixed
- 
    Priority:Major - P3 
- 
    Affects Version/s: None
- 
    Component/s: Evergreen
- 
        None
- 
        Fully Compatible
- 
        None
- 
        None
- 
        None
- 
        None
- 
        None
- 
        None
Evergreen log output currently has two issues that make it difficult to debug failures in Evergreen:
1. Standard error and standard output streams are seemingly interleaved in a way that does not respect causality, in that failure output of an operation may be logged prior to a log entry announcing that the operation is about to be performed (because failure and announcement are reported through different streams). See https://jira.mongodb.org/browse/EVG-3681
2. The timestamps that Evergreen provides for command output appear to be batched and can be off by many seconds. For example, the following is an excerpt of output showing Evergreen-reported timestamps on the left and actual timestamps on the right:
[P: 40] [2020/03/19 19:05:40.864] [egos:O 2020-03-19T19:05:40.863336] D, [2020-03-19T19:05:40.818927 #7446] DEBUG -- : MONGODB | [11] localhost:27017 #1 | admin.saslStart | STARTED | {}
[P: 40] [2020/03/19 19:05:40.864] [egos:O 2020-03-19T19:05:40.863383] D, [2020-03-19T19:05:40.819759 #7446] DEBUG -- : MONGODB | [11] localhost:27017 | admin.saslStart | SUCCEEDED | 0.001s
[P: 40] [2020/03/19 19:05:40.864] [egos:O 2020-03-19T19:05:40.863429] D, [2020-03-19T19:05:40.825840 #7446] DEBUG -- : MONGODB | [12] localhost:27017 #1 | admin.saslContinue | STARTED | {}
[P: 40] [2020/03/19 19:05:40.864] [egos:O 2020-03-19T19:05:40.863494] D, [2020-03-19T19:05:40.826121 #7446] DEBUG -- : MONGODB | [12] localhost:27017 | admin.saslContinue | SUCCEEDED | 0.000s
[P: 40] [2020/03/19 19:05:40.864] [egos:O 2020-03-19T19:05:40.863551] D, [2020-03-19T19:05:40.826281 #7446] DEBUG -- : MONGODB | [13] localhost:27017 #1 | admin.saslContinue | STARTED | {}
[P: 40] [2020/03/19 19:05:40.864] [egos:O 2020-03-19T19:05:40.863604] D, [2020-03-19T19:05:40.826533 #7446] DEBUG -- : MONGODB | [13] localhost:27017 | admin.saslContinue | SUCCEEDED | 0.000s
[P: 40] [2020/03/19 19:05:40.864] [egos:E 2020-03-19T19:05:40.863682] + test standalone = sharded-cluster
[P: 40] [2020/03/19 19:05:40.864] [egos:O 2020-03-19T19:05:40.863761] D, [2020-03-19T19:05:40.826725 #7446] DEBUG -- : MONGODB | [14] localhost:27017 #1 | admin.usersInfo | STARTED | {"usersInfo"=>"bob", "$db"=>"admin", "lsid"=>{"id"=><BSON::Binary:0x4600 type=uuid data=0x067b385183414215...>}}
[P: 40] [2020/03/19 19:05:41.696] [egos:E 2020-03-19T19:05:40.863828] + export 'MONGODB_URI=mongodb://bob:pwd123@localhost:27017/?appName=test-suite&tlsCAFile=spec/support/certificates/ca.crt&tlsCertificateKeyFile=spec/support/certificates/client-second-level-bundle.pem'
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.863850] D, [2020-03-19T19:05:40.827301 #7446] DEBUG -- : MONGODB | [14] localhost:27017 | admin.usersInfo | SUCCEEDED | 0.000s
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.863895] D, [2020-03-19T19:05:40.828653 #7446] DEBUG -- : MONGODB | [16] localhost:27017 #1 | ruby-driver.createUser | STARTED | {}
[P: 40] [2020/03/19 19:05:41.696] [egos:E 2020-03-19T19:05:40.863916] + MONGODB_URI='mongodb://bob:pwd123@localhost:27017/?appName=test-suite&tlsCAFile=spec/support/certificates/ca.crt&tlsCertificateKeyFile=spec/support/certificates/client-second-level-bundle.pem'
[P: 40] [2020/03/19 19:05:41.696] [egos:E 2020-03-19T19:05:40.863929] + test -n ''
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.863942] D, [2020-03-19T19:05:40.857719 #7446] DEBUG -- : MONGODB | [16] localhost:27017 | ruby-driver.createUser | SUCCEEDED | 0.029s
[P: 40] [2020/03/19 19:05:41.696] [egos:E 2020-03-19T19:05:40.864058] + bundle exec rake spec:ci
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.864128] D, [2020-03-19T19:05:40.858018 #7446] DEBUG -- : MONGODB | [17] localhost:27017 #1 | admin.endSessions | STARTED | {"endSessions"=>[{"id"=><BSON::Binary:0x4600 type=uuid data=0x067b385183414215...>}], "$db"=>"admin"}
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.864189] D, [2020-03-19T19:05:40.858238 #7446] DEBUG -- : MONGODB | [17] localhost:27017 | admin.endSessions | SUCCEEDED | 0.000s
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.864264] D, [2020-03-19T19:05:40.858719 #7446] DEBUG -- : MONGODB | EVENT: #<ServerClosed address=localhost:27017 topology=Single[localhost:27017]>
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.864314] D, [2020-03-19T19:05:40.858757 #7446] DEBUG -- : MONGODB | Server localhost:27017 connection closed.
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.864358] D, [2020-03-19T19:05:40.858793 #7446] DEBUG -- : MONGODB | EVENT: #<TopologyClosed topology=Single[localhost:27017]>
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:40.864409] D, [2020-03-19T19:05:40.858816 #7446] DEBUG -- : MONGODB | Topology type 'single' closed.
[P: 40] [2020/03/19 19:05:41.696] [egos:E 2020-03-19T19:05:41.111023] [mongo] Warning: No private key present, creating unsigned gem.
[P: 40] [2020/03/19 19:05:41.696] [egos:E 2020-03-19T19:05:41.632811] Skipping root user creation, likely auth is enabled on cluster
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.694394] D, [2020-03-19T19:05:41.596337 #7457] DEBUG -- : MONGODB | EVENT: #<TopologyOpening topology=Unknown[]>
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.694472] D, [2020-03-19T19:05:41.596393 #7457] DEBUG -- : MONGODB | Topology type 'unknown' initializing.
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.694552] D, [2020-03-19T19:05:41.596557 #7457] DEBUG -- : MONGODB | EVENT: #<TopologyChanged prev=Unknown[] new=Unknown[localhost:27017]>
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.694579] D, [2020-03-19T19:05:41.596588 #7457] DEBUG -- : MONGODB | Topology type 'Unknown' changed to type 'Unknown'.
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.694611] D, [2020-03-19T19:05:41.596712 #7457] DEBUG -- : MONGODB | EVENT: #<ServerOpening address=localhost:27017 topology=Unknown[localhost:27017]>
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.694632] D, [2020-03-19T19:05:41.596738 #7457] DEBUG -- : MONGODB | Server localhost:27017 initializing.
In this segment, Evergreen appears to have:
- Inserted a large reported time gap when none existed in reality (change from 40 to 41 seconds on the left)
- Masked a time gap that existed in reality (change from 40 to 41 seconds on the right)
In the following excerpt, "topology type 'single' closed" is reported as appearing 30 seconds off its actual time:
[P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.696294] D, [2020-03-19T19:05:41.694298 #7457] DEBUG -- : MONGODB | EVENT: #<ServerClosed address=localhost:27017 topology=Single[localhost:27017]> [P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.696317] D, [2020-03-19T19:05:41.694327 #7457] DEBUG -- : MONGODB | Server localhost:27017 connection closed. [P: 40] [2020/03/19 19:05:41.696] [egos:O 2020-03-19T19:05:41.696343] D, [2020-03-19T19:05:41.694375 #7457] DEBUG -- : MONGODB | EVENT: #<TopologyClosed topology=Single[localhost:27017]> [P: 40] [2020/03/19 19:06:15.228] [egos:O 2020-03-19T19:05:41.696363] D, [2020-03-19T19:05:41.694397 #7457] DEBUG -- : MONGODB | Topology type 'single' closed. [P: 40] [2020/03/19 19:06:15.228] [egos:O 2020-03-19T19:05:41.696461] /data/mci/01a9c12012377d3f4190ab526f033149/src/rubies/ruby-2.7.0/bin/ruby -I/data/mci/01a9c12012377d3f4190ab526f033149/src/rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/rspec-core-3.9.1/lib:/data/mci/01a9c12012377d3f4190ab526f033149/src/rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/rspec-support-3.9.2/lib /data/mci/01a9c12012377d3f4190ab526f033149/src/rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/rspec-core-3.9.1/exe/rspec --pattern spec/\*\*\{,/\*/\*\*\}/\*_spec.rb [P: 40] [2020/03/19 19:06:15.228] [egos:E 2020-03-19T19:05:41.810759] [mongo] Warning: No private key present, creating unsigned gem.
I propose working around this issue by wrapping all our test commands with a script which:
1. Reads stdout and stderr of the wrapped process, and timestamps the lines read from those streams based on arrival time (hopefully this is very close to when they were produced);
2. Produces a single output stream combining both stdout and stderr of the wrapped program. In this output stream, each line is specially marked to identify whether it originally came from stdout or stderr, and the time of original receipt of each line is added.
The wrapper would therefore present a single stream to Evergreen, bypassing the issue of log output being reordered, and would provide accurate times for each log line such that Evergreen's own timestamps could be ignored henceforth.