[SERVER-19813] resmoke: UnicodeDecodeError when POSTing to buildlogger Created: 06/Aug/15  Updated: 27/Oct/15  Resolved: 16/Oct/15

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: 3.1.6
Fix Version/s: 3.2.0-rc1

Type: Bug Priority: Major - P3
Reporter: Kamran K. Assignee: Max Hirschhorn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-7506 RamLog and standard log should preser... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: TIG B (10/30/15)
Participants:

 Description   

Example failure:

 [2015/08/06 09:25:01.173] [buildlogger:fallback] Encountered an error.
 [2015/08/06 09:25:01.174] Traceback (most recent call last):
 [2015/08/06 09:25:01.174]   File "C:\data\mci\src\buildscripts\resmokelib\logging\buildlogger.py", line 39, in wrapper
 [2015/08/06 09:25:01.174]     return func(*args, **kwargs)
 [2015/08/06 09:25:01.174]   File "C:\data\mci\src\buildscripts\resmokelib\logging\buildlogger.py", line 241, in _append_logs
 [2015/08/06 09:25:01.174]     response = self.post(endpoint, data=log_lines)
 [2015/08/06 09:25:01.174]   File "C:\data\mci\src\buildscripts\resmokelib\logging\buildlogger.py", line 181, in post
 [2015/08/06 09:25:01.174]     return self.http_handler.post(*args, **kwargs)
 [2015/08/06 09:25:01.174]   File "C:\data\mci\src\buildscripts\resmokelib\logging\handlers.py", line 162, in post
 [2015/08/06 09:25:01.174]     data = json.dumps(data, encoding="utf-8")
 [2015/08/06 09:25:01.174]   File "C:\Python27\lib\json\__init__.py", line 231, in dumps
 [2015/08/06 09:25:01.174]     return _default_encoder.encode(obj)
 [2015/08/06 09:25:01.174]   File "C:\Python27\lib\json\encoder.py", line 201, in encode
 [2015/08/06 09:25:01.174]     chunks = self.iterencode(o, _one_shot=True)
 [2015/08/06 09:25:01.174]   File "C:\Python27\lib\json\encoder.py", line 264, in iterencode
 [2015/08/06 09:25:01.174]     return _iterencode(o, 0)
 [2015/08/06 09:25:01.174] UnicodeDecodeError: 'utf8' codec can't decode bytes in position 341-342: invalid continuation byte
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:50.957+0000  m30999| 2015-08-06T13:24:50.957+0000 I SHARDING [conn1] ChunkManager: time to load chunks for db27.coll27: 0ms sequenceNumber: 128 version: 1|1||55c3602262805a927bec32e4 based on: (empty)
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:51.046+0000  m31100| 2015-08-06T13:24:50.958+0000 I SHARDING [conn55] remotely refreshing metadata for db27.coll27 with requested shard version 1|1||55c3602262805a927bec32e4, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:51.046+0000  m31100| 2015-08-06T13:24:50.958+0000 I SHARDING [conn55] collection db27.coll27 was previously unsharded, new metadata loaded with shard version 1|1||55c3602262805a927bec32e4
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:51.046+0000  m31100| 2015-08-06T13:24:50.958+0000 I SHARDING [conn55] collection version was loaded at version 1|1||55c3602262805a927bec32e4, took 0ms
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:51.046+0000  m30999| 2015-08-06T13:24:50.959+0000 I SHARDING [conn1] about to log metadata event: { _id: "WIN-V15L7UKQ176-2015-08-06T13:24:50.959+0000-55c3602262805a927bec32e5", server: "WIN-V15L7UKQ176", clientAddr: "127.0.0.1:50705", time: new Date(1438867490959), what: "shardCollection", ns: "db27.coll27", details: { version: "1|1||55c3602262805a927bec32e4" } }
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:51.121+0000  m30999| 2015-08-06T13:24:51.122+0000 I SHARDING [conn1] distributed lock 'db27.coll27/WIN-V15L7UKQ176:30999:1438867184:41' unlocked.
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:51.122+0000  m30999| 2015-08-06T13:24:51.122+0000 I SHARDING [conn1] moving chunk ns: db27.coll27 moving ( ns: db27.coll27, shard: test-rs0, lastmod: 1|1||000000000000000000000000, min: { _id: 0 }, max: { _id: MaxKey }) test-rs0 -> test-rs1
 [2015/08/06 09:25:01.174] [buildlogger:fallback] [js_test:fsm_all_sharded_replication_legacy_config_servers] 2015-08-06T13:24:51.122+0000  m31100| 2015-08-06T13:24:51.123+0000 I SHARDING [conn79] moveChunk waiting for full cleanup after move
...

https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_2k8_debug_concurrency_sharded_3ac6afd27ae0beda8da6f999bf6867e474af3ce2_15_08_06_12_19_00#/log/ALL



 Comments   
Comment by Githook User [ 16/Oct/15 ]

Author:

{u'username': u'visemet', u'name': u'Max Hirschhorn', u'email': u'max.hirschhorn@mongodb.com'}

Message: SERVER-19813 Handle invalid UTF-8 sequences in subprocess output.
Branch: master
https://github.com/mongodb/mongo/commit/9be7e812e01ed9b37ad8105a597e7befe205e194

Comment by Max Hirschhorn [ 16/Oct/15 ]

The cause of this issue is SERVER-7506: log messages of MongoDB processes are not necessarily valid UTF-8 sequences. I was able to more easily reproduce the error by running with the following patch on Evergreen:

diff --git a/buildscripts/resmokeconfig/suites/concurrency.yml b/buildscripts/resmokeconfig/suites/concurrency.yml
index 52af5e9..dcab3be 100644
--- a/buildscripts/resmokeconfig/suites/concurrency.yml
+++ b/buildscripts/resmokeconfig/suites/concurrency.yml
@@ -18,5 +18,6 @@ executor:
     fixture:
       class: MongoDFixture
       mongod_options:
+        slowms: 1
         set_parameters:
           enableTestCommands: 1
diff --git a/jstests/concurrency/fsm_all.js b/jstests/concurrency/fsm_all.js
index c174342..1af22d0 100644
--- a/jstests/concurrency/fsm_all.js
+++ b/jstests/concurrency/fsm_all.js
@@ -8,5 +8,5 @@ var blacklist = [
 ].map(function(file) { return dir + '/' + file; });
 
 runWorkloadsSerially(ls(dir).filter(function(file) {
-    return !Array.contains(blacklist, file);
+    return !Array.contains(blacklist, file) && file.indexOf('indexed_insert_text_multikey.js') >= 0;
 }));

The indexed_insert_text.js FSM workload defines a set of text snippets that contain non-ASCII characters. Queries using those snippets are potentially truncated on a byte level that causes the logs to contain invalid UTF-8 sequences. resmoke.py should handle this by converting the bytestring to a UTF-8 string and replacing any characters that cannot be decoded with U+FFFD.

Generated at Thu Feb 08 03:52:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.