[SERVER-46873] Invalid utf-8 in json logs Created: 14/Mar/20  Updated: 29/Oct/23  Resolved: 25/Jan/21

Status: Closed
Project: Core Server
Component/s: Logging
Affects Version/s: 4.3.4
Fix Version/s: 4.4.0

Type: Bug Priority: Major - P3
Reporter: Oleg Pudeyev (Inactive) Assignee: Mark Benvenuto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-46219 Unittests should log with JSON format Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service arch 2020-05-04, Service arch 2020-05-18, Service arch 2020-06-01, Service arch 2020-06-15, Service arch 2020-06-29, Service arch 2020-07-13, Service Arch 2020-07-27, Service Arch 2020-08-10, Service Arch 2020-08-24, Security 2021-02-08
Participants:

 Description   

In this file: https://s3.amazonaws.com/mciuploads/mongo-ruby-driver/mongo-latest__mongodb-version~latest_topology~replica-set_auth-and-ssl~noauth-and-nossl_ruby~ruby-2.7_os~ubuntu1604/3da63736cf6fd65edb655fe21aa94b970796b802/5e6b12882a60ed16d60d365b/mongo_ruby_driver_mongo_latest__mongodb_version~latest_topology~replica_set_auth_and_ssl~noauth_and_nossl_ruby~ruby_2.7_os~ubuntu1604_patch_3da63736cf6fd65edb655fe21aa94b970796b802_5e6b12882a60ed16d60d365b_20_03_13_04_56_41/logs/mongo_ruby_driver_mongo_latest__mongodb_version~latest_topology~replica_set_auth_and_ssl~noauth_and_nossl_ruby~ruby_2.7_os~ubuntu1604_test_mlaunch_patch_3da63736cf6fd65edb655fe21aa94b970796b802_5e6b12882a60ed16d60d365b_20_03_13_04_56_41-0-mongodb-logs.tar.gz, in drivers-tools/.evergreen/orchestration/db/ruby-driver-rs/rs2/mongod.log , the server produced 4 lines that contain invalid utf-8 as reported by Ruby. The line numbers are 21111, 21962, 22041, 22187. These lines all look like this:

{"t":{"$date":"2020-03-13T05:12:21.172+0000"},"s":"I", "c":"TXN",     "id":51802,"ctx":"conn1036","msg":"transaction","attr":{"parameters":{"lsid":{"id":{"$uuid":"ee279b85-d0db-4273-948c-f3f40327d56a"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}},"txnNumber":1,"autocommit":false,"readConcern":{"provenance":"implicitDefault"}},"readTimestamp":"Timestamp(0, 0)","ninserted":1,"keysInserted":1,"terminationCause":"\ufffd���\ufffdU\u0000\u0000\u0000","timeActive":287,"timeInactive":1011340,"numYields":0,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"w":2}},"Database":{"acquireCount":{"w":3}},"Collection":{"acquireCount":{"w":3}},"Mutex":{"acquireCount":{"r":3}}},"storage":{},"wasPrepared":false,"durationMillis":1011}}

terminationCause appears to look like this:

\"terminationCause\":\"\\ufffd\xED\xA9\x9B\\ufffdU\\u0000\\u0000\\u0000\"

Ruby has this to say about it:

(byebug) line.force_encoding('utf-8').encode('utf-16')
*** Encoding::InvalidByteSequenceError Exception: "\xED" followed by "\xA9" on UTF-8

According to https://en.wikipedia.org/wiki/JSON#Data_portability_issues json should be encoded in utf-8.

Build: https://evergreen.mongodb.com/build/mongo_ruby_driver_mongo_latest__mongodb_version~latest_topology~replica_set_auth_and_ssl~noauth_and_nossl_ruby~ruby_2.7_os~ubuntu1604_patch_3da63736cf6fd65edb655fe21aa94b970796b802_5e6b12882a60ed16d60d365b_20_03_13_04_56_41/



 Comments   
Comment by Mark Benvenuto [ 25/Jan/21 ]

As part as SERVER-46219, this issue was fixed. A `std::string` was changed to `StringData` which fixed the lifetime issues. The bogus UTF-8 was a result of reading freed memory.

Comment by Bruce Lucas (Inactive) [ 16/Mar/20 ]

It looks to me that this is worse than simply logging invalid utf8 - it looks like terminationCause is garbage, indicating possibly a more serious problem like use after free, wild pointer, uninitialized storage.

Generated at Thu Feb 08 05:12:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.