[CDRIVER-2453] Invalid bson returned in bulk operation reply in some cases Created: 05/Jan/18  Updated: 15/Dec/21  Resolved: 05/Jan/18

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.8.0, 1.9.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Thijs Cadier
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Seen on Ubuntu 14.04 (64-bit) and macOS Sierra.


Issue Links:
Duplicate
duplicates SERVER-24007 Server can return invalid UTF8 for er... Backlog
Related
is related to DRIVERS-2008 Default to lossy/replacement behavior... Backlog

 Description   

When a document inserted in a bulk operation is invalid BSON and also contains a certain UTF-8 string, the bulk operation reply contains incorrectly truncated UTF-8 data. The truncation appears to fall in the middle of a code point, in which case the returned BSON is invalid.
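To illustrate what "truncated in the middle of a code point" looks like at the byte level, here is a minimal sketch using only the Rust standard library. It is not part of libmongoc or the Rust wrapper; `Utf8Problem` and `diagnose_utf8` are hypothetical names introduced for this example. It relies on `Utf8Error::error_len()` returning `None` when the input ends partway through a multibyte sequence.

```rust
/// Why a byte string failed UTF-8 validation (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum Utf8Problem {
    /// The bytes end partway through a multibyte sequence that starts at this offset.
    TruncatedAt(usize),
    /// An invalid byte sequence starts at this offset.
    InvalidAt(usize),
}

/// Validates `bytes` as UTF-8 and, on failure, reports whether the data was
/// cut off mid-sequence or contains an outright invalid sequence.
fn diagnose_utf8(bytes: &[u8]) -> Result<&str, Utf8Problem> {
    match std::str::from_utf8(bytes) {
        Ok(text) => Ok(text),
        Err(e) => match e.error_len() {
            // `None` means the input ended unexpectedly inside a sequence.
            None => Err(Utf8Problem::TruncatedAt(e.valid_up_to())),
            Some(_) => Err(Utf8Problem::InvalidAt(e.valid_up_to())),
        },
    }
}
```

Running a check like this over the string bytes of a reply would show whether the invalid data is a clean prefix of valid text (a truncation) or something else.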

I have a bson file to reliably reproduce this. It contains some customer data so I don't want to post it publicly, but it is not sensitive data so I'm happy to email it to you.



 Comments   
Comment by Thijs Cadier [ 05/Jan/18 ]

Looks like that's it indeed, thanks.

Comment by A. Jesse Jiryu Davis [ 05/Jan/18 ]

This is SERVER-24007. The BSON file you shared with me appears valid to me. It includes a very long UTF-8 _id value. When I insert the BSON twice, the server returns an error message like:

> dup key: "first 100 characters of long _id..."

The server trims the _id value at the 128th byte (or something like that) in order to include it in the error message, but it doesn't notice that it has trimmed the _id in the middle of a multibyte UTF-8 character.
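The difference between cutting at a fixed byte offset and cutting on a character boundary can be sketched as follows. This is an illustration in Rust using only the standard library, not the server's actual code; `truncate_naive` and `truncate_on_char_boundary` are hypothetical names.

```rust
/// Naive truncation: cut the UTF-8 bytes at a fixed byte offset.
/// The result may end with an incomplete multibyte sequence.
fn truncate_naive(s: &str, max_bytes: usize) -> &[u8] {
    let end = max_bytes.min(s.len());
    &s.as_bytes()[..end]
}

/// Boundary-aware truncation: back up from the byte limit until the cut
/// lands on a character boundary, so the result is always valid UTF-8.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    let mut end = max_bytes.min(s.len());
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For a string such as "abécd", where "é" occupies bytes 2 and 3, a 3-byte naive cut keeps only the first byte of "é" and is no longer valid UTF-8, while the boundary-aware version returns "ab".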

I'm going to close this ticket as a duplicate; feel free to reopen if I've made a mistake.

Comment by A. Jesse Jiryu Davis [ 05/Jan/18 ]

Received. (I thought I could share this attachment as a private attachment to the Jira ticket, but Jira doesn't have this feature so I deleted the attachment again.)

Comment by Thijs Cadier [ 05/Jan/18 ]

By the way, you have to insert it into the same collection twice: the second insert produces a duplicate key error, and that is when this bug triggers.

Comment by Thijs Cadier [ 05/Jan/18 ]

Done

Comment by A. Jesse Jiryu Davis [ 05/Jan/18 ]

Oh, interesting. Sure, email me the file.

Comment by Thijs Cadier [ 05/Jan/18 ]

This is example code using the Rust wrapper around libmongoc to trigger this bug:

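// Imports assumed from the mongo_driver and bson (0.x) crates.
use std::fs::File;
use mongo_driver::client::{ClientPool, Uri};

fn main() {
    let uri            = Uri::new("mongodb://localhost:27017/").unwrap();
    let pool           = ClientPool::new(uri, None);
    let client         = pool.pop();
    let collection     = client.get_collection("rust_driver_test", "bulk_operation_utf8_from_file");
    let bulk_operation = collection.create_bulk_operation(None);

    // File::open does not expand "~", so build the path from $HOME.
    let path = format!("{}/Desktop/malformed.bson", std::env::var("HOME").unwrap());
    let mut file = File::open(path).unwrap();
    let document = bson::decode_document(&mut file).unwrap();
    bulk_operation.insert(&document).unwrap();
    bulk_operation.execute().unwrap();
}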

Comment by Thijs Cadier [ 05/Jan/18 ]

I double-checked this and the situation is slightly different: the BSON in the bulk operation is actually valid, but the reply returned is invalid BSON anyway. Can I send you the BSON file that triggers this?

Comment by Thijs Cadier [ 05/Jan/18 ]

That's a good point, I actually don't know if it's a server or a driver bug. The driver or server correctly rejects the bson. The bug is that the driver then returns a reply that contains invalid bson. In my mind the reply document should always be valid bson. Whether the source of the invalid bson is the driver or the server I cannot tell.

Comment by A. Jesse Jiryu Davis [ 05/Jan/18 ]

Thanks. Do you think this is expected behavior (inserting invalid BSON results in undefined behavior), a server bug (it accepts invalid BSON and replies with invalid BSON), or a driver bug (we don't reject the invalid BSON)?

Generated at Wed Feb 07 21:15:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.