[GODRIVER-2311] Byte array reuse in BSON unmarshalling may cause duplicated values Created: 14/Feb/22  Updated: 28/Oct/23  Resolved: 16/Mar/22

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: 1.0.4
Fix Version/s: 1.9.0, 1.8.5, 1.7.6

Type: Bug Priority: Blocker - P1
Reporter: Matt Dale Assignee: Matt Dale
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Cloud Backport: Needed

 Description   

Updated:
Confirmed this is a bug; conditions that will trigger this bug are:

  1. Load a BSON document into a []byte (e.g. from a file, a server response, etc.)
  2. Unmarshal the BSON document into any type that contains a []byte field, like a user-defined struct or bson.D.
  3. Modify the bytes in the input []byte.
  4. Observe that the contents of the []byte field in the unmarshaled value changed.

Check out a repro example here (note that the example doesn't repro the problem on the Go Playground as of the 1.8.5/1.9.0 releases, which fix the bug): https://go.dev/play/p/-BjGJ9OjAVB

Note that this only applies to unmarshaling into byte slice values, not byte array values. For example, values unmarshaled to a struct containing a [16]byte field are not affected. However, the same BSON document unmarshaled to a bson.D will infer the value type is a []byte and will be affected.
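
The slice/array difference follows from Go's value semantics: an array owns its storage, so filling one always copies the bytes. A minimal standard-library sketch, where asSlice and asArray are hypothetical stand-ins for a decoder step and not driver functions:

```go
package main

import "fmt"

// asSlice is a hypothetical decoder step that returns a view into buf.
// The result shares buf's underlying array.
func asSlice(buf []byte) []byte {
	return buf[:16]
}

// asArray is a hypothetical decoder step that fills a fixed-size array.
// Arrays are values in Go, so the bytes are copied into the result's own storage.
func asArray(buf []byte) [16]byte {
	var out [16]byte
	copy(out[:], buf)
	return out
}

func main() {
	buf := make([]byte, 16)
	s := asSlice(buf)
	a := asArray(buf)

	// Modify the input after "decoding".
	buf[0] = 0xFF

	fmt.Println(s[0], a[0]) // prints "255 0"
}
```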

Original:
Some users of a mongopush fork are having issues with duplication of UUIDs in unmarshalled values. Specifically, when reading an oplog written to a file here, some UUID fields in the unmarshalled values can be duplicated.

The duplication bug is fixed by this commit, which makes a copy of the input file byte buffer. That fix suggests the root cause may be reuse of the input byte array in the returned value (i.e. the BSON Unmarshal function returns an unmarshalled value whose byte slices point to sections of the same byte array as the input data). That can lead to unexpected value duplication or corruption if the input byte array is modified after unmarshalling a value (modifying the input byte slice/array after Unmarshal returns is a valid use case).
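
To illustrate the suspected mechanism without depending on the driver, here is a minimal sketch that uses only the standard library. decodeAliased and decodeCopied are hypothetical helpers, not the driver's code; they only model "return a sub-slice of the input" versus "return a copy".

```go
package main

import (
	"bytes"
	"fmt"
)

// decodeAliased is a hypothetical decoder that returns a sub-slice of the
// input buffer, so the result shares the input's underlying array.
func decodeAliased(buf []byte, off, n int) []byte {
	return buf[off : off+n]
}

// decodeCopied is a hypothetical decoder that copies the bytes into a new
// array, so later writes to the input cannot affect the result.
func decodeCopied(buf []byte, off, n int) []byte {
	return append([]byte(nil), buf[off:off+n]...)
}

func main() {
	first := []byte{0xAA, 0xAA, 0xAA, 0xAA}

	// Simulate a reader that reuses one buffer for successive records.
	buf := append([]byte(nil), first...)
	aliased := decodeAliased(buf, 0, 4)
	copied := decodeCopied(buf, 0, 4)

	// The caller overwrites the buffer with the next record.
	copy(buf, []byte{0xBB, 0xBB, 0xBB, 0xBB})

	fmt.Println("aliased changed:", !bytes.Equal(aliased, first)) // true
	fmt.Println("copied changed:", !bytes.Equal(copied, first))   // false
}
```

With a reused read buffer, every aliased value ends up showing the most recently read bytes, which would look like duplicated UUIDs.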

Try to detect the possible issue using the following process:

  1. Create a set of input structs containing different value types, including byte slice types (e.g. []byte, uuid.UUID, etc).
  2. Marshal each input struct value as BSON.
  3. Record the low and high addresses of the output byte slice and underlying array.
  4. Unmarshal the marshalled bytes into a bson.D.
  5. Record the low and high addresses of the byte slice-type values in the unmarshalled bson.D.
  6. Check if any of the byte slice-type value addresses are in the memory address range of the input byte slice/array.
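
The address-range check in steps 3 to 6 could be sketched as follows. arrayRange and sharesMemory are hypothetical helper names, and reflect.Value.Pointer is used here as one way to read a slice's data address; treat the addresses as a diagnostic snapshot only.

```go
package main

import (
	"fmt"
	"reflect"
)

// arrayRange returns the half-open address range [lo, hi) spanned by s,
// from its first element up to its capacity.
func arrayRange(s []byte) (lo, hi uintptr) {
	lo = reflect.ValueOf(s).Pointer()
	return lo, lo + uintptr(cap(s))
}

// sharesMemory reports whether the address ranges of a and b overlap,
// which means they are backed by the same array.
func sharesMemory(a, b []byte) bool {
	if cap(a) == 0 || cap(b) == 0 {
		return false
	}
	aLo, aHi := arrayRange(a)
	bLo, bHi := arrayRange(b)
	return aLo < bHi && bLo < aHi
}

func main() {
	input := []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	view := input[2:6]                          // shares input's array
	clone := append([]byte(nil), input[2:6]...) // separate array

	fmt.Println(sharesMemory(input, view))  // true
	fmt.Println(sharesMemory(input, clone)) // false
}
```

In a real test, input would be the marshalled BSON bytes and the second argument each byte slice found in the unmarshalled value.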

E.g. getting underlying array addresses of a slice:

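// Requires imports: "fmt", "reflect", "unsafe".
s := []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

// Reinterpret the slice as its header to read the underlying array address.
sh := (*reflect.SliceHeader)(unsafe.Pointer(&s))

fmt.Println("Low address", sh.Data)
fmt.Println("High address", sh.Data+uintptr(sh.Cap-1))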



 Comments   
Comment by Githook User [ 04/Apr/22 ]

Author:

{'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}

Message: GODRIVER-2311 Ensure unmarshaled BSON values always use distinct unde… (#892)

Co-authored-by: Matt Dale <9760375+matthewdale@users.noreply.github.com>
Branch: release/1.7
https://github.com/mongodb/mongo-go-driver/commit/8e61246c0fc22225809faa3f7115a2d89e16a389

Comment by Githook User [ 31/Mar/22 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2311 Improve the BSON unmarshal buffer reuse fix to reduce memory allocations. (#891)
Branch: release/1.8
https://github.com/mongodb/mongo-go-driver/commit/dac4668295fe7d553db2374530cbcec973682e21

Comment by Githook User [ 31/Mar/22 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2311 Improve the BSON unmarshal buffer reuse fix to reduce memory allocations. (#891)
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/5970415b5bdd318697fc75b2af93a7f724a44e89

Comment by Githook User [ 22/Mar/22 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2311 Ensure unmarshaled BSON values always use distinct underlying byte arrays. (#874)
Branch: release/1.8
https://github.com/mongodb/mongo-go-driver/commit/1bef05bfac11ae8fe7632e873fd7a91dbbf4da9d

Comment by Githook User [ 16/Mar/22 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2311 Ensure unmarshaled BSON values always use distinct underlying byte arrays. (#874)
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/d307af82c6ed70c51fd4576f87f128479c82ada0

Comment by Matt Dale [ 14/Mar/22 ]

PR: https://github.com/mongodb/mongo-go-driver/pull/874

Generated at Thu Feb 08 08:38:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.