- Type: Improvement
- Resolution: Fixed
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
- None
- Not Needed
Context
The PyMongo team has noted that truncating large documents (e.g. 16 MB) for logging results in poor driver performance. For the Go Driver, the cost could be especially high when we truncate a large document sent to the server: https://github.com/mongodb/mongo-go-driver/blob/v1/x/mongo/driver/operation.go#L1931
For a 16 MB document:
BenchmarkRawString/mixed_struct-10 4 264041604 ns/op 1720600552 B/op 30 allocs/op
vs. a 128-byte document:
BenchmarkRawString/mixed_struct-10 227014 5136 ns/op 15139 B/op 24 allocs/op
We should attempt to optimize truncating extended JSON.
Definition of done
Create a StringN method for bsoncore.Document that will stringify a document up to N bytes. A POC would look something like this:
func (d Document) StringN(n int) string {
	if len(d) < 5 {
		return ""
	}

	var buf strings.Builder
	buf.WriteByte('{')

	length, rem, _ := ReadLength(d) // We know we have enough bytes to read the length
	length -= 4

	var elem Element
	var ok bool

	first := true

	// Need to account for 1 terminal byte, and the number of bytes already
	// written.
	n -= buf.Len() + 1

	if n > 0 {
		for length > 1 {
			if !first {
				buf.WriteByte(',')
			}

			elem, rem, ok = ReadElement(rem)
			length -= int32(len(elem))
			if !ok {
				return ""
			}

			str := elem.String()
			if buf.Len()+len(str) > n {
				break
			}

			buf.WriteString(str)
			first = false
		}
	}

	if buf.Len()+1 <= n {
		buf.WriteByte('}')
	}

	return buf.String()
}
This truncation logic must be exact and account for multi-byte characters. This can be baked into the above method using the existing logging truncation algorithm:
func truncate(str string, width uint) string {
	if width == 0 {
		return ""
	}

	if len(str) <= int(width) {
		return str
	}

	// Truncate the byte slice of the string to the given width.
	newStr := str[:width]

	// Check if the last byte is at the beginning of a multi-byte character.
	// If it is, then remove the last byte.
	if newStr[len(newStr)-1]&0xC0 == 0xC0 {
		return newStr[:len(newStr)-1] + TruncationSuffix
	}

	// Check if the last byte is in the middle of a multi-byte character. If
	// it is, then step back until we find the beginning of the character.
	if newStr[len(newStr)-1]&0xC0 == 0x80 {
		for i := len(newStr) - 1; i >= 0; i-- {
			if newStr[i]&0xC0 == 0xC0 {
				return newStr[:i] + TruncationSuffix
			}
		}
	}

	return newStr + TruncationSuffix
}
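For comparison, here is a minimal, self-contained sketch of rune-boundary truncation built on the standard library's utf8.RuneStart. It is not the driver's implementation: truncateUTF8 and truncationSuffix are hypothetical names, and the suffix value is assumed for illustration. Unlike the function above, which inspects the last byte kept, this variant inspects the first byte dropped, so a multi-byte character that fits entirely within the width is retained.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncationSuffix is a hypothetical stand-in for the driver's
// TruncationSuffix constant; the value is assumed for illustration.
const truncationSuffix = "..."

// truncateUTF8 (hypothetical helper) cuts str to at most width bytes
// without splitting a multi-byte character, then appends the suffix.
func truncateUTF8(str string, width int) string {
	if width <= 0 {
		return ""
	}
	if len(str) <= width {
		return str
	}

	// str[end] is the first byte that would be dropped. If it is a UTF-8
	// continuation byte, the cut falls inside a character, so step back
	// until the cut sits on a character boundary.
	end := width
	for end > 0 && !utf8.RuneStart(str[end]) {
		end--
	}

	return str[:end] + truncationSuffix
}

func main() {
	// "é" occupies two bytes, so a width of 2 would split it.
	fmt.Printf("%q\n", truncateUTF8("héllo", 2))
	fmt.Printf("%q\n", truncateUTF8("héllo", 3))
	fmt.Printf("%q\n", truncateUTF8("héllo", 6))
}
```

Checking the first dropped byte needs only a single backward scan of at most three bytes for valid UTF-8, so the cost stays independent of the document size.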
Note that we will also want to add corresponding StringN methods to the other types that have a String() method, such as Array.StringN and Value.StringN, in case an array, for example, makes up the bulk of a large document.
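The reason nested types need their own bounded method can be sketched with a toy model. The value type below is a hypothetical stand-in for a BSON value, not bsoncore's Value, and its stringN method is an assumption about how a byte budget might be passed down into nested arrays so that rendering stops as soon as the budget is spent. Scalars are assumed to be ASCII here; a real implementation would also need the multi-byte handling described above.

```go
package main

import (
	"fmt"
	"strings"
)

// value is a hypothetical stand-in for a BSON value: a scalar with
// pre-rendered text when items is nil, otherwise an array of values.
type value struct {
	scalar string
	items  []value
}

// stringN renders v using at most n bytes. It returns the rendered text
// and whether the value was rendered in full.
func (v value) stringN(n int) (string, bool) {
	if v.items == nil {
		if len(v.scalar) > n {
			// Byte slicing is only safe because scalars are assumed ASCII.
			return v.scalar[:n], false
		}
		return v.scalar, true
	}

	if n < 1 {
		return "", false
	}

	var buf strings.Builder
	buf.WriteByte('[')

	for i, item := range v.items {
		if i > 0 {
			if buf.Len() >= n {
				return buf.String(), false
			}
			buf.WriteByte(',')
		}

		// Pass only the remaining budget down, so a huge nested array
		// stops rendering early instead of being stringified in full.
		s, ok := item.stringN(n - buf.Len())
		buf.WriteString(s)
		if !ok {
			return buf.String(), false
		}
	}

	if buf.Len() >= n {
		return buf.String(), false
	}
	buf.WriteByte(']')

	return buf.String(), true
}

func main() {
	doc := value{items: []value{
		{scalar: "1"},
		{scalar: "2"},
		{items: []value{{scalar: "3"}, {scalar: "4"}}},
		{scalar: "5"},
	}}

	for _, n := range []int{6, 13} {
		s, full := doc.stringN(n)
		fmt.Printf("n=%d full=%t %q\n", n, full, s)
	}
}
```

Because each level receives only the bytes still available, the work done is bounded by n rather than by the size of the nested array, which is the property the StringN methods are meant to provide.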
- is duplicated by
  - GODRIVER-3270 mongo-go-driver - PR #1699: GODRIVER-3090 Optimize logging truncation for large documents (Closed)
- is related to
  - GODRIVER-3281 mongo-go-driver - PR #1699: GODRIVER-3090 Optimize logging truncation for large documents (Closed)