[GODRIVER-3090] Optimize logging truncation for large documents Created: 05/Jan/24  Updated: 12/Jan/24

Status: Backlog
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 2.1.0

Type: Improvement Priority: Unknown
Reporter: Preston Vasquez Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?


Description

Context

The PyMongo team has noted that truncating large documents (e.g. 16 MB) results in poor driver performance when logging. For the Go Driver, performance could be particularly poor when we truncate a large document sent to the server: https://github.com/mongodb/mongo-go-driver/blob/v1/x/mongo/driver/operation.go#L1931

For a 16 MB document:

BenchmarkRawString/mixed_struct-10                     4         264041604 ns/op        1720600552 B/op       30 allocs/op

vs. 128 bytes:

BenchmarkRawString/mixed_struct-10                227014              5136 ns/op           15139 B/op         24 allocs/op

We should attempt to optimize truncating extended JSON.

Definition of done

Create a StringN method for bsoncore.Document that will stringify a document up to N bytes. A POC would look something like this:

func (d Document) StringN(n int) string {
	if len(d) < 5 {
		return ""
	}
	var buf strings.Builder
	buf.WriteByte('{')
 
	length, rem, _ := ReadLength(d) // We know we have enough bytes to read the length
 
	length -= 4
 
	var elem Element
	var ok bool
 
	first := true
 
	// Need to account for 1 terminal byte, and the number of bytes already
	// written.
	n -= buf.Len() + 1
	if n > 0 {
		for length > 1 {
			elem, rem, ok = ReadElement(rem)
			if !ok {
				return ""
			}
			length -= int32(len(elem))

			str := elem.String()
			if !first {
				// Count the separator against the budget together with the
				// element so that a trailing comma is never written.
				str = "," + str
			}
			if buf.Len()+len(str) > n {
				break
			}

			buf.WriteString(str)
			first = false
		}
	}
 
	if buf.Len()+1 <= n {
		buf.WriteByte('}')
	}
 
	return buf.String()
}
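
To make the early-exit idea concrete without depending on the driver, here is a minimal standard-library sketch. Everything in it is an assumption for illustration: stringN, renderElement, and the append helpers are hypothetical names, only int32 (0x10) and string (0x02) elements are handled, and the output is a simplified JSON-like form rather than extended JSON. It stops at the first element that would exceed the budget, so later elements are never decoded. Like the POC, it still renders each element in full before checking the budget.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strconv"
	"strings"
)

// appendLE32 appends v as 4 little-endian bytes.
func appendLE32(dst []byte, v uint32) []byte {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], v)
	return append(dst, b[:]...)
}

// appendInt32 and appendString build raw BSON elements for the demo only.
func appendInt32(dst []byte, key string, val int32) []byte {
	dst = append(append(append(dst, 0x10), key...), 0x00)
	return appendLE32(dst, uint32(val))
}

func appendString(dst []byte, key, val string) []byte {
	dst = append(append(append(dst, 0x02), key...), 0x00)
	dst = appendLE32(dst, uint32(len(val)+1))
	return append(append(dst, val...), 0x00)
}

// buildDoc wraps elements with the length header and the terminating null.
func buildDoc(elems []byte) []byte {
	doc := appendLE32(nil, uint32(len(elems)+5))
	return append(append(doc, elems...), 0x00)
}

// renderElement decodes one element from b and returns its rendering plus
// the remaining bytes. Only int32 and string elements are supported here.
func renderElement(b []byte) (string, []byte, bool) {
	if len(b) < 1 {
		return "", nil, false
	}
	end := bytes.IndexByte(b[1:], 0x00)
	if end < 0 {
		return "", nil, false
	}
	key, val := strconv.Quote(string(b[1:1+end])), b[2+end:]
	if len(val) < 4 {
		return "", nil, false
	}
	switch b[0] {
	case 0x10:
		v := int32(binary.LittleEndian.Uint32(val))
		return key + ": " + strconv.Itoa(int(v)), val[4:], true
	case 0x02:
		l := int(binary.LittleEndian.Uint32(val))
		if l < 1 || len(val) < 4+l {
			return "", nil, false
		}
		return key + ": " + strconv.Quote(string(val[4:4+l-1])), val[4+l:], true
	}
	return "", nil, false
}

// stringN renders doc in at most n bytes, truncating at an element boundary.
// The length header is not validated in this sketch.
func stringN(doc []byte, n int) string {
	if len(doc) < 5 || n < 2 {
		return ""
	}
	rem := doc[4 : len(doc)-1]
	var buf strings.Builder
	buf.WriteByte('{')
	for len(rem) > 0 {
		str, rest, ok := renderElement(rem)
		if !ok {
			return ""
		}
		if buf.Len() > 1 {
			str = "," + str
		}
		// Reserve one byte for the closing brace of a complete document.
		if buf.Len()+len(str)+1 > n {
			return buf.String() // truncated: remaining elements are never decoded
		}
		buf.WriteString(str)
		rem = rest
	}
	buf.WriteByte('}')
	return buf.String()
}

func main() {
	elems := appendInt32(nil, "a", 1)
	elems = appendString(elems, "b", "hi")
	elems = appendInt32(elems, "c", 2)
	doc := buildDoc(elems)
	fmt.Println(stringN(doc, 100))
	fmt.Println(stringN(doc, 10))
}
```

With a small budget the loop returns after the first element that does not fit, so the cost depends on the bytes rendered rather than on the document size, provided no single element is itself very large.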

The truncation must be exact and must never split a multi-byte character. This can be built into the above method using the existing logging truncation algorithm:

func truncate(str string, width uint) string {
	if width == 0 {
		return ""
	}
 
	if len(str) <= int(width) {
		return str
	}
 
	// Truncate the byte slice of the string to the given width.
	newStr := str[:width]
 
	// Check if the last byte is at the beginning of a multi-byte character.
	// If it is, then remove the last byte.
	if newStr[len(newStr)-1]&0xC0 == 0xC0 {
		return newStr[:len(newStr)-1] + TruncationSuffix
	}
 
	// Check if the last byte is in the middle of a multi-byte character. If
	// it is, then step back until we find the beginning of the character.
	if newStr[len(newStr)-1]&0xC0 == 0x80 {
		for i := len(newStr) - 1; i >= 0; i-- {
			if newStr[i]&0xC0 == 0xC0 {
				return newStr[:i] + TruncationSuffix
			}
		}
	}
 
	return newStr + TruncationSuffix
}
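
As a point of comparison, the same boundary check can be sketched with the standard library's utf8.RuneStart. This is an illustrative alternative, not the driver's implementation: truncateUTF8 is a hypothetical name, and the local suffix constant is a stand-in whose value is an assumption. The sketch inspects the first dropped byte rather than the last kept byte, so a multi-byte character that ends exactly at the width is kept, whereas the function above steps back past it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// suffix stands in for the driver's TruncationSuffix; the value is assumed.
const suffix = "..."

// truncateUTF8 cuts s to at most width bytes (before the suffix) without
// splitting a UTF-8 encoded character.
func truncateUTF8(s string, width int) string {
	if width <= 0 {
		return ""
	}
	if len(s) <= width {
		return s
	}

	// s[cut] is the first byte that would be dropped. If it is a
	// continuation byte, the cut falls inside a character, so move the cut
	// back until it sits on the first byte of a character.
	cut := width
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut] + suffix
}

func main() {
	fmt.Println(truncateUTF8("héllo", 2)) // the cut would split "é"
	fmt.Println(truncateUTF8("héllo", 3)) // "é" ends exactly at the width
}
```

Because the loop only walks back over continuation bytes, it inspects at most three bytes for valid UTF-8 input, regardless of the string length.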

Note that we will also want to add corresponding StringN methods to the other types that have a String() method, such as Array.StringN and Value.StringN, in case a single value (an array, for example) makes up the bulk of a large document.
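
One way to picture how the byte budget could be shared with nested values is the sketch below. It is an assumption-laden illustration, not the bsoncore API: node, writeCapped, writeN, and stringN are hypothetical names, and scalars are modeled as pre-rendered strings. Every write goes through a single capped helper, so a large nested array stops being rendered as soon as the shared budget is used up.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a BSON value: either a scalar that is
// already rendered as text, or an array of child nodes.
type node struct {
	isArray  bool
	scalar   string
	children []node
}

// writeCapped appends as much of s as fits within n total bytes in buf and
// reports whether all of s was written. The cut is byte-based, so a caller
// would still need a UTF-8 aware trim on the final result.
func writeCapped(buf *strings.Builder, s string, n int) bool {
	room := n - buf.Len()
	if room <= 0 {
		return false
	}
	if len(s) > room {
		buf.WriteString(s[:room])
		return false
	}
	buf.WriteString(s)
	return true
}

// writeN renders v into buf and returns false as soon as the budget n is
// exhausted, which stops the traversal of any remaining children.
func writeN(buf *strings.Builder, v node, n int) bool {
	if !v.isArray {
		return writeCapped(buf, v.scalar, n)
	}
	if !writeCapped(buf, "[", n) {
		return false
	}
	for i, c := range v.children {
		if i > 0 && !writeCapped(buf, ",", n) {
			return false
		}
		if !writeN(buf, c, n) {
			return false
		}
	}
	return writeCapped(buf, "]", n)
}

// stringN renders v in at most n bytes.
func stringN(v node, n int) string {
	var buf strings.Builder
	writeN(&buf, v, n)
	return buf.String()
}

func main() {
	inner := node{isArray: true, children: []node{{scalar: "2"}, {scalar: "3"}}}
	outer := node{isArray: true, children: []node{{scalar: "1"}, inner, {scalar: "4"}}}
	fmt.Println(stringN(outer, 100))
	fmt.Println(stringN(outer, 6))
}
```

Passing the builder and the total budget down the recursion means no intermediate string is ever built for a nested value, which is the allocation the benchmark numbers above suggest is worth avoiding.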


Generated at Thu Feb 08 08:40:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.