Go Driver / GODRIVER-3090

Optimize logging truncation for large documents

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Unknown
    • 2.1.0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Not Needed

      Context

      The PyMongo team has noted that truncating large documents (e.g. 16 MB) results in poor driver performance when logging. For the Go Driver, performance could be particularly poor when we truncate a large document sent to the server: https://github.com/mongodb/mongo-go-driver/blob/v1/x/mongo/driver/operation.go#L1931

      For a 16 MB document:

      BenchmarkRawString/mixed_struct-10                     4         264041604 ns/op        1720600552 B/op       30 allocs/op
      

      Versus 128 bytes:

      BenchmarkRawString/mixed_struct-10                227014              5136 ns/op           15139 B/op         24 allocs/op
      

      We should attempt to optimize truncating extended JSON.

      Definition of done

      Create a StringN method for bsoncore.Document that stringifies a document up to N bytes. A proof of concept (POC) would look something like this:

      func (d Document) StringN(n int) string {
      	if len(d) < 5 || n <= 0 {
      		return ""
      	}
      	var buf strings.Builder
      	buf.WriteByte('{')
      
      	length, rem, _ := ReadLength(d) // We know we have enough bytes to read the length
      
      	length -= 4
      
      	var elem Element
      	var ok bool
      
      	first := true
      
      	for length > 1 {
      		elem, rem, ok = ReadElement(rem)
      		if !ok {
      			return ""
      		}
      		length -= int32(len(elem))
      
      		sep := ""
      		if !first {
      			sep = ","
      		}
      		str := elem.String()
      
      		// Stop at the first element that does not fit, reserving one
      		// byte for the closing brace. Checking before writing avoids
      		// leaving a dangling comma in the output.
      		if buf.Len()+len(sep)+len(str)+1 > n {
      			break
      		}
      
      		buf.WriteString(sep)
      		buf.WriteString(str)
      		first = false
      	}
      
      	if buf.Len()+1 <= n {
      		buf.WriteByte('}')
      	}
      
      	return buf.String()
      }
      

      This truncation logic must be exact and account for multi-byte characters. This can be baked into the above method using the existing logging truncation algorithm:

      func truncate(str string, width uint) string {
      	if width == 0 {
      		return ""
      	}
      
      	if len(str) <= int(width) {
      		return str
      	}
      
      	// Truncate the byte slice of the string to the given width.
      	newStr := str[:width]
      
      	// Check if the last byte is at the beginning of a multi-byte character.
      	// If it is, then remove the last byte.
      	if newStr[len(newStr)-1]&0xC0 == 0xC0 {
      		return newStr[:len(newStr)-1] + TruncationSuffix
      	}
      
      	// Check if the last byte is in the middle of a multi-byte character. If
      	// it is, then step back until we find the beginning of the character.
      	if newStr[len(newStr)-1]&0xC0 == 0x80 {
      		for i := len(newStr) - 1; i >= 0; i-- {
      			if newStr[i]&0xC0 == 0xC0 {
      				return newStr[:i] + TruncationSuffix
      			}
      		}
      	}
      
      	return newStr + TruncationSuffix
      }
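
To make the multi-byte requirement concrete, here is a minimal, self-contained sketch that backs up to a character boundary with the standard library's utf8.RuneStart. The name truncateUTF8 and the "..." suffix are placeholders chosen for this example, not the driver's truncate or TruncationSuffix. Unlike the function above, this variant keeps a final character when it fits entirely within the width.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// suffix is a stand-in for the driver's TruncationSuffix (assumed value).
const suffix = "..."

// truncateUTF8 is a hypothetical helper: it cuts s to at most width bytes
// without splitting a UTF-8 encoded character, then appends suffix.
func truncateUTF8(s string, width int) string {
	if width <= 0 {
		return ""
	}
	if len(s) <= width {
		return s
	}

	cut := width
	// s[cut] is the first byte that would be dropped. If it is a
	// continuation byte, the cut falls inside a character, so step back
	// to the start of that character.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut] + suffix
}

func main() {
	s := "h\u00e9llo" // 6 bytes: the accented e occupies bytes 1 and 2

	fmt.Println(truncateUTF8(s, 2)) // cut would split the accented e
	fmt.Println(truncateUTF8(s, 3)) // cut falls on a character boundary
	fmt.Println(truncateUTF8(s, 6)) // fits, returned unchanged
}
```

With a width of 2 the cut would land inside the two-byte character, so the sketch drops that character entirely; with a width of 3 the character is kept.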
      

      Note that we will also want to add StringN counterparts to the other String() methods, such as Array.StringN and Value.StringN, in case an array, for example, makes up the bulk of a large document.

            Assignee:
            timothy.kim@mongodb.com Timothy Kim (Inactive)
            Reporter:
            preston.vasquez@mongodb.com Preston Vasquez