Node.js Driver / NODE-6611

Spike: Optimize for decoding large latin strings from BSON

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: BSON, Performance

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


      Use Case

      As a... BSON user
      I want... to deserialize large BSON Latin strings quickly, avoiding the overhead of per-byte UTF-8 decoding
      So that... I can speed up my application

      An idea from anna.henningsen@mongodb.com: Try viewing the whole BSON document as a string, then find the beginning and end of each string within that view to speed up fetching long Latin strings from documents.

      User Experience

      • What is the desired/expected outcome for the user once this ticket is implemented?
        • Long strings are parsed faster

      Dependencies

      • upstream and/or downstream requirements and timelines to bear in mind
        • None

      Risks/Unknowns

      • What could go wrong while implementing this change? (e.g., performance, inadvertent behavioral changes in adjacent functionality, existing tech debt, etc)
        • Care must be taken not to misinterpret multibyte UTF-8 sequences.
        • Structurally, the current deserializer may be difficult to work within because of its recursive implementation. Refactoring may be necessary to share the string view across the whole decoding process.
      • Is there an opportunity for better cross-driver alignment or testing in this area?
        • Possibly. If performance improves, we should share the approach with other drivers where their language allows it; this is not necessarily something for the specs.
      • Is there an opportunity to improve existing documentation on this subject?
        • No
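The multibyte risk above can be illustrated with a small sketch. It assumes the "string view" is a one-byte-per-character (latin1-style) view of the raw bytes, which is one possible reading of the idea; `viewAsLatin1` and `isSingleByteOnly` are hypothetical helper names, not part of the BSON library.

```typescript
// Hypothetical helper: one-byte-per-character view of raw bytes.
// Chunked so that apply() never receives an oversized argument list.
function viewAsLatin1(bytes: Uint8Array): string {
  const CHUNK = 8192;
  let out = '';
  for (let i = 0; i < bytes.length; i += CHUNK) {
    const chunk = bytes.subarray(i, i + CHUNK);
    out += String.fromCharCode.apply(null, chunk as unknown as number[]);
  }
  return out;
}

// Hypothetical helper: true when every byte in [start, end) is below 0x80,
// meaning the range contains no multibyte UTF-8 sequences.
function isSingleByteOnly(bytes: Uint8Array, start: number, end: number): boolean {
  for (let i = start; i < end; i++) {
    if (bytes[i] >= 0x80) return false;
  }
  return true;
}

const utf8Encoder = new TextEncoder();
const ascii = utf8Encoder.encode('hello');
const accented = utf8Encoder.encode('héllo'); // 'é' is encoded as two bytes: 0xC3 0xA9

const asciiView = viewAsLatin1(ascii); // identical to a UTF-8 decode
const accentedView = viewAsLatin1(accented); // 'hÃ©llo': the two bytes became two characters
```

Under this assumption the view is only trustworthy for byte ranges that pass the single-byte check; anything else would need a real UTF-8 decode.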

      Acceptance Criteria

      Implementation Requirements

      • Attempt:
        • viewing the BSON document bytes as a JS string
        • determining the start and end offsets of each string and taking slices from that JS string while parsing the BSON
        • validating that each sliced string does not contain multibyte sequences
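The three steps above could be sketched roughly as follows. This is not the driver's deserializer: `readStringFields` is a hypothetical helper that handles only a flat document whose elements are all BSON strings (type 0x02), performs no bounds checking, and assumes a `TextDecoder` fallback for strings that fail validation.

```typescript
const utf8Decoder = new TextDecoder('utf-8', { ignoreBOM: true });

// Step 1 (assumption): view the whole document as a one-byte-per-character string.
function viewAsLatin1(bytes: Uint8Array): string {
  const CHUNK = 8192;
  let out = '';
  for (let i = 0; i < bytes.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK) as unknown as number[]);
  }
  return out;
}

// Hypothetical helper: reads a flat document containing only string elements.
function readStringFields(bytes: Uint8Array): Map<string, string> {
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const view = viewAsLatin1(bytes);
  const result = new Map<string, string>();
  const docEnd = dv.getInt32(0, true) - 1; // index of the document's trailing 0x00
  let i = 4;
  while (i < docEnd) {
    const type = bytes[i++];
    if (type !== 0x02) throw new Error('this sketch only supports string elements');
    const keyStart = i;
    while (bytes[i] !== 0x00) i++; // keys are null-terminated cstrings
    const key = utf8Decoder.decode(bytes.subarray(keyStart, i));
    i++; // skip the key terminator
    const byteLength = dv.getInt32(i, true); // includes the trailing null byte
    i += 4;
    // Step 2: compute the start and end offsets of the string payload.
    const start = i;
    const end = start + byteLength - 1;
    // Step 3: validate that the payload has no multibyte sequences.
    let singleByte = true;
    for (let j = start; j < end; j++) {
      if (bytes[j] >= 0x80) {
        singleByte = false;
        break;
      }
    }
    // Slice from the shared view when safe, otherwise fall back to a UTF-8 decode.
    result.set(key, singleByte ? view.slice(start, end) : utf8Decoder.decode(bytes.subarray(start, end)));
    i = end + 1; // skip the string's null terminator
  }
  return result;
}

// { a: "hello" } encoded by hand: size, type, key, length, payload, terminators.
const doc = Uint8Array.from([
  0x12, 0, 0, 0, 0x02, 0x61, 0, 0x06, 0, 0, 0, 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0, 0
]);
const fields = readStringFields(doc);
```

Whether building the view once per document pays off for documents with few or short strings is exactly what the spike would need to measure.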

      Testing Requirements

      • Check for correctness
      • If a performance test does not exist for long strings, add one to main first
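A correctness check could compare any candidate decoder against a reference UTF-8 decode. The sketch below is illustrative only: `asciiFastPath`, `findMismatches`, and `timeDecoder` are hypothetical names, the sample set is arbitrary, and the timing helper is a rough wall-clock measurement rather than a substitute for the driver's benchmark suite.

```typescript
type Decoder = (bytes: Uint8Array) => string;

const encoder = new TextEncoder();
const reference = new TextDecoder('utf-8', { ignoreBOM: true });

// Hypothetical candidate: single-byte fast path with a TextDecoder fallback.
const asciiFastPath: Decoder = (bytes) => {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) return reference.decode(bytes);
  }
  const CHUNK = 8192;
  let out = '';
  for (let i = 0; i < bytes.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK) as unknown as number[]);
  }
  return out;
};

// Returns the samples for which the candidate disagrees with the reference decoder.
function findMismatches(candidate: Decoder, samples: string[]): string[] {
  return samples.filter((sample) => {
    const bytes = encoder.encode(sample);
    return candidate(bytes) !== reference.decode(bytes);
  });
}

// Rough wall-clock timing; a real benchmark needs warm-up and repeated runs.
function timeDecoder(candidate: Decoder, bytes: Uint8Array, iterations: number): number {
  const startedAt = Date.now();
  for (let i = 0; i < iterations; i++) candidate(bytes);
  return Date.now() - startedAt;
}

const samples = ['', 'hello', 'a'.repeat(100000), 'héllo', 'a'.repeat(1000) + 'é', '日本語'];
const mismatches = findMismatches(asciiFastPath, samples);
const elapsedMs = timeDecoder(asciiFastPath, encoder.encode('a'.repeat(100000)), 50);
```

Passing the decoder in as a parameter lets the same samples be run against the existing implementation and the experimental one.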

      Documentation Requirements

      • None

      Follow Up Requirements

      • additional tickets to file, required releases, etc
      • if node behavior differs/will differ from other drivers, confirm with dbx devs what standard to aim for and what plan, if any, exists to reconcile the diverging behavior moving forward
        • Are there additional optimizations if the string contains only a small number of multibyte characters?
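One possible direction for the mostly-single-byte case, offered purely as a sketch: split the bytes into runs below 0x80 and runs at or above 0x80, and send only the latter through a UTF-8 decoder. This works because every byte of a multibyte UTF-8 sequence is at or above 0x80, so a sequence never straddles a run boundary. `decodeMostlyAscii` is a hypothetical name, and whether this is actually faster is unmeasured.

```typescript
// ignoreBOM keeps U+FEFF inside a run from being stripped as if it were a leading BOM.
const runDecoder = new TextDecoder('utf-8', { ignoreBOM: true });

// Converts a run of bytes that are all below 0x80 into a string, in chunks.
function asciiRunToString(bytes: Uint8Array, start: number, end: number): string {
  const CHUNK = 8192;
  let out = '';
  for (let i = start; i < end; i += CHUNK) {
    const chunk = bytes.subarray(i, Math.min(i + CHUNK, end));
    out += String.fromCharCode.apply(null, chunk as unknown as number[]);
  }
  return out;
}

// Hypothetical helper: decodes single-byte runs directly and multibyte runs via TextDecoder.
function decodeMostlyAscii(bytes: Uint8Array): string {
  let out = '';
  let i = 0;
  while (i < bytes.length) {
    const runStart = i;
    if (bytes[i] < 0x80) {
      while (i < bytes.length && bytes[i] < 0x80) i++;
      out += asciiRunToString(bytes, runStart, i);
    } else {
      while (i < bytes.length && bytes[i] >= 0x80) i++;
      out += runDecoder.decode(bytes.subarray(runStart, i));
    }
  }
  return out;
}

const mixedText = 'a'.repeat(50) + 'é' + 'b'.repeat(50) + '日本' + 'c';
const mixedDecoded = decodeMostlyAscii(new TextEncoder().encode(mixedText));
```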

            Assignee: Unassigned
            Reporter: Neal Beeken (neal.beeken@mongodb.com)
            Votes: 0
            Watchers: 2
