Core Server / SERVER-78301

Consider making bulkWrite base command size estimation on mongos more efficient

    • Type: Task
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Replication

      In SERVER-73536, we went with a naive implementation to estimate the size of a bulkWrite command (excluding its ops), where we just serialize a command object with fields copied over and placeholders added as needed, and take the size of that.
      Our rationale for this was that:

      • we only do it once up-front per bulkWrite command mongos receives
      • for most bulkWrite commands, we expect the ops field (which we skip serializing here) to take up the bulk of the command
      • this is strictly less expensive than serializing an actual sub-batch command, which is something we often do numerous times for a single incoming request on mongos that targets multiple shards.

      That said, for certain workloads (e.g. all writes are to a single shard so we won't split batches often, and/or there are large top-level fields on the command) this could prove costly.

      When we do performance testing, it may be worth reevaluating this. A smarter implementation could estimate the size arithmetically, without actually serializing the data, similar to what we do for estimating the sizes of individual ops.
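
      To illustrate the difference between the two approaches, here is a minimal Python sketch. It is not the mongos code. It assumes a small subset of BSON types, and the helper names (`bson_encode`, `bson_size`) and the sample command fields are hypothetical, chosen only for illustration. The first function serializes a document and measures it, as the naive approach does; the second adds up the sizes from the BSON layout rules without building any bytes.

```python
import struct


def _encode_value(value):
    """Return (type byte, payload bytes) for a small subset of BSON types."""
    if isinstance(value, bool):  # bool must be checked before int
        return b"\x08", (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10", struct.pack("<i", value)
        return b"\x12", struct.pack("<q", value)
    if isinstance(value, float):
        return b"\x01", struct.pack("<d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"\x02", struct.pack("<i", len(raw) + 1) + raw + b"\x00"
    if value is None:
        return b"\x0a", b""
    if isinstance(value, list):  # arrays are documents keyed "0", "1", ...
        return b"\x04", bson_encode({str(i): v for i, v in enumerate(value)})
    if isinstance(value, dict):
        return b"\x03", bson_encode(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_encode(doc):
    """Naive approach: actually serialize the document."""
    body = b""
    for key, value in doc.items():
        type_byte, payload = _encode_value(value)
        body += type_byte + key.encode("utf-8") + b"\x00" + payload
    # int32 total length + elements + trailing NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def bson_size(value):
    """Arithmetic approach: compute the serialized size without serializing."""
    if isinstance(value, bool):
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8
    if isinstance(value, float):
        return 8
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # length prefix + bytes + NUL
    if value is None:
        return 0
    if isinstance(value, list):
        value = {str(i): v for i, v in enumerate(value)}
    if isinstance(value, dict):
        # Each element costs: type byte + key bytes + NUL + value payload.
        elements = sum(
            1 + len(k.encode("utf-8")) + 1 + bson_size(v) for k, v in value.items()
        )
        return 4 + elements + 1
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical base command with ops left empty and one large top-level field.
base_command = {
    "bulkWrite": 1,
    "ops": [],
    "nsInfo": [{"ns": "db.coll"}],
    "ordered": True,
    "let": {"x": "y" * 1000},
}

naive_size = len(bson_encode(base_command))
estimated_size = bson_size(base_command)
```

      In this sketch both approaches still walk every field, but the arithmetic version allocates no buffers and copies no field data, which is where the savings would come from when top-level fields are large.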

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            kaitlin.mahar@mongodb.com Kaitlin Mahar
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: