Core Server / SERVER-82734

Improve capped collection write performance.

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Assigned Teams: Storage Execution

      While investigating HELP-51123, I found some potential areas where we can improve capped collection performance.

      Delete code path
      >>>>>>>>>>

      1) Don't serialize capped deletes on secondaries.

      • The oplog applier currently serializes both capped inserts/updates and capped deletes. Serializing inserts/updates is necessary to maintain a consistent natural ordering across the replica set, but serializing deletes is not.
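
      As an illustration of the decision (a minimal C++ sketch with made-up types, not the real oplog-applier code):

          #include <iostream>

          // Stand-in types for illustration; not the real applier API.
          enum class OpType { kInsert, kUpdate, kDelete };

          struct OplogEntry {
              OpType opType;
              bool onCappedCollection;
          };

          // Inserts/updates on a capped collection must stay on a single writer
          // thread so the natural (insertion) order matches the primary. Deletes
          // only remove documents that are already in place, so they could be
          // fanned out to the parallel writer threads like ordinary deletes.
          bool mustSerializeCappedOp(const OplogEntry& entry) {
              if (!entry.onCappedCollection)
                  return false;
              return entry.opType == OpType::kInsert || entry.opType == OpType::kUpdate;
          }

          int main() {
              OplogEntry cappedDelete{OpType::kDelete, /*onCappedCollection=*/true};
              std::cout << std::boolalpha << mustSerializeCappedOp(cappedDelete) << "\n";  // false
              return 0;
          }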

      2) Do batched deletion (like PM-2227) before performing vectored inserts.

      • Currently, we perform batched deletes (multiple deletes in a single storage transaction), but we reserve the oplog slots for capped deletes one by one. Given the existing timestamp-interleaving issue with vectored inserts, this can further slow down vectored inserts for capped collections, especially since we wait for the capped deletes to become majority replicated.
      • In addition, we check the capped collection size and perform deletes after every capped insert, which may not be the most efficient approach.
      • As an optimization, we should consider reserving the oplog slots in bulk and performing the deletes as a single batched operation before executing the vectored insert.
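
      A rough sketch of the proposed ordering, using placeholder types and helpers (OplogSlot, StorageTxn, reserveOplogSlots, cappedDocsToDelete, deleteDocument, insertBatch) rather than the actual server API:

          #include <cstddef>
          #include <vector>

          struct OplogSlot {};   // stand-in for a reserved oplog timestamp
          struct Document {};    // stand-in for a capped-collection document
          struct StorageTxn {    // stand-in for one storage transaction
              void commit() {}
          };

          // One bulk reservation instead of one call per delete.
          std::vector<OplogSlot> reserveOplogSlots(std::size_t n) {
              return std::vector<OplogSlot>(n);
          }
          // Oldest documents that must go to stay under the cap.
          std::vector<Document> cappedDocsToDelete(std::size_t /*bytesNeeded*/) {
              return {};
          }
          void deleteDocument(StorageTxn&, const Document&, const OplogSlot&) {}
          void insertBatch(const std::vector<Document>&) {}  // the vectored insert

          void cappedInsertBatch(const std::vector<Document>& newDocs, std::size_t bytesNeeded) {
              // 1. Work out the space to reclaim once per batch, not after every insert.
              auto victims = cappedDocsToDelete(bytesNeeded);

              // 2. Reserve all delete oplog slots in one call instead of one at a time.
              auto slots = reserveOplogSlots(victims.size());

              // 3. Apply every delete inside a single storage transaction.
              StorageTxn txn;
              for (std::size_t i = 0; i < victims.size(); ++i)
                  deleteDocument(txn, victims[i], slots[i]);
              txn.commit();

              // 4. Only then run the vectored insert, so its timestamps do not
              //    interleave with, or wait on, per-delete slot reservations.
              insertBatch(newDocs);
          }

          int main() {
              cappedInsertBatch({Document{}, Document{}}, 1024);
              return 0;
          }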

      3) Another optimization to consider is using truncate instead of individual delete calls when a substantial number of documents must be removed. In 4.4, we used truncate when the number of documents to be deleted was > 3. Because 4.4 and older versions did unreplicated implicit deletes, handling the corresponding index entry deletes was simpler. Starting in 5.0 we use replicated deletes, which may pose some challenges for the index deletes, but this may be worth exploring in PM-2983.
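
      For illustration, a sketch of that 4.4-style heuristic; the threshold of 3 comes from the 4.4 behaviour above, while the function names are placeholders rather than the real storage API:

          #include <cstddef>

          void truncateCappedRange(std::size_t /*numDocs*/) {}  // one truncate call
          void deleteOneDocument() {}                           // per-document (replicated) delete

          constexpr std::size_t kTruncateThreshold = 3;

          // Past the threshold it is cheaper to truncate the range in one call than
          // to issue per-document deletes. With replicated deletes (5.0+), the
          // truncate and its index handling would also have to be accounted for,
          // which is the open question for PM-2983.
          void reclaimCappedSpace(std::size_t numDocsToDelete) {
              if (numDocsToDelete > kTruncateThreshold) {
                  truncateCappedRange(numDocsToDelete);
              } else {
                  for (std::size_t i = 0; i < numDocsToDelete; ++i)
                      deleteOneDocument();
              }
          }

          int main() {
              reclaimCappedSpace(10);  // would take the truncate path
              return 0;
          }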

      Insert code path
      >>>>>>>>>>>
      1) Enable group inserts on secondaries.

      • Given that we do replicated deletes starting in 5.0, we can enable group inserts on secondaries as well.
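
      A self-contained sketch of the grouping idea (placeholder types, not the server's actual insert-grouping code): consecutive insert entries targeting the same namespace get applied as one grouped insert instead of one at a time.

          #include <cstddef>
          #include <string>
          #include <vector>

          // Stand-in for an insert oplog entry; the real server uses BSON documents.
          struct OplogInsert {
              std::string ns;   // target namespace
              std::string doc;  // document to insert
          };

          // Apply a run of inserts to one namespace as a single grouped insert.
          void applyGroupedInsert(const std::string& /*ns*/, const std::vector<std::string>& /*docs*/) {}

          void applyInserts(const std::vector<OplogInsert>& batch) {
              std::size_t i = 0;
              while (i < batch.size()) {
                  // Collect the run of consecutive inserts to the same namespace.
                  std::vector<std::string> group;
                  std::size_t j = i;
                  while (j < batch.size() && batch[j].ns == batch[i].ns) {
                      group.push_back(batch[j].doc);
                      ++j;
                  }
                  applyGroupedInsert(batch[i].ns, group);
                  i = j;
              }
          }

          int main() {
              applyInserts({{"test.capped", "{a: 1}"},
                            {"test.capped", "{a: 2}"},
                            {"test.other", "{b: 1}"}});
              return 0;
          }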

      2) Enable batch inserts on the primary (see here and here)

      • This will be safe only if we also implement optimization #2 from the "Delete code path" section above.

            Assignee:
            Backlog - Storage Execution Team (backlog-server-execution)
            Reporter:
            Suganthi Mani (suganthi.mani@mongodb.com)
            Votes:
            0
            Watchers:
            8
