  1. Core Server
  2. SERVER-61587

Research potential improvements in document validation for storage

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Execution

      MongoDB uses DeltaExecutor::applyUpdate to apply diff updates to the document. For example, secondaries in the replica set use it to apply oplog entries.

      One of the steps of DeltaExecutor::applyUpdate execution is validation of the result. It happens in the call to storage_validation::scanDocument.

      This validation consists of several parts:

      1. Validation of maximum document depth
      2. Ensuring that the document does not contain $-prefixed fields, except in some special cases
      3. Setting a flag if the result document contains fields with dots or dollars. While not exactly a "validation" step, it is still performed here to avoid traversing the document a second time.
      4. Ensuring the structural integrity of a DBRef field
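
      The four parts above can be pictured as a single recursive pass over the document. The following is a minimal Python sketch, not the server's C++ implementation: plain dicts stand in for BSON documents, and MAX_DEPTH, ScanResult, check_dbref and scan_document are hypothetical names. It also assumes that the only "special cases" for $-prefixed fields are the DBRef fields $ref, $id and $db, and it simplifies the DBRef check to "$ref is a string and is immediately followed by $id".

```python
from dataclasses import dataclass

MAX_DEPTH = 100  # hypothetical limit, chosen only for illustration
DBREF_FIELDS = ("$ref", "$id", "$db")  # assumed special cases for check (2)


class ValidationError(Exception):
    """Raised when a document fails one of the checks."""


@dataclass
class ScanResult:
    contains_dots_or_dollars: bool = False  # the flag from step (3)


def check_dbref(doc):
    """Check (4), simplified: $ref is a string followed directly by $id."""
    keys = list(doc)
    if "$id" not in doc or keys.index("$id") != keys.index("$ref") + 1:
        raise ValidationError("DBRef needs $id right after $ref")
    if not isinstance(doc["$ref"], str):
        raise ValidationError("$ref must be a string")
    if "$db" in doc and not isinstance(doc["$db"], str):
        raise ValidationError("$db must be a string")


def _scan(value, result, depth):
    if depth > MAX_DEPTH:  # check (1)
        raise ValidationError("document exceeds maximum depth")
    if isinstance(value, dict):
        is_dbref = "$ref" in value
        if is_dbref:
            check_dbref(value)  # check (4)
        for name in value:
            if name.startswith("$") and not (is_dbref and name in DBREF_FIELDS):
                raise ValidationError(f"unexpected $-prefixed field: {name}")  # check (2)
            if "." in name or "$" in name:
                result.contains_dots_or_dollars = True  # step (3)
        children = value.values()
    else:
        children = value  # a list: its elements are one level deeper
    for child in children:
        if isinstance(child, (dict, list)):
            _scan(child, result, depth + 1)


def scan_document(doc):
    """Run all four checks in one traversal and return the collected flag."""
    result = ScanResult()
    _scan(doc, result, depth=1)
    return result
```

      In this sketch, scan_document({"a": {"b.c": 1}}) returns a result with the flag set, while scan_document({"$bad": 1}) raises ValidationError. Every field of the result document is visited, which is why the cost grows with document size.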

      There are two flags controlling the validation stages:

      • skipDotsDollarsCheck. If this flag is set to false (note the negation in the flag's name), the first 3 checks need to be performed.
      • validateForStorage. If this flag is set to true, all 4 checks need to be performed.

      Since all validation checks are performed in a single method, storage_validation::scanDocument, we call it whenever skipDotsDollarsCheck is false or validateForStorage is true.
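
      The relationship between the two flags and the checks can be written down as a small decision function. This is an illustrative Python sketch only: checks_required and must_call_scan are hypothetical helpers, and the integers 1 to 4 refer to the numbered validation parts listed earlier.

```python
def checks_required(skip_dots_dollars_check, validate_for_storage):
    """Return the set of check numbers that the two flags ask for."""
    required = set()
    if not skip_dots_dollars_check:  # note the negated flag name
        required |= {1, 2, 3}
    if validate_for_storage:
        required |= {1, 2, 3, 4}
    return required


def must_call_scan(skip_dots_dollars_check, validate_for_storage):
    """The scan is a single method, so it runs if any check is required."""
    return bool(checks_required(skip_dots_dollars_check, validate_for_storage))
```

      Under these assumptions, checks_required(False, False) is {1, 2, 3}, yet the single scan method is still called as a whole. The only combination that avoids the scan is skipDotsDollarsCheck set to true with validateForStorage set to false.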

      Validation can be a time-expensive operation for large documents. We have noticed significant improvements from optimizing and skipping parts of the validation (see SERVER-60176 and SERVER-60156). Our theory is that further experiments in this direction could yield additional performance improvements.

      The main goal of this ticket is to research whether there are any workloads where further optimization of validation would benefit performance.

      If such workloads exist, we can try to split the validation into two parts: the first 3 checks and the last check. There are several reasons we might want to do that:

      1. This would allow us to skip check (4) entirely when the validateForStorage flag is not set.
      2. Assuming that the pre-image document is valid, we can validate points (1), (2) and (3) by analyzing only the diff, which can be significantly smaller than the result document itself. Once again, this could speed up the case when validateForStorage is not set. Note that point (4) is context-dependent: it cannot be checked with the diff alone, because the pre-image document is also required.
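
      The diff-only idea from point 2 can be sketched in Python as follows. This is not the server's diff format: the diff is modeled as a hypothetical dict with optional "insert" and "update" maps (field name to new value) and a "sub" map (field name to nested diff), and MAX_DEPTH and validate_diff are assumed names. The sketch rejects every $-prefixed name it sees, leaving the special cases to the separate check (4), and the returned flag only says whether the diff itself introduces dots or dollars.

```python
MAX_DEPTH = 100  # hypothetical limit, chosen only for illustration


class ValidationError(Exception):
    """Raised when the diff would produce an invalid document."""


def value_depth(value):
    """Nesting levels contributed by a value (0 for scalars)."""
    if isinstance(value, dict):
        return 1 + max((value_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((value_depth(v) for v in value), default=0)
    return 0


def field_names(value):
    """Yield every field name inside a possibly nested value."""
    if isinstance(value, dict):
        for name, child in value.items():
            yield name
            yield from field_names(child)
    elif isinstance(value, list):
        for child in value:
            yield from field_names(child)


def validate_diff(diff, depth=1):
    """Run checks (1)-(3) on written fields only; return the dots/dollars flag."""
    flagged = False
    written = {**diff.get("insert", {}), **diff.get("update", {})}
    for name, value in written.items():
        if depth + value_depth(value) > MAX_DEPTH:  # check (1)
            raise ValidationError("result would exceed maximum depth")
        for n in (name, *field_names(value)):
            if n.startswith("$"):  # check (2), special cases ignored
                raise ValidationError(f"unexpected $-prefixed field: {n}")
            if "." in n or "$" in n:  # step (3)
                flagged = True
    # Field names keyed in "sub" already exist in the (valid) pre-image.
    for sub in diff.get("sub", {}).values():
        flagged = validate_diff(sub, depth + 1) or flagged
    return flagged
```

      The work done here is proportional to the size of the diff rather than the size of the result document. Fields that the diff does not touch are never visited, which relies entirely on the assumption that the pre-image was valid.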

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            nikita.lapkov@mongodb.com Nikita Lapkov (Inactive)
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: