SERVER-55344 (Core Server)

Propagate whether we are inserting multikeyMetadataKeys down to the SortedDataInterface layer

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Index Maintenance, Storage
    • Labels: None
    • Storage Execution

      multikeyMetadataKeys are currently the only case where, at steady state, we expect different rows to insert conflicting (identical) keys into a WiredTigerIndexStandard. This works* because we ignore WT_DUPLICATE_KEY. But relying on that also prevents us from doing blind updates with "overwrite=true", because overwrite=true converts the WT_DUPLICATE_KEY error into a WT_ROLLBACK/WriteConflictException whenever two concurrent writes try to write the same metadata key, which is disastrous for perf. We have strong indications that LSM will be needed for good perf with wildcard indexes (which are also the only kind of index that generates multikeyMetadataKeys), and LSM trees need "overwrite=true" for good perf.
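      To make the failure mode concrete, here is a toy C++ model. It is not the WiredTiger API: `ToyIndex` and `Result` are hypothetical stand-ins for an index cursor and for WT_DUPLICATE_KEY/WT_ROLLBACK, and the model assumes a simplified first-writer-wins rule in which an uncommitted update by one transaction makes a second transaction's write to the same key roll back.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy stand-ins for "success", WT_DUPLICATE_KEY and WT_ROLLBACK.
enum class Result { Ok, DuplicateKey, Rollback };

// Hypothetical single-index model; NOT the WiredTiger API. It uses a
// simplified first-writer-wins rule: if another transaction holds an
// uncommitted update on a key, a second write to that key must roll back.
class ToyIndex {
public:
    // Make a key committed and visible to every transaction.
    void commitKey(const std::string& key) { committed_.insert(key); }

    Result insert(int txnId, const std::string& key, bool overwrite) {
        auto it = pending_.find(key);
        bool otherWriter = it != pending_.end() && it->second != txnId;

        if (!overwrite && committed_.count(key) != 0) {
            // overwrite=false: the existing key is found before anything is
            // written, so the caller gets a duplicate-key result it can ignore,
            // and no update exists for a concurrent writer to conflict with.
            return Result::DuplicateKey;
        }
        // overwrite=true (a blind write), or a key that does not exist yet:
        // this creates an update, so a concurrent writer of the same key collides.
        if (otherWriter) return Result::Rollback;
        pending_[key] = txnId;
        return Result::Ok;
    }

private:
    std::set<std::string> committed_;
    std::map<std::string, int> pending_;  // key -> id of txn with an uncommitted update
};
```

      In this model, two transactions inserting an already-committed metadata key with overwrite=false both get DuplicateKey and carry on; with overwrite=true the first gets Ok and the second gets Rollback, even though the key it wanted already existed.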

      There are a few options here, including:

      1. do a search/search_near for these keys prior to the insert and, in the highly likely case that they are found, skip the insert.
      2. do the insert for these keys with "overwrite=false" even when we would normally insert with "overwrite=true". Because each index will (generally) have at most a few hundred of these keys regardless of the size of the index, and because they are all clustered together in the tree, the downsides of "overwrite=false" with LSM matter much less here.
      3. store these keys in a separate side btree rather than in the LSM tree. You could probably even have one such table per db, although that might complicate some operations.
      4. cache the set of multikeyMetadataKeys for each index, and only send the new ones down to the SDI. This requires some careful caching logic to be correct with respect to transactions, snapshots, and rollback, but since the set of keys is likely to change very rarely, it may be worth it.
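      Option 4 is the one whose correctness is least obvious. As a rough C++17 sketch only (all names are hypothetical, not existing MongoDB classes), one way to keep a rollback from polluting a shared per-index cache is to stage newly seen keys per WriteUnitOfWork and publish them to the cache only on commit. It assumes committed metadata keys are never removed while the index exists, and it ignores snapshot/timestamp visibility and durability, which a real implementation would still have to handle.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-index cache of multikeyMetadataKeys known to be committed.
// Assumption: once committed, a metadata key is never removed while the index exists.
class MetadataKeyCache {
public:
    // Returns the keys that are not yet known to be committed.
    std::vector<std::string> filterNew(const std::vector<std::string>& keys) const {
        std::lock_guard<std::mutex> lk(mu_);
        std::vector<std::string> out;
        for (const auto& k : keys) {
            if (committed_.count(k) == 0) out.push_back(k);
        }
        return out;
    }

    // Called only after the owning WriteUnitOfWork has committed.
    void publish(const std::set<std::string>& keys) {
        std::lock_guard<std::mutex> lk(mu_);
        committed_.insert(keys.begin(), keys.end());
    }

private:
    mutable std::mutex mu_;
    std::set<std::string> committed_;
};

// Per-transaction staging: keys seen in this unit of work are held here and
// only reach the shared cache on commit, so a rollback leaves the cache untouched.
class StagedMetadataKeys {
public:
    explicit StagedMetadataKeys(MetadataKeyCache& cache) : cache_(cache) {}

    // Returns the keys that should actually be sent down to the SortedDataInterface.
    std::vector<std::string> keysToInsert(const std::vector<std::string>& keys) {
        std::vector<std::string> fresh;
        for (const auto& k : cache_.filterNew(keys)) {
            // Skip keys already staged earlier in this same transaction.
            if (staged_.insert(k).second) fresh.push_back(k);
        }
        return fresh;
    }

    void commit() {
        cache_.publish(staged_);
        staged_.clear();
    }

    void rollback() { staged_.clear(); }

private:
    MetadataKeyCache& cache_;
    std::set<std::string> staged_;
};
```

      Keys returned by keysToInsert() would still need to be written in a way that tolerates a concurrent transaction having committed the same key first, e.g. with "overwrite=false" and WT_DUPLICATE_KEY ignored as in option 2; the cache only shrinks how often that path is taken.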

      * Well, it currently has some race conditions with mongo's implicit WriteUnitOfWork transactions due to WT-7310. Also, that comment should probably be updated to mention this case, since it is misleading as currently written.

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            mathias@mongodb.com Mathias Stearn
            Votes:
            0
            Watchers:
            10
