Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-7234

Prefix-compressed keys and memory amplification

    • Type: Icon: Improvement Improvement
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • WT10.0.1, 4.4.7, 5.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None

      1. WiredTiger row-store prefix compression does adjacent key compression during disk-image block layout: the first key in the page is always stored in full, the second key in the page will be stored with any common prefix shared with the first key compressed out, (up to 256 bytes), the third key in the page will be stored with any common prefix shared with the second key compressed out, and so on. As a comparison, there are other systems that store a separate dictionary per page of common prefixes, and keys on the page reference that dictionary.

      2. When a WiredTiger page is read into memory and a binary search is performed, key comparisons will be done using the keys at the binary search points in the page and possibly prefix-compressed keys have to be read for those comparisons. The algorithm for creating a full key from a prefix-compressed key is to walk backward in the page until a fully instantiated key is found, and then walk forward back to the required key, creating each intermediate key and resolving its prefix compression, until we return to the required key.

      3. The work in #2 is expensive in the case of large sets of shared prefixes: imagine a row-store page with a long, shared prefix followed by a unique record number. It's easy to imagine a page entirely populated with a hundred-byte common URI prefix and a 4B unique record number suffix. The only fully written key on the page is the first key: when reading a key at the end of the page, we walk backward to the first key and then walk forward to the required key, instantiating every intermediate key on the page in order to create a single key for the purposes of doing one lookup of a binary search.

      4. Note this does not apply to cursor traversal, only to binary search, that is, point reads and updates. (In summary, forward cursor traversal doesn't do any additional work in the case of prefix-compressed keys, backward cursor traversal is further discussed in WT-7204.)

      5. To avoid the high-cost of #2 for repeated searches (imagine repeated point reads on a hot, cached page with many prefix-compressed keys), prefix-compressed keys are fully instantiated the first time they are created during binary search. This is done by allocating a chunk of memory that includes the full key. All future searches will find this copy of the full key instead of having to redoing the walk of the page to re-create the key.

      6. The work in #5 results in memory amplification of the page in memory, that is, a page with heavy key prefix compression can end up with an allocated memory structure that includes a copy of the key for every key on the page.

      7. An idea for avoiding this memory amplification in the case of a long prefix shared by a large number of keys on the page is to have a specific prefix-compressed key reference the fully written key that can be used to create the full key. In the example above, a page entirely populated by a long shared URI prefix followed by a unique 4B record number, all of the keys could be immediately instantiated using the fully written first key on the page and the key's suffix. In other words, instead of walking backward and forward through the page to create a key, we could immediately grab the prefix from the fully written key and append the suffix for the key and we'd be done.

      8. The work in #7 requires identifying keys on the page with a long shared prefix (as opposed to keys with some other prefix that would still require the forward/backward walk to create). This could be done as part of loading the page into memory: when a page is read and indexed, we could track the longest set of keys with a shared prefix and save that information with the page.

      9. If it's fast enough to copy the prefix and suffix for these keys, there's no reason we'd have to instantiate the created key in memory at all, we could copy the prefix and suffix every time the key is needed.

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: