Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-83671

Implement lookup tries for IDL parse in C++

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Service Arch
    • Service Arch 2024-01-22, Service Arch 2024-02-05, Service Arch 2024-02-19, Service Arch 2024-03-04, Service Arch 2024-03-18, Service Arch 2024-04-01

      Currently IDL uses tries, rendered into C++ as custom find functions generated for each hardcoded trie instance table, encoding the table's contents into the code as literals.

      We should be able to do something much more constexpr(?) C++ container-like.

      maybe a third_party solution.

      Sample API for a minimal wrapper to be called from Python code:

      class SomeIdlGeneratedStruct {
          ...
          enum class FieldId {
              aField,
              someOtherField,
              yetAnotherField,
          };
      
          static constexpr inline FieldNameMap<FieldId> _fieldNames{
              {"aField", FieldIds::aField},
              {"someOtherField", FieldIds::someOtherField},
              {"yetAnotherField", FieldIds::yetAnotherField},
          };
          ...
          /// pseudocode for the parser...
          Status parseProtected(BSONObj input) {
              for (auto&& elem : input) {
                  auto iter = _fieldNames.find(elem.getFieldNameStringData());
                  if (iter == _fieldNames.end()) continue;
                  auto&& [name, id] = *iter;
                  switch (id) {
                      case FieldId::aField: ...
                      case FieldId::someOtherField: ...
                      case FieldId::yetAnotherField: ...
                  }
              }
          }
      };
      

      From that point on, _fieldNames is accessed by Python to accept StringData name and return the FieldId of the field having that StringData as a name.

      We can let the definition of exactly what FieldNameMap<T> is be an implementation detail in the idl_parser.h header. It's probably a thin wrapper around a FieldNameMap<size_t>, which is a thin wrapper around some kind of (TBD) PerfectlyHashedMap<StringData, size_t> or Trie<size_t>. It's pretty much a minimal associative unordered container's API, specifically made for this kind of search pattern (and const initialization) as an internal implementation choice.

      We already have FieldIds defined inside the parse functions (see attached trie.txt file) but they're not in an enum yet. We could just do that change, assigning each struct field an enumeration in the closed enum set.

            Assignee:
            Unassigned Unassigned
            Reporter:
            billy.donahue@mongodb.com Billy Donahue
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated: