SERVER-99419 (Core Server)

Standardize aggregation stage aliases as namespaces instead of non-instantiable classes

    • Query Optimization

      An aggregation stage "alias" (or "sugar") is an aggregation stage that is defined in terms of one or more other stages. For example, { $count: "foo" } is not implemented directly, but is instead rewritten as { $group: { _id: null, foo: { $sum: 1 } } } followed by { $project: { _id: 0, foo: 1 } }. That is, $count is an alias because its parser doesn't return a "count" DocumentSource stage object; it returns a DocumentSourceGroup object followed by a DocumentSourceProject object.
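
      As a rough illustration of the expansion described above, here is a minimal, self-contained sketch. The types are simplified stand-ins that only borrow the server's class names; parseCount, stageName and StageList are hypothetical names chosen for this example, and std::shared_ptr and std::string stand in for whatever pointer and BSON types the server actually uses.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-ins for illustration only; NOT the real server classes.
struct DocumentSource {
    virtual ~DocumentSource() = default;
    virtual std::string stageName() const = 0;
};

struct DocumentSourceGroup : DocumentSource {
    explicit DocumentSourceGroup(std::string field) : countField(std::move(field)) {}
    std::string stageName() const override { return "$group"; }
    std::string countField;  // field that would hold {$sum: 1}
};

struct DocumentSourceProject : DocumentSource {
    explicit DocumentSourceProject(std::string field) : keptField(std::move(field)) {}
    std::string stageName() const override { return "$project"; }
    std::string keptField;  // field kept in the output; _id would be dropped
};

using StageList = std::list<std::shared_ptr<DocumentSource>>;

// Hypothetical parser for {$count: "<field>"}. There is no "count" stage
// class: the alias expands to a group stage followed by a project stage.
StageList parseCount(const std::string& field) {
    return {std::make_shared<DocumentSourceGroup>(field),
            std::make_shared<DocumentSourceProject>(field)};
}

// Usage sketch: the caller receives two stages, neither of which is a "count".
void demoCountExpansion() {
    StageList stages = parseCount("foo");
    assert(stages.size() == 2);
    assert(stages.front()->stageName() == "$group");
    assert(stages.back()->stageName() == "$project");
}
```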

      This is possible because REGISTER_DOCUMENT_SOURCE (and friends) only requires the name of the parsing function, which returns a list of pointers to DocumentSource objects. (It also requires some other information that isn't relevant here.)
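
      The sketch below illustrates why a registry keyed on a parsing function does not care where that function lives. It is not the real REGISTER_DOCUMENT_SOURCE macro, whose arguments are not reproduced here; registerStage, parseStage, parserRegistry and the std::string "spec" argument are all assumptions made for this example.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-ins for illustration only; NOT the real server classes.
struct DocumentSource {
    virtual ~DocumentSource() = default;
    virtual std::string stageName() const = 0;
};
struct DocumentSourceGroup : DocumentSource {
    std::string stageName() const override { return "$group"; }
};
struct DocumentSourceProject : DocumentSource {
    std::string stageName() const override { return "$project"; }
};

using StageList = std::list<std::shared_ptr<DocumentSource>>;
using Parser = std::function<StageList(const std::string& spec)>;

// Hypothetical registry: maps a stage name to its parsing function.
std::map<std::string, Parser>& parserRegistry() {
    static std::map<std::string, Parser> registry;
    return registry;
}

// The registry only needs a callable; it never sees a class for the stage.
void registerStage(const std::string& stageName, Parser parser) {
    parserRegistry()[stageName] = std::move(parser);
}

// Looks up the parser by name; throws std::out_of_range for unknown stages.
StageList parseStage(const std::string& stageName, const std::string& spec) {
    return parserRegistry().at(stageName)(spec);
}

// An alias parser living in a plain namespace.
namespace document_source_count {
StageList createFromBson(const std::string&) {
    return {std::make_shared<DocumentSourceGroup>(),
            std::make_shared<DocumentSourceProject>()};
}
}  // namespace document_source_count

// Usage sketch: register the free function, then parse through the registry.
void demoRegistration() {
    registerStage("$count", document_source_count::createFromBson);
    StageList stages = parseStage("$count", "foo");
    assert(stages.size() == 2);
}
```

      Because only the callable is registered, a static member function and a namespace-level function are interchangeable from the registry's point of view.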

      In some cases, the parser is a static function inside a class, even though that class is never instantiated. For example, DocumentSourceCount does not inherit from DocumentSource (which is initially confusing) and declares its default constructor as private, preventing instantiation. In other classes the private default constructor is missing, even though the class contains only static members (e.g. DocumentSourceIndexStats).
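
      A minimal sketch of the two class shapes described above, using simplified stand-in types rather than the real server code. AliasWithoutPrivateCtor is a hypothetical name for the variant that lacks the private constructor, and the stages it returns are invented for the example.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <type_traits>

// Simplified stand-ins for illustration only; NOT the real server classes.
struct DocumentSource {
    virtual ~DocumentSource() = default;
};
struct DocumentSourceGroup : DocumentSource {};
struct DocumentSourceProject : DocumentSource {};

using StageList = std::list<std::shared_ptr<DocumentSource>>;

// Non-instantiable class pattern: the class only scopes a static parser.
// Note that it does NOT derive from DocumentSource.
class DocumentSourceCount final {
public:
    static StageList createFromBson(const std::string&) {
        return {std::make_shared<DocumentSourceGroup>(),
                std::make_shared<DocumentSourceProject>()};
    }

private:
    DocumentSourceCount() = default;  // private: callers cannot construct one
};

// Hypothetical variant with the private constructor missing: still only
// static members, but an instance can be default-constructed.
class AliasWithoutPrivateCtor final {
public:
    static StageList createFromBson(const std::string&) {
        return {std::make_shared<DocumentSourceGroup>()};
    }
};

static_assert(!std::is_base_of<DocumentSource, DocumentSourceCount>::value,
              "the alias class is not itself a stage");
static_assert(!std::is_default_constructible<DocumentSourceCount>::value,
              "private default constructor blocks instantiation");
static_assert(std::is_default_constructible<AliasWithoutPrivateCtor>::value,
              "without the private constructor, instantiation is possible");

// Usage sketch: the class name is used purely as a scope for the parser.
void demoStaticParser() {
    assert(DocumentSourceCount::createFromBson("foo").size() == 2);
}
```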

      In other cases, the parser is just a function inside a namespace. For example, DocumentSourceDocuments::createFromBson.

      In some of these cases, the namespace follows the CamelCase class naming convention (e.g. DocumentSourceDocuments, as above), while in others it follows the lowercase-with-underscores namespace naming convention (e.g. document_source_fill).

      This ticket is to clean up and standardize these competing approaches of achieving the same thing.

      I'm not aware of any advantages of the non-instantiable class pattern (and haven't seen any comments or documentation supporting/explaining it), and to me there seem to only be disadvantages.

      I suggest that these alias stages should always use a namespace (not a class), and should use the normal underscore namespace naming convention, so that when they are referred to elsewhere, they are properly understood by the programmer as being a namespace, not a class.

      Although we do have CamelCase namespaces in various places, they tend not to sit next to a base class with a similar name. Here, DocumentSource exists as a class that a DocumentSourceFoo would normally inherit from, so CamelCase namespaces are more likely to cause confusion: when a programmer sees DocumentSourceFoo::createFromBson(), they will reasonably expect it to return a DocumentSourceFoo object. When a programmer sees document_source_foo::createFromBson(), however, it is easier to understand that this is an alias which returns objects of some other DocumentSource class(es), and which therefore cannot be used in the ways that regular DocumentSource objects can (e.g. as a candidate for SBE pushdown).
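
      To make the suggested conversion concrete, here is a hedged before/after sketch for a made-up alias stage "foo". The types are simplified stand-ins, the stage it expands to is invented, and the std::string argument replaces the real parser arguments; only the shape of the change is the point.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>

// Simplified stand-ins for illustration only; NOT the real server classes.
struct DocumentSource {
    virtual ~DocumentSource() = default;
};
struct DocumentSourceGroup : DocumentSource {};

using StageList = std::list<std::shared_ptr<DocumentSource>>;

// Before: a class named like a stage. The call site suggests that a
// DocumentSourceFoo object comes back, although no such object ever exists.
class DocumentSourceFoo final {
public:
    static StageList createFromBson(const std::string&) {
        return {std::make_shared<DocumentSourceGroup>()};
    }

private:
    DocumentSourceFoo() = default;
};

// After: an underscore namespace. The name reads as a namespace, so the
// caller expects stages of some other DocumentSource class(es).
namespace document_source_foo {
StageList createFromBson(const std::string&) {
    return {std::make_shared<DocumentSourceGroup>()};
}
}  // namespace document_source_foo

// Usage sketch: both spellings produce the same stages; only the name changes.
void demoSameBehavior() {
    StageList before = DocumentSourceFoo::createFromBson("{}");
    StageList after = document_source_foo::createFromBson("{}");
    assert(before.size() == after.size());
    assert(std::dynamic_pointer_cast<DocumentSourceGroup>(after.front()) != nullptr);
}
```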

      This will mean converting the class-based aliases (of which there are many) to namespaces, and converting the CamelCase namespaces (just a few) to underscore namespaces. Neither should involve any functional changes. Since there are no objects of the renamed classes/namespaces, the scope of the name changes within the source code (i.e. how widely the changed names are referenced) should be reasonably limited. The most significant changes are likely to be for DocumentSourceChangeStream, which I believe should still be tractable.

      The potentially affected stages are below. I think this list is exhaustive, but I'm not 100% sure.

      Underscore namespace stages: (no changes necessary)

      • document_source_densify
      • document_source_fill
      • document_source_set_window_fields

      CamelCase namespace stages: (rename to underscore namespace names)

      • DocumentSourceDocuments
      • DocumentSourceListClusterCatalog (but note SERVER-98658)
      • DocumentSourceShardedDataDistribution

      Instantiable class stages, i.e. classes with no private default ctor: (convert to namespace)

      • DocumentSourceIndexStats
      • DocumentSourceListCachedAndActiveUsers
      • DocumentSourceQuerySettings
      • DocumentSourceSetMetadata

      Non-instantiable class stages: (convert to namespace)

      • DocumentSourceAddFields
      • DocumentSourceBucket
      • DocumentSourceChangeStream
      • DocumentSourceCount
      • DocumentSourceProject
      • DocumentSourceRankFusion
      • DocumentSourceReplaceRoot
      • DocumentSourceScore
      • DocumentSourceScoreFusion
      • DocumentSourceSortByCount

      Assignee: Unassigned
      Reporter: Kevin Pulo (kevin.pulo@mongodb.com)