Add OpenLineage support

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: None
    • Java Drivers

      I'd like to add OpenLineage support to the MongoDB Spark Connector. The connector uses MongoTable, which resolves to a RelationV2 logical plan for both read and write operations. When writing, it supports both AppendData and OverwriteByExpression modes. In both cases the logical plan includes RelationV2. When reading from one collection and writing to another, so that there is a lineage relation between the two, the plan looks roughly as follows:

      AppendData RelationV2[...] MongoTable(), ...
       └─ Project [...]
           └─ RelationV2[...] MongoTable()
      
      OverwriteByExpression RelationV2[...] MongoTable(), ...
       └─ Project [...]
           └─ RelationV2[...] MongoTable()
      

      openlineage-spark already supports AppendData, OverwriteByExpression, and RelationV2. However, in order to emit dataset information, it relies on the Table.properties() method to retrieve openlineage.dataset.namespace and openlineage.dataset.name.

      The challenge is that these properties need to be derived from the underlying MongoDB configuration, specifically the connection URI, database name, and collection name. Unfortunately, this information is stored in MongoConfig, which is an immutable object, and the required lineage information is not easily accessible from the MongoTable context. Moreover, it's unclear how to distinguish between the read and write paths in this setup in order to extract the correct collection name dynamically.
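
      To make the intended mapping concrete, here is a minimal sketch of how the two properties could be derived once the connection URI, database name, and collection name are available. The class MongoLineageProperties and its methods are hypothetical and exist in neither the connector nor openlineage-spark. The namespace format (scheme plus hosts, with credentials, default database, and options stripped) and the name format (database.collection) are assumptions, not something either project confirms here. The inputs are plain strings because how to obtain them from MongoConfig is exactly the open question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of the MongoDB Spark Connector or openlineage-spark.
final class MongoLineageProperties {
    static final String NAMESPACE_KEY = "openlineage.dataset.namespace";
    static final String NAME_KEY = "openlineage.dataset.name";

    private MongoLineageProperties() {}

    /**
     * Reduces a connection URI to "scheme://hosts".
     * Assumption: credentials, the default database, and options must not leak into lineage events.
     */
    static String namespaceOf(String connectionUri) {
        int schemeEnd = connectionUri.indexOf("://");
        if (schemeEnd < 0) {
            // The URI itself is not echoed because it may contain a password.
            throw new IllegalArgumentException("Connection URI has no scheme");
        }
        String scheme = connectionUri.substring(0, schemeEnd);
        String rest = connectionUri.substring(schemeEnd + 3);

        // The authority part ends at the first '/' (default database) or '?' (options).
        int end = rest.length();
        int slash = rest.indexOf('/');
        int query = rest.indexOf('?');
        if (slash >= 0) end = Math.min(end, slash);
        if (query >= 0) end = Math.min(end, query);
        String authority = rest.substring(0, end);

        // Drop "user:password@" if present; what remains is the host list.
        int at = authority.lastIndexOf('@');
        String hosts = at >= 0 ? authority.substring(at + 1) : authority;
        return scheme + "://" + hosts;
    }

    /** Builds the entries a Table implementation could return from properties(). */
    static Map<String, String> tableProperties(String connectionUri, String database, String collection) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(NAMESPACE_KEY, namespaceOf(connectionUri));
        props.put(NAME_KEY, database + "." + collection);
        return props;
    }
}
```

      Under these assumptions, MongoTable could merge the result of tableProperties(...) into the map it returns from properties(), passing the read or write collection depending on which path constructed the table.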

      Any guidance on how to properly access this configuration, or on how to restructure the code to expose the needed metadata, would be appreciated.

              Assignee: Unassigned
              Reporter: Dominik Dębowczyk
              Votes: 0
              Watchers: 5