Spark Connector / SPARK-197

Spark structs not getting mapped to bson correctly

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: 2.1.3, 2.2.4, 2.3.0
    • Affects Version/s: 2.2.3
    • Component/s: Schema
    • Labels:
      None

      Hello,

      When using this connector to write a DataFrame into a MongoDB collection, we noticed that when we group data over a key and collect some ObjectIds into a single array column, the resulting BSON document contains an array of objects instead of an array of ObjectIds.

      For example:

      from pyspark.sql.functions import col, collect_list, lit, struct

      df.groupBy(col('masters.oid')) \
          .agg(
              collect_list(struct(lit('5af5b894b669df00048ff623').alias('oid'))).alias('pokemons')
          )
      

      Note: in our real code, the pokemons field is an aggregation of pokemon IDs that come from the connector.

      Instead of producing a BSON document with ObjectIds in an array (in the pokemons column), this code results in an array of objects that have 'oid' as the key.
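
      For illustration, the two shapes look roughly like this, sketched with pymongo's bson types (an assumption based on the description above, not output copied from our collection):

      from bson import ObjectId

      # What actually ends up in the collection: an array of sub-documents.
      actual = {'pokemons': [{'oid': '5af5b894b669df00048ff623'}]}

      # What we expected: an array of ObjectId values.
      expected = {'pokemons': [ObjectId('5af5b894b669df00048ff623')]}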

      We looked into the source code and saw that this may happen in two cases:

      • When the StructField has nullable: false, which makes it fail the BsonCompatibility check (see the schema sketch after this list).
      • When the object that has the oid field is inside a map or array. The array/map elements get mapped with `rowToDocument`, which doesn't check whether the object itself is compatible with BSON types.
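
      As a rough illustration of the first case, a struct built from a non-null literal comes out with nullable = false in the schema. This is only a minimal sketch; the exact printSchema output may vary by Spark version:

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import lit, struct

      spark = SparkSession.builder.getOrCreate()

      # A struct built from a non-null literal gets nullable = false,
      # which is the situation the BsonCompatibility check rejects.
      df = spark.range(1).select(
          struct(lit('5af5b894b669df00048ff623').alias('oid')).alias('pokemon')
      )
      df.printSchema()
      # root
      #  |-- pokemon: struct (nullable = false)
      #  |    |-- oid: string (nullable = false)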

      With some small changes, we made the code behave the way we wanted.

      Is there any other way we could get the same effect without modifying the connector itself? That is, is there a way to define a DataFrame that has ObjectIds inside an array and that writes to MongoDB correctly?
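
      For comparison, one way to get ObjectIds into an array without the connector at all would be to write the grouped result with pymongo from foreachPartition. This is only a sketch under assumptions: the grouped DataFrame is called grouped_df, its key column is named oid, and the URI, database and collection names are placeholders.

      from bson import ObjectId
      from pymongo import MongoClient

      def write_partition(rows):
          # Placeholder connection details; replace with the real URI/db/collection.
          client = MongoClient('mongodb://localhost:27017')
          coll = client['mydb']['trainers']
          docs = [
              {'master': ObjectId(row['oid']),
               'pokemons': [ObjectId(p['oid']) for p in row['pokemons']]}
              for row in rows
          ]
          if docs:
              coll.insert_many(docs)
          client.close()

      grouped_df.rdd.foreachPartition(write_partition)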

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            Yengas Yiğitcan UÇUM
            Votes:
            0
            Watchers:
            2
