Spark Connector / SPARK-197

Spark structs not getting mapped to bson correctly

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: 2.1.3, 2.2.4, 2.3.0
    • Affects Version/s: 2.2.3
    • Component/s: Schema
    • Labels:
      None

      Hello,

      When using this connector to write a DataFrame into a MongoDB collection, we noticed that when we group data over a key and collect some ObjectIds into a single array column, the resulting BSON document contains an array of objects instead of an array of ObjectIds.

      For example:

      from pyspark.sql.functions import col, collect_list, lit, struct

      df.groupBy(col('masters.oid')) \
          .agg(
              collect_list(struct(lit('5af5b894b669df00048ff623').alias('oid'))).alias('pokemons')
          )
      

      Note: in our real code, the pokemons field is an aggregation of pokemon IDs that come from the connector.

      Instead of producing a BSON document with ObjectIds in an array (in the pokemons column), this code results in an array of objects that have 'oid' as the key.
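
      For illustration, the two shapes look roughly like this, sketched with pymongo's bson types (an assumption based on the description above, not output copied from our collection):

      from bson import ObjectId

      # What actually ends up in the collection: an array of sub-documents.
      actual = {'pokemons': [{'oid': '5af5b894b669df00048ff623'}]}

      # What we expected: an array of ObjectId values.
      expected = {'pokemons': [ObjectId('5af5b894b669df00048ff623')]}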

      We looked into the source code and saw that this may happen in two cases:

      • When the StructField has nullable: false, which makes it fail the BsonCompatibility check (see the schema sketch after this list).
      • When the object that has the oid field is inside a map or array. The array/map elements get mapped with `rowToDocument`, which doesn't check whether the object itself is compatible with BSON types.
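
      As a rough illustration of the first case, a struct built from a non-null literal comes out with nullable = false in the schema. This is only a minimal sketch; the exact printSchema output may vary by Spark version:

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import lit, struct

      spark = SparkSession.builder.getOrCreate()

      # A struct built from a non-null literal gets nullable = false,
      # which is the situation the BsonCompatibility check rejects.
      df = spark.range(1).select(
          struct(lit('5af5b894b669df00048ff623').alias('oid')).alias('pokemon')
      )
      df.printSchema()
      # root
      #  |-- pokemon: struct (nullable = false)
      #  |    |-- oid: string (nullable = false)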

      With some small changes, we made the code behave the way we wanted.

      Is there any other way we could get the same effect without modifying the connector itself? That is, is there a way to define a DataFrame that has ObjectIds inside an array and that writes to MongoDB correctly?
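
      For comparison, one way to get ObjectIds into an array without the connector at all would be to write the grouped result with pymongo from foreachPartition. This is only a sketch under assumptions: the grouped DataFrame is called grouped_df, its key column is named oid, and the URI, database and collection names are placeholders.

      from bson import ObjectId
      from pymongo import MongoClient

      def write_partition(rows):
          # Placeholder connection details; replace with the real URI/db/collection.
          client = MongoClient('mongodb://localhost:27017')
          coll = client['mydb']['trainers']
          docs = [
              {'master': ObjectId(row['oid']),
               'pokemons': [ObjectId(p['oid']) for p in row['pokemons']]}
              for row in rows
          ]
          if docs:
              coll.insert_many(docs)
          client.close()

      grouped_df.rdd.foreachPartition(write_partition)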

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            Yengas Yiğitcan UÇUM
            Votes:
            0
            Watchers:
            2
