Spark Connector / SPARK-71

Support for Spark's MapType() for variable data

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.1.0
    • Affects Version/s: 1.0.0
    • Component/s: Schema
    • Labels:
    • Environment:
      PySpark SQL 1.6.2 on Databricks

      Some of our data in MongoDB is of a "map" type. It is represented in MongoDB as an Object containing some variable number of fields, each with a value of a defined type.

      i.e (...{"arbitrary_key":

      {sub_object}

      ...}

      We can read in this variable-field schema using Spark's MapType(), which lets us specify the types of the key and value without hardcoding the field names or the number of fields in the map. This works fine with the MongoDB Spark connector when specifying the schema for reading.
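
      For example, a minimal sketch of the read side (the field names "metrics", "count" and "label", the DefaultSource format string, and a pre-configured spark.mongodb.input.uri are placeholders/assumptions for illustration, not our actual data):

      {code:python}
      from pyspark.sql.types import (MapType, StringType, IntegerType,
                                     StructType, StructField)

      # Hypothetical value type for the map: each arbitrary key points at a
      # sub-document with a known layout.
      sub_object = StructType([
          StructField("count", IntegerType(), True),
          StructField("label", StringType(), True),
      ])

      # MapType(keyType, valueType) needs no hardcoded key names or field count.
      schema = StructType([
          StructField("_id", StringType(), True),
          StructField("metrics", MapType(StringType(), sub_object), True),
      ])

      # Reading with the explicit schema works: the connector exposes the BSON
      # Object as a map column (assumes spark.mongodb.input.uri is configured).
      df = sqlContext.read \
          .format("com.mongodb.spark.sql.DefaultSource") \
          .schema(schema) \
          .load()
      {code}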

      The issue comes when writing back out using the same schema. Python dictionaries can be passed in against such a schema as map values, to build DataFrames that contain these MapType() columns. Writing with the connector (using the same schema as for reading) produces the following error:

      Cannot cast Map (example) into a BsonValue. MapType (schema) has no matching BsonValue.
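
      For context, a sketch of the write side that hits this error (again illustrative only; it reuses the schema from the read sketch above):

      {code:python}
      from pyspark.sql import Row

      # A DataFrame whose "metrics" column is a Python dictionary keyed by an
      # arbitrary string, i.e. a MapType() value (reuses `schema` from above).
      rows = [("doc1", {"arbitrary_key": Row(count=1, label="a")})]
      out_df = sqlContext.createDataFrame(rows, schema)

      # Writing back through the connector with the same schema fails with the
      # error quoted above: the writer has no MapType -> BsonValue conversion.
      out_df.write \
          .format("com.mongodb.spark.sql.DefaultSource") \
          .mode("append") \
          .save()
      {code}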

      Is it possible to add support for writing MapType objects into MongoDB using the connector? It seems they would need to be converted by the connector from dictionary-like objects into BSON objects in order to be written.
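
      Purely as an illustration of the kind of conversion involved (a workaround sketch, not connector code; the URI, database and collection names are placeholders), the rows can be turned back into plain dicts and written with PyMongo, which does encode nested dicts as BSON sub-documents:

      {code:python}
      from pymongo import MongoClient

      # Placeholder connection details for illustration only.
      client = MongoClient("mongodb://localhost:27017")
      coll = client["test_db"]["test_collection"]

      def row_to_doc(row):
          # asDict(recursive=True) turns the MapType column (and any nested
          # Rows) into plain dicts that PyMongo can serialise to BSON.
          return row.asDict(recursive=True)

      # Collecting to the driver only scales to small DataFrames; `out_df`
      # is the DataFrame from the write sketch above.
      coll.insert_many([row_to_doc(r) for r in out_df.collect()])
      {code}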

            Assignee:
            Ross Lawley (ross@mongodb.com)
            Reporter:
            Mark Brenckle (brencklebox)
            Votes:
            0
            Watchers:
            2
