Type: Improvement
Resolution: Gone away
Priority: Major - P3
Affects Version/s: None
Component/s: None
Spark does not have a UUID data type, so UUIDs are stored as StringType in a Dataset/DataFrame. They also remain strings when written to MongoDB via the mongo-spark-connector, rather than being stored as native 'UUID( )' values. Other database connectors for systems with explicit UUID types (e.g. PostgreSQL) infer UUIDs on write and store the column with the UUID data type. The current behavior causes problems for downstream jobs that expect to read 'UUID( )' values from MongoDB. This ticket is to implement writing UUIDs to MongoDB via Spark so that, when the schema is inferred, they are stored as 'UUID( )' rather than as strings.
Ex. in mongo:
id: "825687f0-16f8-4912-b911-c46a072c499a"
vs.
id: UUID("825687f0-16f8-4912-b911-c46a072c499a")
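For context, the difference between the two representations above can be sketched with Python's standard uuid module. This is an illustrative stdlib-only sketch, not the connector's implementation: MongoDB's native UUID type is BSON Binary subtype 4 wrapping the 16 raw bytes, whereas the string form is the 36-character hex-and-dashes text; the Binary wrapper itself is assumed here and not shown.

```python
import uuid

# The UUID from the example above.
u = uuid.UUID("825687f0-16f8-4912-b911-c46a072c499a")

# What the connector currently writes: the StringType representation.
as_string = str(u)

# The payload MongoDB's native UUID( ) form carries: the 16 raw bytes
# (stored as BSON Binary subtype 4).
as_bytes = u.bytes

print(len(as_string))  # 36 characters
print(len(as_bytes))   # 16 bytes
```

A write-side fix would need to map such values to the Binary-subtype-4 form instead of a plain BSON string when the inferred field is a UUID.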
is related to: SPARK-326 Support all bson types in the new connector (Closed)