Spark Connector / SPARK-325

UUIDs not being inferred during write from Spark

    • Type: Improvement
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Spark has no UUID datatype, so UUIDs are held as StringType in a Dataset/DataFrame. They also remain strings when written to MongoDB through the mongo-spark-connector, rather than being stored as 'UUID()' (BSON Binary subtype 4). Connectors for databases with an explicit UUID type, e.g. Postgres, infer UUIDs on write and persist them with the UUID datatype. The string representation breaks downstream jobs that expect to read 'UUID()' values from MongoDB. This ticket is to implement writing UUIDs to MongoDB via Spark so that, when the schema is inferred, they are stored as 'UUID()' rather than as strings.

      Ex. in MongoDB:
      id: "825687f0-16f8-4912-b911-c46a072c499a"
      vs.
      id: UUID("825687f0-16f8-4912-b911-c46a072c499a")
       
       
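      The connector-side fix is internal, but the gap itself can be illustrated with the Python standard library alone: MongoDB's 'UUID()' display wraps a 16-byte BSON Binary (subtype 4) payload, which is exactly what a string-typed column loses. A minimal sketch of the conversion a write path (or a user-defined function working around this ticket) would need to apply per value; the to_uuid_bytes helper is hypothetical, not part of the connector:

      ```python
      import uuid

      # The ticket's example value, as Spark sees it: a plain string.
      raw = "825687f0-16f8-4912-b911-c46a072c499a"

      def to_uuid_bytes(value: str) -> bytes:
          # Parse the string and expose the 16-byte payload that
          # BSON Binary subtype 4 ("UUID()") stores.
          return uuid.UUID(value).bytes

      payload = to_uuid_bytes(raw)
      assert len(payload) == 16                      # subtype 4 is always 16 bytes
      assert str(uuid.UUID(bytes=payload)) == raw    # round-trips losslessly
      ```

      In practice the binary payload would then be written as a BSON Binary with subtype 4 (e.g. via a driver's UUID representation setting) rather than as raw bytes; the sketch only shows that the string form carries enough information for a lossless conversion.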

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            lukechu1018@gmail.com Luke Chu
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: