Spark Connector / SPARK-325

UUIDs not being inferred during write from Spark

    • Type: Improvement
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Spark has no UUID datatype, so UUIDs are held as StringType in a Dataset/DataFrame. They also remain strings when written to MongoDB through the mongo-spark-connector, rather than being stored as 'UUID()' (BSON Binary subtype 4). Connectors for databases with an explicit UUID type, e.g. Postgres, infer UUIDs on write and persist them with the UUID datatype. The string representation breaks downstream jobs that expect to read 'UUID()' values from MongoDB. This ticket is to implement writing UUIDs to MongoDB via Spark so that, when the schema is inferred, they are stored as 'UUID()' rather than as strings.

      Ex. in MongoDB:
      id: "825687f0-16f8-4912-b911-c46a072c499a"
      vs.
      id: UUID("825687f0-16f8-4912-b911-c46a072c499a")
       
       
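      The connector-side fix is internal, but the gap itself can be illustrated with the Python standard library alone: MongoDB's 'UUID()' display wraps a 16-byte BSON Binary (subtype 4) payload, which is exactly what a string-typed column loses. A minimal sketch of the conversion a write path (or a user-defined function working around this ticket) would need to apply per value; the to_uuid_bytes helper is hypothetical, not part of the connector:

      ```python
      import uuid

      # The ticket's example value, as Spark sees it: a plain string.
      raw = "825687f0-16f8-4912-b911-c46a072c499a"

      def to_uuid_bytes(value: str) -> bytes:
          # Parse the string and expose the 16-byte payload that
          # BSON Binary subtype 4 ("UUID()") stores.
          return uuid.UUID(value).bytes

      payload = to_uuid_bytes(raw)
      assert len(payload) == 16                      # subtype 4 is always 16 bytes
      assert str(uuid.UUID(bytes=payload)) == raw    # round-trips losslessly
      ```

      In practice the binary payload would then be written as a BSON Binary with subtype 4 (e.g. via a driver's UUID representation setting) rather than as raw bytes; the sketch only shows that the string form carries enough information for a lossless conversion.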

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            lukechu1018@gmail.com Luke Chu
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: