Need a way to specify schema in PySpark (Mongo Connector)


    • Type: Task
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Schema
    • Environment:
      Python, Spark 2.0, Pyspark

      I would like to specify a schema for my collection. The schema has evolved over time: fields have been added, but none have ever been removed. I need a way to supply the schema of the latest documents explicitly, so the connector does not have to read the full collection to infer one.

      I know this can be done in Scala, but there is no documentation for how to do it in Python.
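      For reference, an explicit schema can be passed to the PySpark DataFrameReader the same way as in Scala, via `.schema(...)` before `.load()`. A minimal sketch follows; the field names are illustrative, and the read call assumes the MongoDB Spark Connector is on the classpath with a configured input URI:

      ```python
      from pyspark.sql.types import StructType, StructField, StringType, IntegerType

      # Schema describing the latest document shape; fields missing from
      # older documents will simply come back as null.
      schema = StructType([
          StructField("name", StringType(), True),   # illustrative field
          StructField("age", IntegerType(), True),   # illustrative field
      ])

      # With an active SparkSession `spark` and the connector available,
      # supplying the schema skips the sampling/inference pass:
      #
      #   df = (spark.read
      #         .format("com.mongodb.spark.sql.DefaultSource")
      #         .schema(schema)
      #         .load())
      ```

      Because the schema is supplied up front, Spark never samples the collection to infer types, which is exactly what avoids the full-collection read described above.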

              Assignee:
              Ross Lawley
              Reporter:
              Jeremy
              Votes:
              1 Vote for this issue
              Watchers:
                3

                Created:
                Updated:
                Resolved: