Spark Connector / SPARK-74

Upserting with Python API

    • Type: Improvement
    • Resolution: Done
    • Priority: Critical - P2
    • Affects Version/s: 1.0.0
    • Component/s: API
    • Labels:
      None
    • Environment:
      PySpark 1.6.2 on Databricks 2.
    • Description:

      Currently, no upsert logic is available as far as I know. The closest thing I could find was SPARK-66, but I don't know whether the Python API can directly access the `MongoCollection` class, so I'm not sure upserting can be done on the MongoDB end through Python. If it can, could you please advise?

      Our current workaround is to read in the entire collection (A), edit it to create collection (B), filter (A) and union it with (B) to create our upserted collection (AB), and then overwrite the original collection with (AB), as sketched below. We are using the `DataFrameWriter` interface as suggested in the documentation. This is less than ideal for performance/speed reasons, and it won't work with very large collections.
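      A minimal sketch of that workaround, assuming the connector's data source is registered as "com.mongodb.spark.sql.DefaultSource" and that spark.mongodb.input.uri / spark.mongodb.output.uri already point at the target collection; `sqlContext` is the PySpark 1.6 SQLContext available in the Databricks notebook, and the "key" column and the placeholder `updates` DataFrame stand in for whatever identifies an edited document in our jobs:

      ```python
      from pyspark.sql.functions import col

      # (A) read the entire existing collection through the connector
      existing = sqlContext.read \
          .format("com.mongodb.spark.sql.DefaultSource") \
          .load()

      # (B) the edited/new rows to "upsert"; in practice these are derived from
      #     `existing` earlier in the job, so the schemas match
      updates = existing.limit(10)  # placeholder: pretend these rows were edited

      # drop the existing rows whose key also appears in the updates ...
      update_keys = updates.select(col("key").alias("upd_key")).distinct()
      unchanged = existing.join(update_keys, existing["key"] == col("upd_key"), "left_outer") \
          .filter(col("upd_key").isNull()) \
          .drop("upd_key")

      # ... then union with the updates to build the upserted collection (AB)
      upserted = unchanged.unionAll(updates).cache()
      upserted.count()  # force evaluation before overwriting the collection we just read

      # overwrite the original collection via the DataFrameWriter interface
      upserted.write \
          .format("com.mongodb.spark.sql.DefaultSource") \
          .mode("overwrite") \
          .save()
      ```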

      Can some form of support be added for "upsert"-style logic in the Python API, either something similar to SPARK-66 or a direct call?
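      In the meantime, one way to get "direct call" behaviour from Python is to bypass the connector for the write and push bulk upserts per partition with PyMongo. This is only a sketch, not part of the connector's API: it assumes PyMongo 3.x is installed on the workers, and the URI, database/collection names, and "key" field are placeholders (handling of `_id` is glossed over):

      ```python
      from pymongo import MongoClient, ReplaceOne

      MONGO_URI = "mongodb://host:27017"  # placeholder connection string

      def upsert_partition(rows):
          # one client per partition; `rows` is an iterator of pyspark.sql.Row
          client = MongoClient(MONGO_URI)
          coll = client["mydb"]["mycollection"]       # placeholder namespace
          ops = [ReplaceOne({"key": row["key"]},      # match on the upsert key
                            row.asDict(),             # replace the whole document
                            upsert=True)
                 for row in rows]
          if ops:  # bulk_write rejects an empty operation list
              coll.bulk_write(ops, ordered=False)
          client.close()

      # `updates` is the DataFrame of new/edited rows from the sketch above
      updates.foreachPartition(upsert_partition)
      ```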

            Assignee:
            Unassigned
            Reporter:
            Mark Brenckle (brencklebox)
            Votes:
            0
            Watchers:
            2
