- Type: Improvement
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 1.0.0
- Component/s: API
- Environment: PySpark 1.6.2 on Databricks 2.
Currently no upsert logic is available, as far as I know. The closest thing I could find was SPARK-66, but I don't know whether the Python API can directly access the `MongoCollection` class, so I'm not sure the upserting can be done on the Mongo end through Python. If it can, could you please advise?
Our current workaround is to read in an entire collection (A), edit it to create collection (B), filter (A), and union it with (B) to create our upserted collection (AB), then overwrite the original collection. We are using the `DataFrameWriter` interface, as suggested in the documentation. This is less than ideal for performance/speed reasons and won't work with very large collections.
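To make the workaround concrete, here is a minimal plain-Python sketch of the filter-and-union logic described above, using lists of dicts to stand in for the Mongo-backed DataFrames (the `upsert` helper and field names are illustrative, not part of the connector API):

```python
def upsert(collection_a, updates, key="_id"):
    """Emulate the filter/union workaround: drop documents from A whose
    key appears in the edited set B, then union the remainder with B."""
    updated_keys = {doc[key] for doc in updates}
    kept = [doc for doc in collection_a if doc[key] not in updated_keys]
    return kept + list(updates)

a = [{"_id": 1, "v": "old"}, {"_id": 2, "v": "keep"}]   # full collection (A)
b = [{"_id": 1, "v": "new"}, {"_id": 3, "v": "added"}]  # edited docs (B)
ab = upsert(a, b)  # upserted collection (AB), which we then write back
```

In the real pipeline the filter and union are DataFrame operations and `ab` is written back with `DataFrameWriter` in overwrite mode, which is exactly why the approach scales poorly: the whole collection must be read and rewritten for every batch of updates.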
Can some form of support be added for "upsert"-style logic in the Python API, either along the lines of SPARK-66 or as a direct call?
- Backports: SPARK-415 (Backlog) — writing a DataFrame in append mode using a Unity Catalog cluster is not supported on Databricks