Spark Connector / SPARK-74

Upserting with Python API

    • Type: Improvement
    • Resolution: Done
    • Priority: Critical - P2
    • Affects Version/s: 1.0.0
    • Component/s: API
    • Labels:
      None
    • Environment:
      PySpark 1.6.2 on Databricks 2.
    • Description:

      Currently, no upsert logic is available as far as I know. The closest thing I could find was SPARK-66, but I don't know whether the Python API can directly access the `MongoCollection` class, so I'm not sure upserting can be done on the MongoDB end through Python. If it can, could you please advise?

      Our current workaround is to read in the entire collection (A), edit it to create collection (B), filter (A) and union it with (B) to create our upserted collection (AB), and then overwrite the original collection with (AB), as sketched below. We are using the `DataFrameWriter` interface as suggested in the documentation. This is less than ideal for performance/speed reasons, and it won't work with very large collections.
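      A minimal sketch of that workaround, assuming the connector's data source is registered as "com.mongodb.spark.sql.DefaultSource" and that spark.mongodb.input.uri / spark.mongodb.output.uri already point at the target collection; `sqlContext` is the PySpark 1.6 SQLContext available in the Databricks notebook, and the "key" column and the placeholder `updates` DataFrame stand in for whatever identifies an edited document in our jobs:

      ```python
      from pyspark.sql.functions import col

      # (A) read the entire existing collection through the connector
      existing = sqlContext.read \
          .format("com.mongodb.spark.sql.DefaultSource") \
          .load()

      # (B) the edited/new rows to "upsert"; in practice these are derived from
      #     `existing` earlier in the job, so the schemas match
      updates = existing.limit(10)  # placeholder: pretend these rows were edited

      # drop the existing rows whose key also appears in the updates ...
      update_keys = updates.select(col("key").alias("upd_key")).distinct()
      unchanged = existing.join(update_keys, existing["key"] == col("upd_key"), "left_outer") \
          .filter(col("upd_key").isNull()) \
          .drop("upd_key")

      # ... then union with the updates to build the upserted collection (AB)
      upserted = unchanged.unionAll(updates).cache()
      upserted.count()  # force evaluation before overwriting the collection we just read

      # overwrite the original collection via the DataFrameWriter interface
      upserted.write \
          .format("com.mongodb.spark.sql.DefaultSource") \
          .mode("overwrite") \
          .save()
      ```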

      Can some form of support be added for "upsert"-style logic in the Python API, either something similar to SPARK-66 or a direct call?
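      In the meantime, one way to get "direct call" behaviour from Python is to bypass the connector for the write and push bulk upserts per partition with PyMongo. This is only a sketch, not part of the connector's API: it assumes PyMongo 3.x is installed on the workers, and the URI, database/collection names, and "key" field are placeholders (handling of `_id` is glossed over):

      ```python
      from pymongo import MongoClient, ReplaceOne

      MONGO_URI = "mongodb://host:27017"  # placeholder connection string

      def upsert_partition(rows):
          # one client per partition; `rows` is an iterator of pyspark.sql.Row
          client = MongoClient(MONGO_URI)
          coll = client["mydb"]["mycollection"]       # placeholder namespace
          ops = [ReplaceOne({"key": row["key"]},      # match on the upsert key
                            row.asDict(),             # replace the whole document
                            upsert=True)
                 for row in rows]
          if ops:  # bulk_write rejects an empty operation list
              coll.bulk_write(ops, ordered=False)
          client.close()

      # `updates` is the DataFrame of new/edited rows from the sketch above
      updates.foreachPartition(upsert_partition)
      ```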

            Assignee:
            Unassigned
            Reporter:
            Mark Brenckle (brencklebox)
            Votes:
            0
            Watchers:
            2
