Loading...

XML

Word

Printable

JSON

Type: Task
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None
Environment:
Spark Connector

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Customer is experiencing a duplicate key exception when attempting to execute MongoSpark.save(RDD, writeConfig)

def save[D: ClassTag](rdd: RDD[D], writeConfig: WriteConfig): Unit)and encountering documents which already exist in the target collection (same _id)

Looking at MongoSpark.scala, it appears that there is a code path for

def save[D](dataset: Dataset[D], writeConfig: WriteConfig): Unit

that checks for the option replaceDocument This check isn't contained in the RDD code path.

Can this be added? Is there a specific reason this is disallowed? Are there other workarounds for this?

is duplicated by

SPARK-280 Enhance save(RDD) to avoid duplicate key exception

Closed

Assignee:: Ross Lawley

Reporter:: Steffan Mejia

Reviewers:: None

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Created:: Oct 15 2020 10:25:20 PM UTC

Updated:: Sep 22 2021 06:54:27 PM UTC

Resolved:: Nov 02 2020 02:31:54 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates