  Spark Connector / SPARK-279

Duplicate key exception when using Spark Connector save with RDD


    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Spark Connector

      Description

      Customer is experiencing a duplicate key exception when attempting to execute MongoSpark.save(RDD, writeConfig)

      def save[D: ClassTag](rdd: RDD[D], writeConfig: WriteConfig): Unit

      and encountering documents which already exist in the target collection (same _id).

      Looking at MongoSpark.scala, it appears that there is a code path for

      def save[D](dataset: Dataset[D], writeConfig: WriteConfig): Unit

      that checks for the replaceDocument option. This check isn't present in the RDD code path.
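
      For illustration, a minimal sketch of routing the same data through the Dataset code path so that replaceDocument is honoured. The URI, database/collection names and the Account case class are hypothetical; any case class with an _id field would behave the same way:

      import com.mongodb.spark.MongoSpark
      import com.mongodb.spark.config.WriteConfig
      import org.apache.spark.sql.SparkSession

      // Hypothetical example type; the _id field is what the replace matches on.
      case class Account(_id: String, balance: Double)

      val spark = SparkSession.builder().getOrCreate()
      import spark.implicits._

      // replaceDocument is only consulted on the Dataset save path
      // (set explicitly here for clarity).
      val writeConfig = WriteConfig(Map(
        "uri" -> "mongodb://localhost/",
        "database" -> "test",
        "collection" -> "accounts",
        "replaceDocument" -> "true"))

      // Converting the RDD to a Dataset means documents whose _id already
      // exists in the target collection are replaced rather than re-inserted,
      // avoiding the duplicate key exception.
      val accounts = spark.sparkContext
        .parallelize(Seq(Account("a1", 10.0), Account("a2", 25.0)))
        .toDS()

      MongoSpark.save(accounts, writeConfig)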

      Can this be added? Is there a specific reason this is disallowed? Are there other workarounds for this?
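
      Regarding workarounds: until the RDD path honours replaceDocument, one possibility is to perform the replace-with-upsert by hand through MongoConnector. This is a minimal sketch, assuming an RDD[Document] whose documents carry an _id, a Java driver recent enough to provide ReplaceOptions (3.7+), and a hypothetical helper name and batch size:

      import scala.collection.JavaConverters._

      import com.mongodb.client.MongoCollection
      import com.mongodb.client.model.{Filters, ReplaceOneModel, ReplaceOptions}
      import com.mongodb.spark.MongoConnector
      import com.mongodb.spark.config.WriteConfig
      import org.apache.spark.rdd.RDD
      import org.bson.Document

      // Hypothetical helper: upsert by _id instead of inserting, so documents
      // that already exist in the target collection are replaced.
      def saveReplacingById(rdd: RDD[Document], writeConfig: WriteConfig): Unit = {
        val connector = MongoConnector(writeConfig.asOptions)
        rdd.foreachPartition { partition =>
          if (partition.nonEmpty) {
            connector.withCollectionDo(writeConfig, { collection: MongoCollection[Document] =>
              // Batch size of 512 is an arbitrary choice for this sketch.
              partition.grouped(512).foreach { batch =>
                val requests = batch.map { doc =>
                  new ReplaceOneModel[Document](
                    Filters.eq("_id", doc.get("_id")),
                    doc,
                    new ReplaceOptions().upsert(true))
                }
                collection.bulkWrite(requests.asJava)
              }
            })
          }
        }
      }

      Bulk replaceOne with upsert(true) writes each document whether or not its _id already exists, which is effectively the behaviour that replaceDocument enables on the Dataset path.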


              People

              Assignee:
              ross.lawley Ross Lawley
              Reporter:
              steffan.mejia Steffan Mejia
              Votes:
              0
              Watchers:
              2
