Spark Connector / SPARK-251

Upsert fails with duplicate key error on _id


    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Configuration
    • Labels: None

      Description

      We are trying to use "upsert" functionality in MongoDB when storing a dataset with specific values in the _id field (if the _id value from the dataset is found in MongoDB, update the record; if not found, insert it). We use mongo-spark-connector_2.11 version 2.4.0. We set the following configuration on WriteConfig: forceInsert=true, replaceDocument=true (we also tried dataset.write().mode(SaveMode.Overwrite)).

      When trying to insert the dataset into a collection that already contains records with _id values from the dataset, we get a duplicate key error (E11000 duplicate key error index: embsgdb01.xxx.$_id dup key: { : ObjectId('5d08f304ffb7a770442736c8') }). My understanding is that this issue was remediated in older versions. Please let us know if anything is missing from the configuration or the versions used.
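      For reference, the connector's forceInsert option (per the 2.x configuration documentation) forces saves to use inserts even when a Dataset contains _id, so setting it to true is expected to produce E11000 on existing keys; leaving it at its default of false while keeping replaceDocument=true yields replace-style upserts. A minimal sketch of assembling such write options (the UpsertOptions helper and the URI are hypothetical; only the option keys come from the connector docs):

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertOptions {
    // Build write options for upsert-style saves with the Mongo Spark connector.
    // Option keys ("uri", "replaceDocument", "forceInsert") follow the 2.4.x docs;
    // the helper itself is hypothetical.
    public static Map<String, String> upsertWriteOptions(String uri) {
        Map<String, String> options = new HashMap<>();
        options.put("uri", uri);                 // e.g. "mongodb://host/db.collection"
        options.put("replaceDocument", "true");  // replace the whole matched document
        options.put("forceInsert", "false");     // default; "true" forces inserts, causing E11000
        return options;
    }
}
```

      These options would then be passed to the connector, e.g. (assumed 2.4.x API) via WriteConfig.create(options) followed by MongoSpark.save(dataset, writeConfig).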

        Attachments

          Activity

            People

            Assignee:
            ross.lawley Ross Lawley
            Reporter:
            dsava Doina Sava
            Votes:
            0
            Watchers:
            1

              Dates

              Created:
              Updated:
              Resolved: