  Spark Connector / SPARK-251

Upsert fails with duplicate key error on _id

    • Type: Task
    • Resolution: Works as Designed
    • Priority: Minor - P4
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Configuration
    • Labels: None

      We are trying to use the upsert functionality of MongoDB when storing a dataset with specific values in the _id field: if the _id value from the dataset is found in MongoDB, update the record; if not, insert it. We use mongo-spark-connector_2.11 version 2.4.0. We set the following configuration on WriteConfig: forceInsert=true, replaceDocument=true (we also tried dataset.write().mode(SaveMode.Overwrite)).
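
      For reference, a minimal sketch of the write as described, using the mongo-spark-connector 2.4.0 DataFrame writer API. The wrapper class name is hypothetical, the database/collection names are placeholders taken from the error message below, and spark.mongodb.output.uri is assumed to be set on the SparkSession:

          import org.apache.spark.sql.Dataset;
          import org.apache.spark.sql.Row;
          import org.apache.spark.sql.SaveMode;

          class WriteAsReported {
              // Writes a dataset that carries explicit _id values, with the
              // options described above; "database"/"collection" are placeholders.
              static void saveDataset(Dataset<Row> dataset) {
                  dataset.write()
                         .format("com.mongodb.spark.sql.DefaultSource")
                         .mode(SaveMode.Append)              // Overwrite drops the collection first
                         .option("database", "embsgdb01")
                         .option("collection", "xxx")
                         .option("replaceDocument", "true")
                         .option("forceInsert", "true")      // as configured above
                         .save();
              }
          }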

      When writing the dataset to a collection that already contains records whose _id values appear in the dataset, we get a duplicate key error (E11000 duplicate key error index: embsgdb01.xxx.$_id dup key: { : ObjectId('5d08f304ffb7a770442736c8') }). My understanding is that this issue was remediated in earlier versions. Please let us know if anything is missing from the configuration or the versions used.
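
      A hedged note on the likely cause: in the 2.x connector, forceInsert is documented to force saves to use inserts even when the Dataset contains an _id, which would produce exactly this E11000 error against pre-existing documents. With forceInsert left at its default of false and replaceDocument=true, rows that carry an _id are written with replace-with-upsert semantics. A minimal sketch of that configuration through WriteConfig follows; the class/method names are hypothetical, the database/collection names are placeholders, and spark.mongodb.output.uri is assumed to be set on the Spark configuration:

          import com.mongodb.spark.MongoSpark;
          import com.mongodb.spark.config.WriteConfig;
          import org.apache.spark.api.java.JavaSparkContext;
          import org.apache.spark.sql.Dataset;
          import org.apache.spark.sql.Row;

          import java.util.HashMap;
          import java.util.Map;

          class UpsertExample {
              // Rows whose _id already exists in the collection are replaced;
              // rows with new _id values are inserted.
              static void upsertDataset(JavaSparkContext jsc, Dataset<Row> dataset) {
                  Map<String, String> overrides = new HashMap<>();
                  overrides.put("database", "embsgdb01");   // placeholder
                  overrides.put("collection", "xxx");       // placeholder
                  overrides.put("replaceDocument", "true");
                  overrides.put("forceInsert", "false");    // default; true forces plain inserts

                  WriteConfig writeConfig = WriteConfig.create(jsc).withOptions(overrides);
                  MongoSpark.save(dataset, writeConfig);
              }
          }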

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Doina Sava (dsava)
            Votes: 0
            Watchers: 2
