Spark Connector / SPARK-183

Upsert fails on duplicate key error with Unique indexes other than "_id"

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Writes
    • Labels: None

      We are trying to do an "upsert" of documents in MongoDB that have a unique index (both single-column and composite). These indexes are separate from the default "_id" index. "replaceDocument" works great when we are dealing only with the default "_id" unique index.

      What is the correct way to achieve upserts on documents with unique indexes other than "_id"? Is there a "mapping_id" concept where we can tell the Mongo-Spark connector to perform upserts on them?

      The current error we get is a standard duplicate key error:

      org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: Lost task 0.0 in stage 16.0 (TID 209, localhost, executor driver): com.mongodb.MongoBulkWriteException: Bulk write operation error on server xx.xx.xx.xx:27017. Write errors: [BulkWriteError{index=2, code=11000, message='E11000 duplicate key error collection: ekg.datapull_test index: name_-1 dup key: { : "a" }', details={ }}]. at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:176) at
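      Until the connector supports this directly, one possible workaround is to bypass the connector's save helper and issue the upserts from foreachPartition with the MongoDB Java driver, filtering on the unique key instead of "_id". The sketch below is illustrative only: the connection URI, the ekg.datapull_test namespace, and a DataFrame df with "name" and "value" columns are assumptions taken from the error message, not part of the connector's API.

      import com.mongodb.client.MongoClients
      import com.mongodb.client.model.{Filters, ReplaceOneModel, ReplaceOptions, WriteModel}
      import org.bson.Document
      import scala.collection.JavaConverters._

      // Hypothetical DataFrame `df` whose "name" column carries the non-_id unique index.
      df.rdd.foreachPartition { rows =>
        // One client per partition; the URI and namespace are placeholders.
        val client = MongoClients.create("mongodb://xx.xx.xx.xx:27017")
        try {
          val coll = client.getDatabase("ekg").getCollection("datapull_test")
          val models: List[WriteModel[Document]] = rows.map { row =>
            val doc = new Document("name", row.getAs[String]("name"))
              .append("value", row.getAs[String]("value"))
            // Filter on the unique key instead of _id so the replace acts as an
            // upsert against that index rather than inserting a duplicate.
            // ReplaceOptions requires the 3.7+ Java driver; older drivers use UpdateOptions.
            new ReplaceOneModel[Document](
              Filters.eq("name", row.getAs[String]("name")),
              doc,
              new ReplaceOptions().upsert(true))
          }.toList
          if (models.nonEmpty) coll.bulkWrite(models.asJava)
        } finally {
          client.close()
        }
      }

      For a composite unique index, every field of that index would need to appear in the filter so the upsert targets the existing document instead of colliding with it.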


            Assignee:
            Ross Lawley (ross@mongodb.com)
            Reporter:
            srinivas rao gajjala (srinu.gajjala321)
            Votes:
            0
            Watchers:
            4
