Add configuration to ignore duplicate errors when sinking data


    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Writes
    • Java Drivers

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


      Product Use Case

      As a user of the MongoDB Spark connector, I may want to ignore duplicate key errors when inserting data.

      Because a Spark task can fail after partial execution, its partition may be reprocessed by another executor. When this happens, duplicate key exceptions are raised for documents that were already inserted.

      This makes insert-only operations fragile; the alternative is to rely on slower upserts.
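
      The sketch below illustrates how such a configuration might be used from Spark. The write options connection.uri, database, collection, ordered, and operationType are existing connector settings; the ignoreDuplicateKeyErrors key is a hypothetical name for the proposed option, shown only to make the use case concrete.

      import org.apache.spark.sql.SparkSession

      object InsertOnlyWrite {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder()
            .appName("insert-only-write")
            .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
            .getOrCreate()

          // Source data is assumed to already carry an _id column.
          val df = spark.read.parquet("/data/events")

          df.write
            .format("mongodb")
            .mode("append")
            .option("database", "test")
            .option("collection", "events")
            .option("operationType", "insert")          // insert-only workload
            .option("ordered", "false")                 // unordered bulk writes
            .option("ignoreDuplicateKeyErrors", "true") // proposed option (hypothetical name)
            .save()
        }
      }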

      User Impact

      • Adding this feature would improve throughput for insert-only workloads.

      Acceptance Criteria

      Implementation Requirements

      • Will only apply to unordered writes (a handling sketch follows this list)
      • The data must include an _id, or the fields declared in the idFieldList
      • Applies to insert-only operations
      • Must support both batch and streaming writes
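
      A rough sketch of the error handling the connector could apply internally for unordered inserts, using the Java driver it builds on. MongoBulkWriteException, BulkWriteError, and ErrorCategory are real Java driver classes; the helper object and method names here are illustrative only and not part of the connector.

      import com.mongodb.{ErrorCategory, MongoBulkWriteException}
      import com.mongodb.client.MongoCollection
      import com.mongodb.client.model.InsertManyOptions
      import org.bson.Document
      import scala.jdk.CollectionConverters._

      object DuplicateKeyTolerantInsert {
        def insertIgnoringDuplicates(collection: MongoCollection[Document],
                                     docs: java.util.List[Document]): Unit = {
          try {
            // Unordered, so the server attempts every document even after a failure.
            collection.insertMany(docs, new InsertManyOptions().ordered(false))
          } catch {
            case e: MongoBulkWriteException =>
              // Keep only the write errors that are not duplicate key errors;
              // if any remain, the failure is genuine and is rethrown.
              val nonDuplicate = e.getWriteErrors.asScala.filterNot { err =>
                ErrorCategory.fromErrorCode(err.getCode) == ErrorCategory.DUPLICATE_KEY
              }
              if (nonDuplicate.nonEmpty) throw e
              // Otherwise the duplicates are ignored: the documents already exist,
              // which is the retried-partition scenario described above.
          }
        }
      }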

       

            Assignee:
            Unassigned
            Reporter:
            Ross Lawley
            Votes:
            0
            Watchers:
            2
