- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Writes
- Java Drivers
Product Use Case
As a user of the MongoDB Spark connector, I may want to ignore duplicate key errors when inserting data.
When a Spark task fails after partial execution, its partition is reprocessed by another executor; the documents that were already inserted then raise duplicate key exceptions.
This makes insert-only operations fragile, and the current alternative is to rely on slower upserts.
User Impact
- Adding this feature would improve throughput for insert-only workloads
Acceptance Criteria
Implementation Requirements
- Will only work for unordered writes
- The data must include an _id field, or the fields declared in the idFieldList
- Applies to insert-only operations
- Must support both batch and streaming writes
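The core of the requirements above is distinguishing duplicate key errors from other write errors after an unordered bulk insert. A minimal sketch of that filtering logic, in plain Java with no driver dependency: `WriteError` here is a hypothetical stand-in for the driver's error type, and `11000` is the MongoDB server's duplicate key error code. If only duplicate key errors remain after filtering, a retried batch could be treated as successful.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IgnoreDuplicateKeys {

    // MongoDB server error code for a duplicate key violation.
    static final int DUPLICATE_KEY = 11000;

    // Stand-in for the driver's per-document write error; not a real driver class.
    record WriteError(int index, int code, String message) {}

    // Keep only errors that are NOT duplicate key errors. With unordered
    // writes, the remaining inserts in the batch were still attempted, so
    // an empty result means the batch can be considered successful.
    static List<WriteError> nonDuplicateErrors(List<WriteError> errors) {
        return errors.stream()
                .filter(e -> e.code() != DUPLICATE_KEY)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<WriteError> errors = List.of(
                new WriteError(0, DUPLICATE_KEY, "E11000 duplicate key error"),
                new WriteError(3, 121, "Document failed validation"));

        List<WriteError> remaining = nonDuplicateErrors(errors);
        System.out.println(remaining.size());        // prints 1
        System.out.println(remaining.get(0).code()); // prints 121
    }
}
```

This only works safely for unordered, insert-only writes: with ordered writes the server stops at the first error, so later documents in the batch were never attempted and swallowing the error would silently drop them.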