Type: Task
Resolution: Done
Priority: Critical - P2
Affects Version/s: None
Component/s: Configuration, Logging, Writes
Question 1)
The Hadoop team is trying to work out how to handle the duplicate-key error exception that is thrown while a job is running with “writeConcern: true”. Given the two configurations below, how can the exception be handled from the Mongo Spark connector so that the job continues?
Scenarios with writeConcern
1) With writeConcern: 1
Configuration:
mongo {
    "connection.uri" = "hostname.express-scripts.com:port/DataBase?ssl=true"
    "username" = "UserName"
    "output.collection" = "CLAIMS_ENTITY_COMM"
    "spark.mongodb.output.writeConcern.w" = "1"
    "spark.mongodb.output.writeConcern.journal" = "true"
    "spark.mongodb.output.ordered" = "false"
}
Result: Job stops
Sample Error message:
com.mongodb.MongoBulkWriteException: Bulk write operation error on server ch3dr615402.express-scripts.com:27017. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error collection: ClaimsEntity.CLAIMS_ENTITY_COMM_STAGING index: patientInfo.patientAgn_1_dateOfService_1_claimId_1 dup key: { : 44296071, : new Date(1468987200000), : 990851179746650060 }', details={ }}].
at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:176)
at com.mongodb.connection.BulkWriteBatchCombiner.throwOnError(BulkWriteBatchCombiner.java:205)
at com.mongodb.connection.BulkWriteBatchCombiner.getResult(BulkWriteBatchCombiner.java:146)
at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:188)
at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:168)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:422)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:413)
at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:845)
at com.mongodb.Mongo$2.execute(Mongo.java:828)
at com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:338)
at com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:322)
at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1$$anonfun$apply$2.apply(MongoSpark.scala:119)
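The usual way to let a job survive this failure mode is to keep "spark.mongodb.output.ordered" = "false" (so the server attempts every document in the batch) and then treat a bulk-write failure whose errors are all code 11000 (E11000 duplicate key) as ignorable. A minimal sketch of that decision logic, assuming an exception object that carries a pymongo-style `details` dict ("writeErrors" is a list of {"index", "code", "errmsg"} entries) — the helper names here are illustrative, not part of the Mongo Spark connector API:

```python
# Sketch (assumption): decide whether a bulk-write failure can be swallowed.
# The error layout mirrors pymongo's BulkWriteError.details; the helpers
# themselves are illustrative, not connector API.

DUPLICATE_KEY = 11000  # E11000 duplicate key error


def ignorable_write_errors(details):
    """True if every write error is a duplicate-key error (code 11000).

    With unordered writes the server attempts every document, so the
    duplicates are the only rows that failed; the rest of the batch
    has already been written.
    """
    errors = details.get("writeErrors", [])
    return bool(errors) and all(e.get("code") == DUPLICATE_KEY for e in errors)


def save_batch(insert_many, batch):
    """Insert a batch, swallowing duplicate-key errors so the job continues.

    `insert_many` is any callable that raises an exception carrying a
    `details` dict on bulk-write failure (as pymongo's
    collection.insert_many does via BulkWriteError).
    """
    try:
        insert_many(batch)
    except Exception as exc:
        details = getattr(exc, "details", None)
        if details is not None and ignorable_write_errors(details):
            return  # only duplicates failed; carry on with the next batch
        raise  # anything else is a real failure and should stop the job
```

The same pattern applies in the Scala/Java code path: catch MongoBulkWriteException, inspect `getWriteErrors`, and re-throw unless every error code is 11000.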
2) With writeConcern: 0
Configuration:
mongo {
    "connection.uri" = "hostname.express-scripts.com:port/DataBase?ssl=true"
    "username" = "UserName"
    "output.collection" = "CLAIMS_ENTITY_COMM"
    "spark.mongodb.output.writeConcern.w" = "0"
    "spark.mongodb.output.writeConcern.journal" = "false"
    "spark.mongodb.output.ordered" = "false"
}
Result: Job finishes, but with a data discrepancy (with w: 0 the writes are unacknowledged, so duplicate-key errors are never reported back to the job)
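An alternative to swallowing the exception in either scenario is to make the batch idempotent before it ever reaches the server, by dropping documents that collide on the collection's unique index (patientInfo.patientAgn, dateOfService, claimId, per the error message above). A minimal sketch, assuming documents are plain dicts; the field paths are taken from the error message and the helper itself is illustrative, not part of the connector:

```python
# Sketch (assumption): pre-deduplicate a batch on the unique-index fields
# so inserts can no longer raise E11000. Field names come from the
# duplicate-key error in this ticket; the helpers are illustrative.


def unique_key(doc):
    """Extract the unique-index fields from a claims document."""
    patient = doc.get("patientInfo", {})
    return (patient.get("patientAgn"), doc.get("dateOfService"), doc.get("claimId"))


def dedupe_batch(batch):
    """Keep the first document per unique key.

    Later duplicates would only trigger E11000 on insert, so they are
    dropped up front and the write becomes idempotent per batch.
    """
    seen = set()
    out = []
    for doc in batch:
        key = unique_key(doc)
        if key not in seen:
            seen.add(key)
            out.append(doc)
    return out
```

This only removes duplicates within a batch; duplicates against documents already in the collection still need either the error-swallowing approach above or replace-with-upsert semantics instead of plain inserts.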