Spark Connector / SPARK-261

Handle error Exception from Spark Connector

    • Type: Task
    • Resolution: Done
    • Priority: Critical - P2
    • Affects Version/s: None
    • Component/s: Configuration, Logging, Writes
    • Labels: None

      Question 1)

      The Hadoop team is trying to work out how to handle the duplicate key error exception when a job runs with “writeConcern:true”. The configurations for two scenarios are given below. Please let us know how to handle the exception from the Mongo Spark connector so that the job can continue.

       

      Scenarios with writeConcern

       

      1)      With writeConcern: 1

       

      Configuration:

      mongo

      {
        "connection.uri" = "hostname.express-scripts.com:port/DataBase?ssl=true"
        "username" = "UserName"
        "output.collection" = "CLAIMS_ENTITY_COMM"
        "spark.mongodb.output.writeConcern.w" = "1"
        "spark.mongodb.output.writeConcern.journal" = "true"
        "spark.mongodb.output.ordered" = "false"
      }

       

      Result: Job stops

       

      Sample Error message:

      com.mongodb.MongoBulkWriteException: Bulk write operation error on server ch3dr615402.express-scripts.com:27017. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error collection: ClaimsEntity.CLAIMS_ENTITY_COMM_STAGING index: patientInfo.patientAgn_1_dateOfService_1_claimId_1 dup key: { : 44296071, : new Date(1468987200000), : 990851179746650060 }', details={ }}].

              at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:176)

              at com.mongodb.connection.BulkWriteBatchCombiner.throwOnError(BulkWriteBatchCombiner.java:205)

              at com.mongodb.connection.BulkWriteBatchCombiner.getResult(BulkWriteBatchCombiner.java:146)

              at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:188)

              at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:168)

              at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:422)

              at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:413)

              at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)

              at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)

              at com.mongodb.Mongo.execute(Mongo.java:845)

              at com.mongodb.Mongo$2.execute(Mongo.java:828)

              at com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:338)

              at com.mongodb.MongoCollectionImpl.insertMany(MongoCollectionImpl.java:322)

              at com.mongodb.spark.MongoSpark$$anonfun$save$1$$anonfun$apply$1$$anonfun$apply$2.apply(MongoSpark.scala:119)
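
      One possible way to keep a job going past errors like the one above is to inspect the error codes carried by the exception and only fail when something other than a duplicate key (code 11000, as in the trace) was reported. The sketch below shows just that decision logic in plain Java. The class and method names are hypothetical, and it assumes the codes have already been collected, for example from `MongoBulkWriteException.getWriteErrors()` by calling `getCode()` on each entry. Since the trace shows `insertMany` being called inside `MongoSpark.save`, applying this would presumably mean performing the write with the driver directly rather than through `save`, which is also an assumption here.

```java
import java.util.List;

// Hypothetical helper, not part of the Mongo Spark connector or the Java driver.
public class DuplicateKeyFilter {
    // Server error code for "E11000 duplicate key error", as seen in the trace.
    public static final int DUPLICATE_KEY = 11000;

    /**
     * Returns true when at least one error was reported and every reported
     * error is a duplicate-key error, i.e. the batch could be treated as
     * "succeeded apart from duplicates".
     */
    public static boolean onlyDuplicateKeys(List<Integer> errorCodes) {
        return !errorCodes.isEmpty()
                && errorCodes.stream().allMatch(code -> code == DUPLICATE_KEY);
    }

    public static void main(String[] args) {
        // Assumed input: codes gathered from a caught bulk write exception.
        List<Integer> duplicatesOnly = List.of(11000, 11000);
        List<Integer> mixed = List.of(11000, 121);

        // A caller would continue for the first list and rethrow for the second.
        System.out.println(onlyDuplicateKeys(duplicatesOnly));
        System.out.println(onlyDuplicateKeys(mixed));
    }
}
```

      With "ordered" set to "false", as in the configuration above, the server attempts every document in the batch and reports the failures together, so a check like this sees all duplicate errors for the batch at once. Whether silently skipping duplicates is acceptable depends on the job.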

       

       

      2)      With writeConcern: 0


      Configuration:

      mongo

      {
        "connection.uri" = "hostname.express-scripts.com:port/DataBase?ssl=true"
        "username" = "UserName"
        "output.collection" = "CLAIMS_ENTITY_COMM"
        "spark.mongodb.output.writeConcern.w" = "0"
        "spark.mongodb.output.writeConcern.journal" = "false"
        "spark.mongodb.output.ordered" = "false"
      }

       

      Result: Job finished with discrepancies

                 

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            dgunda@express-scripts.com Dheeraj Gunda