[JAVA-1716] java.lang.IllegalStateException: open Created: 23/Mar/15  Updated: 11/Sep/19  Resolved: 14/Apr/15

Status: Closed
Project: Java Driver
Component/s: Connection Management, Error Handling
Affects Version/s: 3.0.1
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Luis Rodríguez Assignee: Unassigned
Resolution: Done Votes: 0
Labels: apache-Spark, java
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu Linux 14.04, JDK 7


Issue Links:
Related

 Description   

I am trying to integrate MongoDB with Apache Spark to process data.

When I try to execute my program with this command:

../spark-1.3.0-bin-hadoop2.4/bin/spark-submit --master spark://luis-VirtualBox:7077 --jars $(echo /home/luis/mongo-spark/lib/*.jar | tr ' ' ',') --class JavaWordCount target/scala-2.10/mongo-spark_2.10-1.0.jar mydb.testCollection mydb.outputTest7

I get the following exception:

15/03/23 17:05:34 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 4, 10.0.2.15): java.lang.IllegalStateException: open
at org.bson.util.Assertions.isTrue(Assertions.java:36)
at com.mongodb.DBTCPConnector.getPrimaryPort(DBTCPConnector.java:406)
at com.mongodb.DBCollectionImpl.insert(DBCollectionImpl.java:184)
at com.mongodb.DBCollectionImpl.insert(DBCollectionImpl.java:167)
at com.mongodb.DBCollection.insert(DBCollection.java:161)
at com.mongodb.DBCollection.insert(DBCollection.java:107)
at com.mongodb.DBCollection.save(DBCollection.java:1049)
at com.mongodb.DBCollection.save(DBCollection.java:1014)
at com.mongodb.hadoop.output.MongoRecordWriter.write(MongoRecordWriter.java:105)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I have read in some places that this is caused by a closed connection, but I don't close the connection anywhere in my code.

Thank you in advance.
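The "closed connection" explanation can be illustrated with a minimal sketch. The class and method names below are hypothetical stand-ins, not the real driver API; the only real detail mirrored here is the guard in org.bson.util.Assertions.isTrue (visible at the top of the stack trace), which throws IllegalStateException with the bare message "open" when an operation runs against a closed connector:

```java
// Hypothetical sketch: reproduces the assertion pattern behind
// "java.lang.IllegalStateException: open". FakeClient is a stand-in,
// not com.mongodb.MongoClient.
public class ClosedClientSketch {

    // Simplified stand-in for the driver's internal assertion helper.
    static void isTrue(String name, boolean condition) {
        if (!condition) {
            throw new IllegalStateException(name); // message is just "open"
        }
    }

    static class FakeClient {
        private boolean open = true;

        void close() { open = false; }

        void insert(String doc) {
            // Same kind of guard that fires in DBTCPConnector.getPrimaryPort.
            isTrue("open", open);
            // ... would send the insert here ...
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        client.close();              // something else closed the shared client
        try {
            client.insert("{x: 1}"); // any later write hits the guard
        } catch (IllegalStateException e) {
            System.out.println("java.lang.IllegalStateException: " + e.getMessage());
        }
    }
}
```

So the exception does not mean the insert itself is wrong; it means some other code path closed the client before this write ran.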



 Comments   
Comment by zqwang [ 25/Jul/15 ]

Thank you!

Comment by Ross Lawley [ 24/Jul/15 ]

This ticket is closed so no further action will be taken here. I see you have also opened HADOOP-217 and the followup will be on that ticket.

Comment by zqwang [ 24/Jul/15 ]

Hi, I use mongo-hadoop-core 1.3.2/1.3.1/1.4.0, and when I run my app on Spark in standalone mode I get the same issue, but it runs fine in local mode. I have been stuck on this issue for several days. Please help me.

Comment by Jeffrey Yemin [ 14/Apr/15 ]

Thanks for investigating. I'll close this as Works as Designed then.

Comment by Eyal Zituny [X] [ 14/Apr/15 ]

Glad I could help. I think the problem relates to the mongo-hadoop integration and not to the MongoDB Java driver.

Comment by Luis Rodríguez [ 14/Apr/15 ]

Eyal Zituny, that was the problem! I downgraded to version 1.3.1 of the mongo-hadoop driver and now it works like a charm. Thank you very much! Should I close the issue?

Comment by Eyal Zituny [X] [ 13/Apr/15 ]

I had the same issue. It seems to be caused by a new feature added in version 1.3.2 of the mongo-hadoop-core driver:
a new mongo client pool has been added to the MongoConfigUtil class,
see - https://github.com/mongodb/mongo-hadoop/commit/f8f98b1bef05579fce8ef46742e75cdb4d294d2f
I suspect a problem with this pool arises once two or more concurrent threads request the same mongo client (identified by its URI): they both acquire the same client instance, and if one of them finishes before the second and calls the close method, the second thread loses the connection.
See the saveAsNewAPIHadoopDataset method in PairRDDFunctions.scala (line 963), which first obtains the mongo client (writer) and afterwards calls close:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

*I believe this will probably only occur in frameworks such as Spark that run the driver in a multithreaded environment (MongoConfigUtil and the pool are static).

thanks

Eyal
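The race Eyal describes can be sketched without the real driver. All names below (FakeClient, acquire, POOL) are hypothetical stand-ins, not the mongo-hadoop or driver API; the only assumption taken from the source is a static, URI-keyed client pool that returns the same client instance to concurrent tasks:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of the suspected failure mode: a static, URI-keyed client
// pool hands the SAME client instance to two tasks; when the first task
// finishes and calls close(), the second task's writes fail with
// IllegalStateException: open. Hypothetical stand-in classes throughout.
public class SharedPoolRaceSketch {

    static class FakeClient {
        private volatile boolean open = true;

        void close() { open = false; }

        void insert(String doc) {
            if (!open) throw new IllegalStateException("open");
        }
    }

    // Static pool keyed by connection URI, mirroring the mongo-hadoop change.
    static final Map<String, FakeClient> POOL = new ConcurrentHashMap<>();

    static FakeClient acquire(String uri) {
        return POOL.computeIfAbsent(uri, u -> new FakeClient());
    }

    public static void main(String[] args) {
        String uri = "mongodb://localhost/mydb.outputTest7";

        // Two tasks on the same executor ask for a client for one URI
        // and get back the identical instance.
        FakeClient task1 = acquire(uri);
        FakeClient task2 = acquire(uri);

        task1.insert("{a: 1}");  // task 1 writes and finishes first...
        task1.close();           // ...and closes the shared client

        try {
            task2.insert("{b: 2}"); // task 2 is still writing
        } catch (IllegalStateException e) {
            System.out.println("task 2 failed: IllegalStateException: " + e.getMessage());
        }
    }
}
```

The interleaving is shown sequentially here for determinism, but the same acquire/close ordering across two real executor threads would produce the same failure.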

Comment by Jeffrey Yemin [ 23/Mar/15 ]

Yes, every time we've seen this it's because a MongoClient instance was closed and then subsequently used.

Generated at Thu Feb 08 08:55:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.