Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: None
Working in Python, here is the code that is being called:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# set up environment
conf = SparkConf() \
    .setAppName("MovieRatings") \
    .set("spark.executor.memory", "4g")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# load the collection through the MongoDB Spark connector
df = sqlContext.read.format("com.mongodb.spark.sql").load()

print("Schema:")
df.printSchema()

print("Count:")
df.count()
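The MongoDB connection details are not set in the snippet itself; a minimal sketch of how the read was presumably configured, assuming the connector's standard spark.mongodb.input.uri property (the URI, database, and collection names below are illustrative, not taken from the report):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Assumed configuration: the connector takes its source from
# spark.mongodb.input.uri; "movies.ratings" is a placeholder name.
conf = SparkConf() \
    .setAppName("MovieRatings") \
    .set("spark.executor.memory", "4g") \
    .set("spark.mongodb.input.uri", "mongodb://127.0.0.1:27017/movies.ratings")

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Same read as above, now picking up the configured input URI
df = sqlContext.read.format("com.mongodb.spark.sql").load()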
The call to df.count() never returns a result; instead, the following stack trace is generated:
16/04/29 13:14:12 INFO SparkContext: Created broadcast 3 from broadcast at MongoRDD.scala:145 Schema: root |-- _id: string (nullable = true) |-- genre: string (nullable = true) |-- rating: string (nullable = true) |-- title: string (nullable = true) |-- user_id: string (nullable = true) Count: 16/04/29 13:14:14 INFO SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2 16/04/29 13:14:14 INFO DAGScheduler: Registering RDD 12 (count at NativeMethodAccessorImpl.java:-2) 16/04/29 13:14:14 INFO DAGScheduler: Got job 1 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions 16/04/29 13:14:14 INFO DAGScheduler: Final stage: ResultStage 3 (count at NativeMethodAccessorImpl.java:-2) 16/04/29 13:14:14 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2) 16/04/29 13:14:14 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2) 16/04/29 13:14:14 INFO DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[12] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents 16/04/29 13:14:14 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 11.4 KB, free 26.0 KB) 16/04/29 13:14:14 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 5.8 KB, free 31.7 KB) 16/04/29 13:14:14 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:52084 (size: 5.8 KB, free: 511.1 MB) 16/04/29 13:14:14 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006 16/04/29 13:14:14 INFO DAGScheduler: Submitting 6 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[12] at count at NativeMethodAccessorImpl.java:-2) 16/04/29 13:14:14 INFO TaskSchedulerImpl: Adding task set 2.0 with 6 tasks 16/04/29 13:14:14 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 8, localhost, partition 0,ANY, 2640 bytes) 16/04/29 13:14:14 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 9, localhost, partition 1,ANY, 2652 bytes) 16/04/29 13:14:14 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 10, localhost, partition 2,ANY, 2652 bytes) 16/04/29 13:14:14 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 11, localhost, partition 3,ANY, 2652 bytes) 16/04/29 13:14:14 INFO Executor: Running task 0.0 in stage 2.0 (TID 8) 16/04/29 13:14:14 INFO Executor: Running task 1.0 in stage 2.0 (TID 9) 16/04/29 13:14:14 INFO Executor: Running task 2.0 in stage 2.0 (TID 10) 16/04/29 13:14:14 INFO Executor: Running task 3.0 in stage 2.0 (TID 11) 16/04/29 13:14:15 INFO GenerateMutableProjection: Code generated in 403.549317 ms 16/04/29 13:14:15 INFO GenerateUnsafeProjection: Code generated in 39.364259 ms 16/04/29 13:14:15 INFO GenerateMutableProjection: Code generated in 17.201067 ms 16/04/29 13:14:15 INFO GenerateUnsafeRowJoiner: Code generated in 10.042296 ms 16/04/29 13:14:15 INFO GenerateUnsafeProjection: Code generated in 14.153414 ms 16/04/29 13:14:16 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:52084 in memory (size: 1788.0 B, free: 511.1 MB) 16/04/29 13:14:19 INFO MongoClientCache: Closing MongoClient: [127.0.0.1:27017] 16/04/29 13:14:19 INFO connection: Closed connection [connectionId{localValue:3, serverValue:7591}] to 127.0.0.1:27017 because the pool has been closed. 
16/04/29 13:14:19 ERROR TaskContextImpl: Error in TaskCompletionListener java.lang.IllegalStateException: state should be: open at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70) at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70) at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86) at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263) at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148) at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256) at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:19 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@589f3649 16/04/29 13:14:19 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 10 16/04/29 13:14:19 INFO connection: Closed connection [connectionId{localValue:4, serverValue:7592}] to 127.0.0.1:27017 because the pool has been closed. 
16/04/29 13:14:19 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 10) org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:19 ERROR TaskContextImpl: Error in TaskCompletionListener java.lang.IllegalStateException: state should be: open at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70) at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70) at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86) at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263) at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148) at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256) at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:19 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@da9b296 16/04/29 13:14:19 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 8 16/04/29 13:14:19 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 8) org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:19 INFO TaskSetManager: Starting task 4.0 in stage 2.0 (TID 12, localhost, partition 4,ANY, 2652 bytes) 16/04/29 13:14:19 INFO TaskSetManager: Starting task 5.0 in stage 2.0 (TID 13, localhost, partition 5,ANY, 2640 bytes) 16/04/29 13:14:19 INFO Executor: Running task 4.0 in stage 2.0 (TID 12) 16/04/29 13:14:19 INFO Executor: Running task 5.0 in stage 2.0 (TID 13) 16/04/29 13:14:19 WARN TaskSetManager: Lost task 2.0 in stage 2.0 (TID 10, localhost): org.apache.spark.util.TaskCompletionListenerException: state should be: open at 
org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 ERROR TaskSetManager: Task 2 in stage 2.0 failed 1 times; aborting job 16/04/29 13:14:20 INFO TaskSetManager: Lost task 0.0 in stage 2.0 (TID 8) on executor localhost: org.apache.spark.util.TaskCompletionListenerException (state should be: open) [duplicate 1] 16/04/29 13:14:20 INFO cluster: Cluster created with settings {hosts=[127.0.0.1:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500} 16/04/29 13:14:20 INFO cluster: Adding discovered server 127.0.0.1:27017 to client view of cluster 16/04/29 13:14:20 INFO cluster: Cluster created with settings {hosts=[127.0.0.1:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500} 16/04/29 13:14:20 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out 16/04/29 13:14:20 INFO cluster: Adding discovered server 127.0.0.1:27017 to client view of cluster 16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:6, serverValue:7596}] to 127.0.0.1:27017 16/04/29 13:14:20 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 1]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=345918} 16/04/29 13:14:20 INFO cluster: Discovered cluster type of STANDALONE 16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:7, serverValue:7597}] to 127.0.0.1:27017 16/04/29 13:14:20 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 1]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=348599} 16/04/29 13:14:20 INFO cluster: Discovered cluster type of STANDALONE 16/04/29 13:14:20 INFO MongoClientCache: Creating MongoClient: [127.0.0.1:27017] 16/04/29 13:14:20 INFO MongoClientCache: Creating MongoClient: [127.0.0.1:27017] 16/04/29 13:14:20 INFO TaskSchedulerImpl: Cancelling stage 2 16/04/29 13:14:20 INFO MongoClientCache: Closing MongoClient: [127.0.0.1:27017] 16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:8, serverValue:7598}] to 127.0.0.1:27017 16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:9, serverValue:7599}] to 127.0.0.1:27017 16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 4.0 in stage 2.0 (TID 12) 16/04/29 13:14:20 INFO TaskSchedulerImpl: Stage 2 was cancelled 16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 1.0 in stage 2.0 (TID 9) 16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 5.0 in stage 2.0 (TID 13) 16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 3.0 in stage 2.0 (TID 11) 16/04/29 13:14:20 INFO DAGScheduler: ShuffleMapStage 2 (count at NativeMethodAccessorImpl.java:-2) failed in 5.548 s 16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:2, 
serverValue:7590}] to 127.0.0.1:27017 because the pool has been closed. 16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener java.lang.IllegalStateException: state should be: open at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70) at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70) at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86) at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263) at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148) at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256) at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 INFO DAGScheduler: Job 1 failed: count at NativeMethodAccessorImpl.java:-2, took 5.577351 s 16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@4a9b97ee 16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 9 16/04/29 13:14:20 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 9) org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 INFO TaskSetManager: Lost task 1.0 in stage 2.0 (TID 9) on executor localhost: org.apache.spark.util.TaskCompletionListenerException (state should be: open) [duplicate 2] Traceback (most recent call last): File "/Users/samweaver/Desktop/movies-spark.py", line 58, in <module> df.count() File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 269, in count File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__ File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value py4j.protocol.Py4JJavaError16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:5, serverValue:7593}] to 127.0.0.1:27017 because 
the pool has been closed. 16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener java.lang.IllegalStateException: state should be: open at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70) at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70) at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86) at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263) at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148) at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256) at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@13186b4c 16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 11 16/04/29 13:14:20 ERROR Executor: Exception in task 3.0 in stage 2.0 (TID 11) org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 INFO TaskSetManager: Lost task 3.0 in stage 2.0 (TID 11) on executor localhost: org.apache.spark.util.TaskCompletionListenerException (state should be: open) [duplicate 3] : An error occurred while calling o28.count. 
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2.0 failed 1 times, most recent failure: Lost task 2.0 in stage 2.0 (TID 10, localhost): org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.RDD.collect(RDD.scala:926) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174) at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56) at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086) at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498) at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505) at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1515) at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1514) at 
org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099) at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1514) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ... 1 more 16/04/29 13:14:20 INFO MongoClientCache: Closing MongoClient: [127.0.0.1:27017] 16/04/29 13:14:20 INFO SparkContext: Invoking stop() from shutdown hook 16/04/29 13:14:20 INFO SparkUI: Stopped Spark web UI at http://10.4.126.113:4041 16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:9, serverValue:7599}] to 127.0.0.1:27017 because the pool has been closed. 16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener java.lang.IllegalStateException: state should be: open at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70) at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70) at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86) at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263) at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148) at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256) at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@7a4b56ab 16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 13 16/04/29 13:14:20 ERROR Executor: Exception in task 
5.0 in stage 2.0 (TID 13) org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 16/04/29 13:14:20 INFO MemoryStore: MemoryStore cleared 16/04/29 13:14:20 INFO BlockManager: BlockManager stopped 16/04/29 13:14:20 INFO BlockManagerMaster: BlockManagerMaster stopped 16/04/29 13:14:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 16/04/29 13:14:20 INFO SparkContext: Successfully stopped SparkContext 16/04/29 13:14:20 INFO ShutdownHookManager: Shutdown hook called 16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:8, serverValue:7598}] to 127.0.0.1:27017 because the pool has been closed. 16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener java.lang.IllegalStateException: state should be: open at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70) at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70) at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86) at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263) at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148) at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258) at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256) at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@5768f06b 16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 12 16/04/29 13:14:20 ERROR Executor: Exception in task 4.0 in stage 2.0 (TID 12) org.apache.spark.util.TaskCompletionListenerException: state should be: open at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) Exception in thread "Executor task launch worker-3" java.lang.IllegalStateException: RpcEnv already stopped. at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:159) at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131) at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516) at org.apache.spark.scheduler.local.LocalBackend.statusUpdate(LocalBackend.scala:151) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:317) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/04/29 13:14:20 INFO ShutdownHookManager: Deleting directory /private/var/folders/d5/b5c42jg54t98npm2f1dndrp80000gp/T/spark-08656be4-1592-44f5-a509-c02bdef10271 16/04/29 13:14:20 INFO ShutdownHookManager: Deleting directory /private/var/folders/d5/b5c42jg54t98npm2f1dndrp80000gp/T/spark-08656be4-1592-44f5-a509-c02bdef10271/httpd-3bd90f28-2ffd-43e0-bffe-6bb2107b3bc1 16/04/29 13:14:20 INFO ShutdownHookManager: Deleting directory /private/var/folders/d5/b5c42jg54t98npm2f1dndrp80000gp/T/spark-08656be4-1592-44f5-a509-c02bdef10271/pyspark-693e6ff0-eb3c-4b64-87a3-529d2002122f 16/04/29 13:14:20 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
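Reading the trace, the failure looks like an ordering problem: the cached MongoClient is closed first ("Closing MongoClient", "because the pool has been closed"), and only afterwards does the task-completion listener registered in MongoRDD.compute close its cursor, so QueryBatchCursor.killCursor asserts "state should be: open". A minimal, connector-independent sketch of that race (all names here are hypothetical and purely illustrative of the ordering seen in the log):

# Hypothetical illustration of the close ordering visible in the trace:
# the connection pool is shut down first, then a completion callback
# tries to clean up a cursor that still needs the pool.

class Pool:
    def __init__(self):
        self.open = True

    def get_connection(self):
        # mirrors the driver's "state should be: open" assertion
        if not self.open:
            raise RuntimeError("state should be: open")
        return object()

    def close(self):
        self.open = False


class Cursor:
    def __init__(self, pool):
        self.pool = pool

    def close(self):
        # killing the server-side cursor needs a live connection
        self.pool.get_connection()


pool = Pool()
cursor = Cursor(pool)
completion_listeners = [cursor.close]   # registered for task completion

pool.close()                            # client cache closes the pool first
for listener in completion_listeners:   # then the task completes
    listener()                          # -> RuntimeError: state should be: open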