• Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 0.2
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      Working in Python, here is the code being called:

      # set up environment
      from pyspark import SparkConf, SparkContext
      from pyspark.sql import SQLContext

      conf = SparkConf() \
          .setAppName("MovieRatings") \
          .set("spark.executor.memory", "4g")
      sc = SparkContext(conf=conf)

      sqlContext = SQLContext(sc)

      # MongoDB connection settings are supplied elsewhere
      # (e.g. via spark-submit --conf), not shown here
      df = sqlContext.read.format("com.mongodb.spark.sql").load()
      print("Schema:")
      df.printSchema()

      print("Count:")
      df.count()


      count() never returns a result; instead, a stack trace is generated:

      16/04/29 13:14:12 INFO SparkContext: Created broadcast 3 from broadcast at MongoRDD.scala:145
      Schema:
      root
       |-- _id: string (nullable = true)
       |-- genre: string (nullable = true)
       |-- rating: string (nullable = true)
       |-- title: string (nullable = true)
       |-- user_id: string (nullable = true)
      
      Count:
      16/04/29 13:14:14 INFO SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
      16/04/29 13:14:14 INFO DAGScheduler: Registering RDD 12 (count at NativeMethodAccessorImpl.java:-2)
      16/04/29 13:14:14 INFO DAGScheduler: Got job 1 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
      16/04/29 13:14:14 INFO DAGScheduler: Final stage: ResultStage 3 (count at NativeMethodAccessorImpl.java:-2)
      16/04/29 13:14:14 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
      16/04/29 13:14:14 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2)
      16/04/29 13:14:14 INFO DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[12] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
      16/04/29 13:14:14 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 11.4 KB, free 26.0 KB)
      16/04/29 13:14:14 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 5.8 KB, free 31.7 KB)
      16/04/29 13:14:14 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:52084 (size: 5.8 KB, free: 511.1 MB)
      16/04/29 13:14:14 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
      16/04/29 13:14:14 INFO DAGScheduler: Submitting 6 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[12] at count at NativeMethodAccessorImpl.java:-2)
      16/04/29 13:14:14 INFO TaskSchedulerImpl: Adding task set 2.0 with 6 tasks
      16/04/29 13:14:14 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 8, localhost, partition 0,ANY, 2640 bytes)
      16/04/29 13:14:14 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 9, localhost, partition 1,ANY, 2652 bytes)
      16/04/29 13:14:14 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 10, localhost, partition 2,ANY, 2652 bytes)
      16/04/29 13:14:14 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 11, localhost, partition 3,ANY, 2652 bytes)
      16/04/29 13:14:14 INFO Executor: Running task 0.0 in stage 2.0 (TID 8)
      16/04/29 13:14:14 INFO Executor: Running task 1.0 in stage 2.0 (TID 9)
      16/04/29 13:14:14 INFO Executor: Running task 2.0 in stage 2.0 (TID 10)
      16/04/29 13:14:14 INFO Executor: Running task 3.0 in stage 2.0 (TID 11)
      16/04/29 13:14:15 INFO GenerateMutableProjection: Code generated in 403.549317 ms
      16/04/29 13:14:15 INFO GenerateUnsafeProjection: Code generated in 39.364259 ms
      16/04/29 13:14:15 INFO GenerateMutableProjection: Code generated in 17.201067 ms
      16/04/29 13:14:15 INFO GenerateUnsafeRowJoiner: Code generated in 10.042296 ms
      16/04/29 13:14:15 INFO GenerateUnsafeProjection: Code generated in 14.153414 ms
      16/04/29 13:14:16 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:52084 in memory (size: 1788.0 B, free: 511.1 MB)
      16/04/29 13:14:19 INFO MongoClientCache: Closing MongoClient: [127.0.0.1:27017]
      16/04/29 13:14:19 INFO connection: Closed connection [connectionId{localValue:3, serverValue:7591}] to 127.0.0.1:27017 because the pool has been closed.
      16/04/29 13:14:19 ERROR TaskContextImpl: Error in TaskCompletionListener
      java.lang.IllegalStateException: state should be: open
      	at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
      	at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70)
      	at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
      	at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263)
      	at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148)
      	at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256)
      	at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:19 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@589f3649
      16/04/29 13:14:19 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 10
      16/04/29 13:14:19 INFO connection: Closed connection [connectionId{localValue:4, serverValue:7592}] to 127.0.0.1:27017 because the pool has been closed.
      16/04/29 13:14:19 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 10)
      org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:19 ERROR TaskContextImpl: Error in TaskCompletionListener
      java.lang.IllegalStateException: state should be: open
      	at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
      	at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70)
      	at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
      	at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263)
      	at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148)
      	at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256)
      	at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:19 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@da9b296
      16/04/29 13:14:19 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 8
      16/04/29 13:14:19 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 8)
      org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:19 INFO TaskSetManager: Starting task 4.0 in stage 2.0 (TID 12, localhost, partition 4,ANY, 2652 bytes)
      16/04/29 13:14:19 INFO TaskSetManager: Starting task 5.0 in stage 2.0 (TID 13, localhost, partition 5,ANY, 2640 bytes)
      16/04/29 13:14:19 INFO Executor: Running task 4.0 in stage 2.0 (TID 12)
      16/04/29 13:14:19 INFO Executor: Running task 5.0 in stage 2.0 (TID 13)
      16/04/29 13:14:19 WARN TaskSetManager: Lost task 2.0 in stage 2.0 (TID 10, localhost): org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      
      16/04/29 13:14:20 ERROR TaskSetManager: Task 2 in stage 2.0 failed 1 times; aborting job
      16/04/29 13:14:20 INFO TaskSetManager: Lost task 0.0 in stage 2.0 (TID 8) on executor localhost: org.apache.spark.util.TaskCompletionListenerException (state should be: open) [duplicate 1]
      16/04/29 13:14:20 INFO cluster: Cluster created with settings {hosts=[127.0.0.1:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
      16/04/29 13:14:20 INFO cluster: Adding discovered server 127.0.0.1:27017 to client view of cluster
      16/04/29 13:14:20 INFO cluster: Cluster created with settings {hosts=[127.0.0.1:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
      16/04/29 13:14:20 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
      16/04/29 13:14:20 INFO cluster: Adding discovered server 127.0.0.1:27017 to client view of cluster
      16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:6, serverValue:7596}] to 127.0.0.1:27017
      16/04/29 13:14:20 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 1]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=345918}
      16/04/29 13:14:20 INFO cluster: Discovered cluster type of STANDALONE
      16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:7, serverValue:7597}] to 127.0.0.1:27017
      16/04/29 13:14:20 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 1]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=348599}
      16/04/29 13:14:20 INFO cluster: Discovered cluster type of STANDALONE
      16/04/29 13:14:20 INFO MongoClientCache: Creating MongoClient: [127.0.0.1:27017]
      16/04/29 13:14:20 INFO MongoClientCache: Creating MongoClient: [127.0.0.1:27017]
      16/04/29 13:14:20 INFO TaskSchedulerImpl: Cancelling stage 2
      16/04/29 13:14:20 INFO MongoClientCache: Closing MongoClient: [127.0.0.1:27017]
      16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:8, serverValue:7598}] to 127.0.0.1:27017
      16/04/29 13:14:20 INFO connection: Opened connection [connectionId{localValue:9, serverValue:7599}] to 127.0.0.1:27017
      16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 4.0 in stage 2.0 (TID 12)
      16/04/29 13:14:20 INFO TaskSchedulerImpl: Stage 2 was cancelled
      16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 1.0 in stage 2.0 (TID 9)
      16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 5.0 in stage 2.0 (TID 13)
      16/04/29 13:14:20 INFO Executor: Executor is trying to kill task 3.0 in stage 2.0 (TID 11)
      16/04/29 13:14:20 INFO DAGScheduler: ShuffleMapStage 2 (count at NativeMethodAccessorImpl.java:-2) failed in 5.548 s
      16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:2, serverValue:7590}] to 127.0.0.1:27017 because the pool has been closed.
      16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener
      java.lang.IllegalStateException: state should be: open
      	at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
      	at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70)
      	at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
      	at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263)
      	at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148)
      	at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256)
      	at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 INFO DAGScheduler: Job 1 failed: count at NativeMethodAccessorImpl.java:-2, took 5.577351 s
      16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@4a9b97ee
      16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 9
      16/04/29 13:14:20 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 9)
      org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 INFO TaskSetManager: Lost task 1.0 in stage 2.0 (TID 9) on executor localhost: org.apache.spark.util.TaskCompletionListenerException (state should be: open) [duplicate 2]
      Traceback (most recent call last):
        File "/Users/samweaver/Desktop/movies-spark.py", line 58, in <module>
          df.count()
        File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 269, in count
        File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
        File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
        File "/Users/samweaver/Downloads/spark-1.6.1-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
      py4j.protocol.Py4JJavaError16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:5, serverValue:7593}] to 127.0.0.1:27017 because the pool has been closed.
      16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener
      java.lang.IllegalStateException: state should be: open
      	at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
      	at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70)
      	at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
      	at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263)
      	at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148)
      	at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256)
      	at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@13186b4c
      16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 11
      16/04/29 13:14:20 ERROR Executor: Exception in task 3.0 in stage 2.0 (TID 11)
      org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 INFO TaskSetManager: Lost task 3.0 in stage 2.0 (TID 11) on executor localhost: org.apache.spark.util.TaskCompletionListenerException (state should be: open) [duplicate 3]
      : An error occurred while calling o28.count.
      : org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2.0 failed 1 times, most recent failure: Lost task 2.0 in stage 2.0 (TID 10, localhost): org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      
      Driver stacktrace:
      	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
      	at scala.Option.foreach(Option.scala:236)
      	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
      	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
      	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
      	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
      	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
      	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
      	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
      	at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
      	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
      	at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
      	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
      	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
      	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
      	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
      	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
      	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
      	at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1515)
      	at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1514)
      	at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
      	at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1514)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
      	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
      	at py4j.Gateway.invoke(Gateway.java:259)
      	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
      	at py4j.commands.CallCommand.execute(CallCommand.java:79)
      	at py4j.GatewayConnection.run(GatewayConnection.java:209)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	... 1 more
      
      16/04/29 13:14:20 INFO MongoClientCache: Closing MongoClient: [127.0.0.1:27017]
      16/04/29 13:14:20 INFO SparkContext: Invoking stop() from shutdown hook
      16/04/29 13:14:20 INFO SparkUI: Stopped Spark web UI at http://10.4.126.113:4041
      16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:9, serverValue:7599}] to 127.0.0.1:27017 because the pool has been closed.
      16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener
      java.lang.IllegalStateException: state should be: open
      	at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
      	at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70)
      	at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
      	at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263)
      	at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148)
      	at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256)
      	at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@7a4b56ab
      16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 13
      16/04/29 13:14:20 ERROR Executor: Exception in task 5.0 in stage 2.0 (TID 13)
      org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
      16/04/29 13:14:20 INFO MemoryStore: MemoryStore cleared
      16/04/29 13:14:20 INFO BlockManager: BlockManager stopped
      16/04/29 13:14:20 INFO BlockManagerMaster: BlockManagerMaster stopped
      16/04/29 13:14:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
      16/04/29 13:14:20 INFO SparkContext: Successfully stopped SparkContext
      16/04/29 13:14:20 INFO ShutdownHookManager: Shutdown hook called
      16/04/29 13:14:20 INFO connection: Closed connection [connectionId{localValue:8, serverValue:7598}] to 127.0.0.1:27017 because the pool has been closed.
      16/04/29 13:14:20 ERROR TaskContextImpl: Error in TaskCompletionListener
      java.lang.IllegalStateException: state should be: open
      	at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
      	at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:70)
      	at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
      	at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:263)
      	at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:148)
      	at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:258)
      	at com.mongodb.spark.rdd.MongoRDD$$anonfun$compute$1.apply(MongoRDD.scala:256)
      	at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
      	at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 WARN TaskMemoryManager: leak 4.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@5768f06b
      16/04/29 13:14:20 ERROR Executor: Managed memory leak detected; size = 4456448 bytes, TID = 12
      16/04/29 13:14:20 ERROR Executor: Exception in task 4.0 in stage 2.0 (TID 12)
      org.apache.spark.util.TaskCompletionListenerException: state should be: open
      	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
      	at org.apache.spark.scheduler.Task.run(Task.scala:91)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Exception in thread "Executor task launch worker-3" java.lang.IllegalStateException: RpcEnv already stopped.
      	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:159)
      	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
      	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
      	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
      	at org.apache.spark.scheduler.local.LocalBackend.statusUpdate(LocalBackend.scala:151)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:317)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      16/04/29 13:14:20 INFO ShutdownHookManager: Deleting directory /private/var/folders/d5/b5c42jg54t98npm2f1dndrp80000gp/T/spark-08656be4-1592-44f5-a509-c02bdef10271
      16/04/29 13:14:20 INFO ShutdownHookManager: Deleting directory /private/var/folders/d5/b5c42jg54t98npm2f1dndrp80000gp/T/spark-08656be4-1592-44f5-a509-c02bdef10271/httpd-3bd90f28-2ffd-43e0-bffe-6bb2107b3bc1
      16/04/29 13:14:20 INFO ShutdownHookManager: Deleting directory /private/var/folders/d5/b5c42jg54t98npm2f1dndrp80000gp/T/spark-08656be4-1592-44f5-a509-c02bdef10271/pyspark-693e6ff0-eb3c-4b64-87a3-529d2002122f
      16/04/29 13:14:20 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
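      The timeline in the log suggests an ordering problem: MongoClientCache closes the shared MongoClient (and its connection pool) while Spark task-completion listeners still hold open cursors, so QueryBatchCursor.killCursor cannot get a connection and throws "state should be: open". A minimal, purely illustrative Python sketch of that ordering (Pool and Cursor here are hypothetical stand-ins, not connector classes):

      ```python
      class Pool:
          """Stand-in for the driver's connection pool."""
          def __init__(self):
              self.is_open = True

          def close(self):
              self.is_open = False

          def get_connection(self):
              # Mirrors the driver's assertion that fails in the log
              if not self.is_open:
                  raise RuntimeError("state should be: open")

      class Cursor:
          """Stand-in for QueryBatchCursor: closing it needs a live connection."""
          def __init__(self, pool):
              self.pool = pool

          def close(self):
              # killCursor must fetch a connection from the pool
              self.pool.get_connection()

      pool = Pool()
      cursor = Cursor(pool)
      pool.close()        # cache eviction closes the client/pool first...
      try:
          cursor.close()  # ...then the completion listener closes the cursor
      except RuntimeError as e:
          print(e)        # state should be: open
      ```

      If the cursor were closed before the client is evicted from the cache (or the client kept alive until all task-completion listeners have run), the assertion would not fire.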
      

      Assignee: Unassigned
      Reporter: sam.weaver (Sam Weaver)
      Votes: 0
      Watchers: 2

      Created:
      Updated:
      Resolved: