Spark Connector / SPARK-103

Uncaught exception loading RDD of empty mongo collection

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.1.0
    • Affects Version/s: 2.0.0
    • Component/s: API
    • Labels:
      None
    • Environment:
      spark-core v2.0.2
      spark-sql v2.0.2
      mongo-spark-connector v2.0.0 for scala 2.11
      mongo-java-driver v3.2.2

      When loading an RDD from an empty MongoDB collection, an uncaught exception is thrown:

      org.bson.BsonInvalidOperationException: Document does not contain key avgObjSize
      	at org.bson.BsonDocument.throwIfKeyAbsent(BsonDocument.java:798) ~[mongo-java-driver-3.2.2.jar:na]
      	at org.bson.BsonDocument.getNumber(BsonDocument.java:160) ~[mongo-java-driver-3.2.2.jar:na]
      	at com.mongodb.spark.rdd.partitioner.MongoSamplePartitioner.partitions(MongoSamplePartitioner.scala:84) ~[mongo-spark-connector_2.11-2.0.0.jar:2.0.0]
      	at com.mongodb.spark.rdd.partitioner.DefaultMongoPartitioner.partitions(DefaultMongoPartitioner.scala:34) ~[mongo-spark-connector_2.11-2.0.0.jar:2.0.0]
      
      

      The failing code at MongoSamplePartitioner.scala:84 shows that the partitioner tries to determine the number of partitions from the number of documents in the collection and their average size. However, when the collection is empty, the MongoDB collStats command does not return the avgObjSize key in its results, which causes the org.bson.BsonInvalidOperationException: Document does not contain key avgObjSize exception.
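
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the connector's code and not necessarily the fix that shipped: a plain `Map` stands in for the collStats result, and the class `EmptyCollectionGuard`, the helpers `numberOrDefault` and `estimatePartitions`, and the 64 MB partition size are all hypothetical names and values chosen for illustration. The point is only that a missing `avgObjSize` is treated as "empty collection" instead of being read unconditionally.

```java
import java.util.HashMap;
import java.util.Map;

public class EmptyCollectionGuard {

    // Hypothetical helper: read a numeric field, falling back to a default
    // when the key is absent instead of throwing.
    static double numberOrDefault(Map<String, Number> stats, String key, double fallback) {
        Number value = stats.get(key);
        return value == null ? fallback : value.doubleValue();
    }

    // Hypothetical estimate: total data size divided by the target partition
    // size, rounded up. An empty collection (no documents, or no avgObjSize
    // key) yields a single partition.
    static int estimatePartitions(Map<String, Number> collStats, long partitionSizeBytes) {
        long count = (long) numberOrDefault(collStats, "count", 0);
        double avgObjSize = numberOrDefault(collStats, "avgObjSize", 0);
        if (count == 0 || avgObjSize == 0) {
            return 1;
        }
        double totalBytes = count * avgObjSize;
        return (int) Math.max(1, Math.ceil(totalBytes / partitionSizeBytes));
    }

    public static void main(String[] args) {
        // Stand-in for collStats on an empty collection: avgObjSize is absent.
        Map<String, Number> empty = new HashMap<>();
        empty.put("count", 0);

        // Stand-in for collStats on a populated collection.
        Map<String, Number> populated = new HashMap<>();
        populated.put("count", 1_000_000);
        populated.put("avgObjSize", 2048);

        long partitionSize = 64L * 1024 * 1024; // assumed 64 MB target

        System.out.println(estimatePartitions(empty, partitionSize));
        System.out.println(estimatePartitions(populated, partitionSize));
    }
}
```

With the real driver types, the same guard could presumably be written with `BsonDocument.containsKey` or the `getNumber` overload that takes a default value, rather than the single-argument `getNumber` seen in the stack trace.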

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            esurijon Ezequiel Surijon
            Votes:
            0
            Watchers:
            2
