Spark Connector / SPARK-122

MongoPaginationPartitioner projection problem

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 2.1.0
    • Affects Version/s: 1.1.0, 2.0.0
    • Component/s: API
    • Labels: None
    • Environment:
      spark-core v1.6.1
      spark-sql v1.6.1
      mongo-spark-connector v1.1 for scala 2.10
      mongo-java-driver v3.4.0

      When using MongoPaginationPartitioner (by size or by count), the query used to calculate the partitions applies a projection on the partition key:

      val newHead: Option[BsonValue] = connector.withCollectionDo(readConfig, { coll: MongoCollection[BsonDocument] =>
        Option(coll.find()
          .filter(Filters.gte(partitionKey, preBsonValue))
          .skip(skipValue)
          .projection(Projections.include(partitionKey))
          .sort(Sorts.ascending(partitionKey))
          .first()).map(doc => doc.get(partitionKey))
      })

      If partitionKey is not _id, the projection does not explicitly exclude _id, so MongoDB still returns _id with each document. As a result the query is not covered by the partition key index and is therefore not optimized.
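To illustrate the difference, here is a minimal sketch of a projection that also excludes _id, built with the Java driver's Projections helpers. It is not taken from the connector source and may not match how the issue was actually fixed. The object name CoveredProjectionSketch, the render helper, and the key name "userId" are hypothetical, and the sketch assumes a single-field index exists on the partition key.

```scala
import com.mongodb.client.model.Projections
import org.bson.BsonDocument
import org.bson.codecs.BsonValueCodecProvider
import org.bson.codecs.configuration.CodecRegistries
import org.bson.conversions.Bson

// Hypothetical illustration; not part of the connector.
object CoveredProjectionSketch {
  // Assumed partition key name, standing in for any non-_id key.
  val partitionKey = "userId"

  // Minimal registry, only needed to render the projections for inspection.
  private val registry =
    CodecRegistries.fromProviders(new BsonValueCodecProvider())

  // Render a projection to the BsonDocument that would be sent to the server.
  def render(projection: Bson): BsonDocument =
    projection.toBsonDocument(classOf[BsonDocument], registry)

  // Projection as quoted in the issue: _id is not mentioned,
  // so the server includes it by default.
  val original: Bson = Projections.include(partitionKey)

  // Projection that returns only the partition key, so an index on that
  // key alone can supply every returned field.
  val covered: Bson = Projections.fields(
    Projections.include(partitionKey),
    Projections.excludeId()
  )

  def main(args: Array[String]): Unit = {
    println(render(original).toJson)
    println(render(covered).toJson)
  }
}
```

In the partitioner query, the second projection would be passed to .projection(...) in place of Projections.include(partitionKey). Whether the query is then actually covered can be checked with explain() against a real collection.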

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            anas.dhouib@gmail.com Dhouib Anas