Spark Connector / SPARK-112

MongoSamplePartitioner partitionKey boundaries problem

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.1.0
    • Affects Version/s: 1.1.0, 2.0.0
    • Component/s: API
    • Labels: None
    • Environment:
      spark-core v1.6.1
      spark-sql v1.6.1
      mongo-spark-connector v1.1 for scala 2.10
      mongo-java-driver v3.4.0

      When MongoSamplePartitioner is used with a custom partitionKey, the generated partition boundaries filter on the partitionKey field, but the boundary values are ObjectId values taken from _id:

      {
          "$match" : {
              "identifier" : {
                  "$gte" : ObjectId("584cc388809ba8325a6df908"),
                  "$lt" : ObjectId("586f76d3809ba8325a82186f")
              }
          }
      }
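
      For reference, a configuration along these lines should reproduce the setup. This is a minimal sketch, assuming the input option names from the connector's configuration documentation; the object name `PartitionerConfigSketch` is hypothetical, and `identifier` is the partition key seen in the query above.

```scala
object PartitionerConfigSketch {
  // Assumed option names (connector input configuration); these entries
  // would normally be set on the SparkConf before the session is created.
  val readOptions: Map[String, String] = Map(
    "spark.mongodb.input.partitioner" -> "MongoSamplePartitioner",
    "spark.mongodb.input.partitionerOptions.partitionKey" -> "identifier"
  )
}
```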
      

      The problem appears to be at MongoSamplePartitioner.scala:99, where the boundary value is read from the hard-coded "_id" field instead of from partitionKey:

      samples.zipWithIndex.collect { case (field, i) if i % samplesPerPartition == 0 => field.get("_id") }
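
      The suggested change can be illustrated without the connector. The sketch below is not the connector's code: it models sampled documents as plain Scala maps (an assumption made so that it runs without a BSON dependency), and `BoundarySketch`, `Doc`, `boundaries` and the sample values are hypothetical names and data. It only shows how the choice of field name changes the boundary values that are picked.

```scala
object BoundarySketch {
  // Stand-in for a sampled BSON document: field name -> value.
  type Doc = Map[String, Any]

  // Takes every samplesPerPartition-th sample and reads the boundary
  // value from the field named `key`.
  def boundaries(samples: Seq[Doc], samplesPerPartition: Int, key: String): Seq[Any] =
    samples.zipWithIndex.collect {
      case (doc, i) if i % samplesPerPartition == 0 => doc(key)
    }

  val partitionKey = "identifier"

  // Hypothetical samples, already sorted by the partition key.
  val samples: Seq[Doc] = Seq(
    Map("_id" -> "id-a", "identifier" -> 10),
    Map("_id" -> "id-b", "identifier" -> 20),
    Map("_id" -> "id-c", "identifier" -> 30),
    Map("_id" -> "id-d", "identifier" -> 40)
  )

  // Reported behaviour: boundaries come from "_id" regardless of partitionKey.
  val buggy: Seq[Any] = boundaries(samples, 2, "_id")

  // Suggested behaviour: boundaries come from the configured partitionKey.
  val fixed: Seq[Any] = boundaries(samples, 2, partitionKey)
}
```

      With the hard-coded field, the boundaries are `_id` values even though the query filters on `identifier`; reading from `partitionKey` keeps the filter field and the boundary values consistent.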
      

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            anas.dhouib@gmail.com Dhouib Anas
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: