Spark Connector / SPARK-147

Error writing DF to sharded collection

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 2.2.2, 2.1.2
    • Affects Version/s: 2.2.0
    • Component/s: Schema
    • Labels:
      None
    • Environment:
      mongodb 3.2.5, spark 2.2.0, scala 2.11.8

      Hi!

      I have a collection with structure:

      {
      	"_id" : ObjectId("59e0700b0569ca5b257e1a02"),
      	"id1" : 1,
      	"id2" : 1,
      	"id3" : 3,
      	"id4" : 4,
      	"data1" : "foo",
      	"data2" : "bar"
      }
      

      The collection is sharded with key (id1, id2, id3).

      From Spark, I load the collection as a DF, and when I try to update it with new data I get an error referring to the shard key.

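      import com.mongodb.spark.MongoSpark
      import com.mongodb.spark.config.{ReadConfig, WriteConfig}
      import org.apache.spark.sql.functions.lit

      // conn is the connection string prefix, ending just before the database name
      val readConfig = ReadConfig(Map("uri" -> s"${conn}shardingTest.shardedTest"))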
      
      val df_update = MongoSpark.load(sc, readConfig).toDF[Schema].withColumn("newData", lit(24))
      
      MongoSpark
            .save(
              df_update,
              WriteConfig(
                databaseName = "shardingTest",
                collectionName = "shardedTest",
                connectionString = Some(conn),
                replaceDocument = false
              )
            )
      
      

      The error:

      [BulkWriteError{index=0, code=61, message='upsert { q: { _id … }, u: { $set: { id1 … } }, upsert: true } does not contain shard key for pattern { id1 … }', details={ }}]

      Apparently, MongoDB is complaining that the upsert query does not include the shard key, which is true. MongoSpark has the shard key values, but it queries only on _id and puts everything else in the upsert's "u" field.

      Shouldn't MongoSpark include the shard key fields in the query when the collection is sharded, or at least allow the user to choose which fields go in the query and which in the update?

      Thanks.
      Alejandro.
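
For illustration, here is a minimal sketch of the behaviour asked for above: the upsert query carries _id plus every shard key field, and only the remaining fields go into $set. It is plain Scala over maps and does not touch the connector or the driver. The names ShardedUpsertSketch and upsertParts are hypothetical and are not part of the MongoDB Spark Connector API, and passing the shard key as a list of field names is an assumption made for the example.

```scala
// Hypothetical helper for illustration only; not the connector's implementation.
object ShardedUpsertSketch {
  type Doc = Map[String, Any]

  /** Splits one row into the (query, update) pair of an upsert.
    * The query holds _id and all shard key fields; $set holds everything else.
    */
  def upsertParts(row: Doc, shardKey: Seq[String]): (Doc, Doc) = {
    val queryFields = ("_id" +: shardKey).distinct

    // A sharded upsert needs every shard key field, so fail early if one is absent.
    val missing = queryFields.filterNot(row.contains).mkString(", ")
    require(missing.isEmpty, s"row is missing query fields: $missing")

    val query: Doc = queryFields.map(f => f -> row(f)).toMap

    // Keep the shard key fields out of $set so the update never tries to change them.
    val rest: Doc = row.filter { case (k, _) => !queryFields.contains(k) }
    (query, Map("$set" -> rest))
  }
}
```

With the document from this report and the shard key (id1, id2, id3), the query part would contain _id, id1, id2 and id3, while id4, data1, data2 and newData would end up under $set.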

            Assignee:
            ross@mongodb.com Ross Lawley
            Reporter:
            Trujillo Alejandro [X]
            Votes:
            1
            Watchers:
            3
