  Core Server / SERVER-1243

New operator to update all matching items in an array

    Details

      Description

      Issue Status as of Aug 19, 2015

      MongoDB appreciates the challenge of maintaining applications utilizing schemas with large arrays, especially with respect to updating many or all array elements, and we very much recognize the interest from the community around this ticket.

      Unfortunately, implementing such a major feature comes with serious requirements:

      • a specification for new language features (like update modifiers or expressions), since we cannot break existing uses; a purely hypothetical syntax sketch follows this list
      • must include support for matching all array elements, as well as those matching a query
        • requests for support to update the last element(s) could be considered.
      • must support all existing update modifiers (correctly, and in a non-backwards-breaking way)
        • $rename, $set, $unset, $pull, $push, $bit...
      • must work efficiently with all arrays, including those having thousands of elements
      • cannot change current update semantics or behaviors which existing applications and deployments depend on (= non-backwards-breaking).
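
      Purely as illustration of the first two requirements, here is one hypothetical shape such syntax could take. None of the operators or options below exist anywhere in MongoDB; the sketch only shows the two cases a specification would need to cover.

      // Hypothetical syntax only -- these operators are not specified anywhere.
      // Case 1: update ALL elements of the comments array:
      db.coll.update({ "_id": 1 }, { "$set": { "comments.$all.votes": 1 } })

      // Case 2: update only the elements matching a query predicate:
      db.coll.update(
          { "_id": 1 },
          { "$set": { "comments.$match.votes": 0 } },
          { "elemFilter": { "votes": { "$lt": 0 } } }
      )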

      In summary, adding this as a new feature, or improvement, is not trivial. It will require engineering resources that are currently committed to other projects; in short, it is a matter of prioritization against other parts of the server.

      Please see this comment below for additional details.

      Original description

      Given the following:

      > var obj = t.findOne()
      { "_id" : ObjectId("4b97e62bf1d8c7152c9ccb74"), "title" : "ABC",
        "comments" : [ { "by" : "joe", "votes" : 3 }, { "by" : "jane", "votes" : 7 } ] }
      

      One should be able to modify each item in the comments array by using an update command like the following:

      > t.update( obj, {$set:{'comments.$.votes':1}}, false, true )
      > t.find()
      { "_id" : ObjectId("4b97e62bf1d8c7152c9ccb74"), "title" : "ABC",
        "comments" : [ { "by" : "joe", "votes" : 1 }, { "by" : "jane", "votes" : 1 } ] }
      

        Issue Links

          Activity

          Asya Kamsky added a comment - edited

          There've been several workarounds mentioned on this thread. I want to point out that one property of a workaround matters even more than performance: correctness. This means you must make sure that your query predicate is an exact match for the element you want to update ("elements" in the future). That means ensuring the predicate matches only the elements you want to change and none of the elements you don't.
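
          For instance (an illustrative sketch, not part of the original discussion, using a hypothetical collection "coll" whose documents carry an "arr" of {a1, a2} subdocuments): separate dot-notation conditions can be satisfied by different array elements, while $elemMatch pins them to the same element, which is what makes a positional $ update correct.

          // Too loose: matches a document if SOME element has a1:2 and SOME
          // (possibly different) element has a2:"old".
          db.coll.find({ "arr.a1": 2, "arr.a2": "old" })

          // Exact: $elemMatch requires a single element to satisfy both
          // conditions, so the positional $ below targets precisely that element.
          db.coll.update(
              { "arr": { "$elemMatch": { "a1": 2, "a2": "old" } } },
              { "$set": { "arr.$.a2": "new" } }
          )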

          There is one possible workaround, which I didn't see mentioned: read the full (matched) document, create a targeted update for the array elements that need to change, and then update the document only if no other changes have been made to the field (array) being updated.

          This has the advantage of only updating the array if it has not been modified since we read it, which is key for atomicity and correctness in multithreaded environments – see the example that follows.

          Assuming our collection "coll" has documents with a field named "arr", an array whose elements have a field "a2" that may hold the value "old" and needs to be set to "new", the following shell code does that for a single document:

           
          > docOrig=db.coll.findOne({"_id":1, "arr.a2":"old"},{"arr":1})
          {
          	"_id" : 1,
          	"arr" : [
          		{
          			"a1" : 0,
          			"a2" : "old"
          		},
          		{
          			"a1" : 1,
          			"a2" : "ok"
          		},
          		{
          			"a1" : 2,
          			"a2" : "old"
          		},
          		{
          			"a1" : 3,
          			"a2" : "new"
          		}
          	]
          }
          > updatedFields={}
          { }
          > for (i=0; i<docOrig.arr.length; i++) {
                if (docOrig.arr[i].hasOwnProperty("a2") && docOrig.arr[i].a2=="old") {
                    updatedFields["arr."+i+".a2"]="new";
                }
            }
          new
          > db.coll.update(docOrig, {"$set":updatedFields})
          WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
          > print(tojsononeline(docOrig))
          {  "_id" : 1,  "arr" : [ { "a1" : 0, "a2" : "old" }, { "a1" : 1, "a2" : "ok" }, { "a1" : 2, "a2" : "old" }, { "a1" : 3, "a2" : "new" } ] }
          > print(tojsononeline(db.coll.findOne({"_id":1},{"arr":1})))
          {  "_id" : 1,  "arr" : [ { "a1" : 0, "a2" : "new" }, { "a1" : 1, "a2" : "ok" }, { "a1" : 2, "a2" : "new" }, { "a1" : 3, "a2" : "new" } ] }
          

          The above reads in the array of the document (which needs multiple elements updated), iterates over the array to build a single update (using "$set"), and then runs that update conditionally: only if the document (technically, the "arr" field of the document) has not been changed since we read it.

          This is a single atomic update which sets the appropriate elements of the arr array to their new values. Another approach would be to construct a completely new "arr" array on the client and run an update that replaces the "arr" field wholesale; depending on what percentage of the array's elements are being changed, this may turn out to be more efficient when the majority of elements are being set to a new value.
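
          A minimal sketch of that second approach, reusing the same hypothetical "coll"/"arr" shapes as above. The guard works the same way: the update matches only if the document is still exactly what we read, so the elements must be copied rather than mutated in place, or docOrig (the guard) would no longer hold the original values.

          > docOrig = db.coll.findOne({"_id": 1, "arr.a2": "old"}, {"arr": 1})
          > newArr = docOrig.arr.map(function(elem) {
                // copy each element so docOrig keeps its originally-read values
                var copy = {};
                for (var k in elem) { copy[k] = elem[k]; }
                if (copy.a2 == "old") { copy.a2 = "new"; }
                return copy;
            })
          > db.coll.update(docOrig, {"$set": {"arr": newArr}})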

          I would like to caution folks that if you have many documents with huge arrays, it's likely not an efficient schema design if you need to be querying and updating multiple array elements (especially if the array is indexed).

          Asya

          Jason Rogers added a comment

          Re: Jeremy Martin's response. Honestly, Asya Kamsky, at this point saying...

          We do recognize the interest the community is expressing via this ticket and we do take this input into consideration during project planning.

          ... seems like a blowoff. You may not have meant it that way, but after 5 years I'm not sure what else you expect the community to take from that. Perhaps the community does not have a loud enough voice with regard to this feature request.

          Noah added a comment - edited

          Thank you Asya Kamsky for your reply and insight into the challenges of this feature.

          Jaan Paljasma added a comment

          To be fair and square - I am disappointed. The feature has been up for several years, and I agree with Jason Rogers that the community has done all it could to raise awareness.
          Having many documents with huge arrays is the whole point of the NoSQL document model. For key-value (or key-object) storage there are faster and more scalable systems out there.

          Jeremy Martin added a comment

          There is one possible workaround, which I didn't see mentioned, and that would be to read the full (matched) document, create a targeted update to the array elements to change, and then updating the document only if no other changes have been made to the field (array) being updated.

          Unless I'm missing something, this seems tantamount to read-update-write with two-phase commit, right? Maybe it's not technically the same, but it places an equal burden on the client to ensure the correctness of the transaction. And, just like in two-phase commit, a real-world application of this technique would require retry logic if the array actually had been updated in the meantime, resulting in yet another query (on top of the initial one that already feels superfluous for an update operation). And this becomes exponentially more problematic when updating highly contended resources.
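
          A minimal sketch of what that retry logic might look like (illustrative only, reusing the hypothetical "coll"/"arr" shapes from the example above):

          // Keep re-reading and re-trying until the guarded update wins the race.
          var maxRetries = 5;
          for (var attempt = 0; attempt < maxRetries; attempt++) {
              var docOrig = db.coll.findOne({"_id": 1, "arr.a2": "old"}, {"arr": 1});
              if (docOrig === null) break;   // nothing left to update
              var updatedFields = {};
              for (var i = 0; i < docOrig.arr.length; i++) {
                  if (docOrig.arr[i].a2 == "old") {
                      updatedFields["arr." + i + ".a2"] = "new";
                  }
              }
              // Guarded write: matches only if the document is unchanged since the read.
              var res = db.coll.update(docOrig, {"$set": updatedFields});
              if (res.nMatched > 0) break;   // no concurrent change; done
              // otherwise another writer got there first: loop and re-read
          }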

          I would like to caution folks that if you have many documents with huge arrays, it's likely not an efficient schema design if you need to be querying and updating multiple array elements (especially if the array is indexed).

          I of course would have to agree with this, but this is ultimately a problem class that is independent of dataset size.

          Alright, obviously I'm frustrated. After 5 years and a hundred comments, I guess this just is what it is. And I honestly don't want to belabor a bunch of points about transactions and atomicity that I know you (Asya) understand at a much deeper level than I do. It just seems like Mongo in general can get fuzzy on this topic. That is, there's a lot of advice to put things in separate documents if you need certain update semantics, and to put them into the same document if you need certain transaction semantics, and then things get kind of hand-wavy if you need both.

          I get that no database can be "all things to all users", but this one is perplexing. We're not talking about actually changing transactional or atomicity semantics at all. We simply need a way to utilize those existing semantics via an extension of the query language. I'm not claiming it's trivial, but it doesn't seem intractable, either.

