[SERVER-22002] Do not retry findAndModify operations on MMAPv1 Created: 23/Dec/15  Updated: 18/Nov/16  Resolved: 21/Jan/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.2.0-rc3
Fix Version/s: 3.2.3, 3.3.1

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Charlie Swanson
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-8935 Do not retry findAndModify operations... Closed
Related
related to SERVER-22324 Update findAndModify FSM workloads to... Closed
is related to SERVER-21434 FindAndModify query unexpectedly retu... Closed
Backwards Compatibility: Fully Compatible
Backport Completed:
Sprint: QuInt E (01/11/16), Query F (02/01/16)
Participants:

 Description   

If a findAndModify command requests a sort, we pass a limit of 1 to that sort so that it will only keep track of the top document, reducing it's memory footprint.

This can lead to lots of retrying when there are many findAndModify operations which all specify a sort and are trying to update the same document, which is a common pattern for some sort of work queue.

What happens is that each findAndModify operation will chug along doing it's sort (I think this is only a problem if it's an in-memory sort, not an index-provided sort), each one periodically yielding, and one will eventually finish first and update the document. Each other operation will eventually find that the matching document has already been modified, so cannot update it themselves. Since there was a limit of 1, the sort stage does not have the next matching document, and the operations have to restart, and re-sort everything.

This is particularly bad on MMAPv1, since the updates have to take an exclusive lock on the collection. The logic to retry operations on MMAPv1 was only introduced in SERVER-21434. We should revert this change, instead allowing clients to continue to retry (as they likely are already).

As part of this work, it was discovered that there are cases where we do not retry the operation, even on WiredTiger. We should fix those places as well.



 Comments   
Comment by Githook User [ 05/Aug/16 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: Revert "SERVER-22002 Regression tests for concurrent findAndModify operations"

This reverts commit 6f3142b97b00464efebe26ef35cba73d47cf98a4.
Branch: develop
https://github.com/mongodb/mongo-perf/commit/9a01dbc34b8dab12ff14028090610df30187ef65

Comment by Githook User [ 05/Aug/16 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-22002 Regression tests for concurrent findAndModify operations
Branch: develop
https://github.com/mongodb/mongo-perf/commit/6f3142b97b00464efebe26ef35cba73d47cf98a4

Comment by Nick Judson [ 26/Mar/16 ]

...and just because no one has mentioned it before, doesn't mean it isn't a bug. Just means it's an edge case.

Comment by Nick Judson [ 26/Mar/16 ]

I don't personally have any problem with this as I don't use MMAP, and I can understand the reasons you've given. My point was simply that I would not expect functional changes when switching between storage engines. Keep up the good work!

Comment by Charlie Swanson [ 25/Mar/16 ]

nick@innsenroute.com, yes. The retrying behavior differs based on the storage engine. We made this decision for a number of reasons:

  1. Retrying on MMAPv1 is more expensive because it does not support document granularity concurrency, so each retry will take an exclusive lock on the collection.
  2. Conversely, retrying on WiredTiger (or other modern storage engines) is not as expensive, for the same reason. WiredTiger supports document granularity concurrency, so we do not take as many resources for retrying.
  3. Adding the retry logic on MMAPv1 adds additional complexity to the codebase, and isn't very straightforward to add or to test. Further, for much if not all of findAndModify's history on MMAPv1, operations were not retried in this case, and no one has noticed or complained.
  4. The decision to support retrying on WiredTiger was made for version 3.0, so we are treating the WiredTiger behavior change in version 3.2 as a regression.
Comment by Nick Judson [ 08/Feb/16 ]

So does this mean different behavior for different storage engines? I would be extremely careful making an assumption on "letting a client retry". This would break any typical queue implementation.

Comment by Githook User [ 21/Jan/16 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-22002 Do not retry findAndModify on MMAPv1
Branch: v3.2
https://github.com/mongodb/mongo/commit/c7c5010151e6e0d29a16e14429c864b0c7dd8386

Comment by Githook User [ 21/Jan/16 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-22002 Do not retry findAndModify on MMAPv1
Branch: master
https://github.com/mongodb/mongo/commit/f1ba338629578aaa6bff4ccadd4efbb8f862b37d

Comment by Githook User [ 13/Jan/16 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: Revert "SERVER-22002 Regression tests for concurrent findAndModify operations"

This reverts commit 6f3142b97b00464efebe26ef35cba73d47cf98a4.
Branch: master
https://github.com/mongodb/mongo-perf/commit/9a01dbc34b8dab12ff14028090610df30187ef65

Comment by Githook User [ 13/Jan/16 ]

Author:

{u'username': u'cswanson310', u'name': u'Charlie Swanson', u'email': u'charlie.swanson@mongodb.com'}

Message: SERVER-22002 Regression tests for concurrent findAndModify operations
Branch: master
https://github.com/mongodb/mongo-perf/commit/6f3142b97b00464efebe26ef35cba73d47cf98a4

Comment by Charlie Swanson [ 11/Jan/16 ]

Updated the description once again to reflect our new solution.

Comment by Charlie Swanson [ 05/Jan/16 ]

Updated description to reflect the fix we went with. We decided not to make the size of the limit configurable, since the 32MB sort limit already is configurable.

Generated at Thu Feb 08 03:59:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.