[SERVER-17174] Updates can fail to find document during high concurrency Created: 04/Feb/15 Updated: 27/Oct/16 Resolved: 03/Apr/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency, Write Ops |
| Affects Version/s: | 2.6.5, 2.6.7, 3.0.0-rc7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ronan Bohan | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | See the attached files:
|
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
After a cold start, certain 'update' commands fail when they are expected to succeed. The scenario can be triggered by cold-starting mongod and having multiple threads/processes update the same document. I have reproduced this on both OS X and Ubuntu using the attached scripts:
The run-js.sh
Under certain circumstances some of these updates fail, even though they should have succeeded and can be applied manually later on. The common characteristic seems to be:
The nscanned metrics in each case show that the document was found. However, the operation yields (numYields:1), after which the result is nMatched:0 and nModified:0, i.e. the update has not been applied. These operations also seem to hold the 'w' lock for much less time than those that succeed. That is probably to be expected given the behavior, but it is another key indicator of the problem.
I thought this might have been due to document moves: the document is found, but the operation yields before the update is applied; during this time another update comes in and successfully updates the document, moving it in the process, so that when the yielded operation resumes it cannot find the document. The available evidence does not bear this out: I have seen many cases where no document moves are taking place and yet the problem persists. Note too that yielding alone is not sufficient to cause the failure; there are occasions where the operation yields and the update still completes successfully.
I have also tried to force the document into memory before the update by issuing a 'findOne()' followed by the 'update()', but I still see the problem.
FYI, I also have a slightly more complex repro written in Java which can switch between the standard Update command, the Bulk Update API and 'findAndModify()'. Both Update methods suffer from this problem, but 'findAndModify()' does not appear to. Interestingly, 'findAndModify()' does not yield, so perhaps that is a clue. (I can upload this Java version too if required.)
Log file snippet showing problem:
Counter example showing an update which yields and yet succeeds:
I have tested this on MongoDB 2.6.5, 2.6.7 and 3.0.0-rc7; all are affected. I have tested my Java version on 2.4.12 and it does not appear to be affected, which suggests this is a regression. |
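For readers without the attachments, the scenario described above (several workers updating the same document and counting the updates that match nothing) might look roughly like the following. This is only a sketch, not the reporter's run-js.sh or Java repro. It assumes `collection` behaves like a PyMongo `Collection`, whose `update_one()` result exposes `matched_count` (the driver-side analogue of nMatched). The function name, the `counter` field, and the worker counts are hypothetical.

```python
import threading
from typing import Any, List


def count_missed_updates(collection: Any, doc_id: Any,
                         workers: int = 8, updates_per_worker: int = 100) -> int:
    """Run concurrent $inc updates against one document and count the misses.

    Assumption: `collection` exposes update_one(filter, update) and the result
    has a `matched_count` attribute, as a PyMongo Collection does.
    """
    # One slot per worker, so no lock is needed around the tallies.
    missed: List[int] = [0] * workers

    def worker(slot: int) -> None:
        for _ in range(updates_per_worker):
            result = collection.update_one({"_id": doc_id},
                                           {"$inc": {"counter": 1}})
            # The document exists, so every update is expected to match it.
            if result.matched_count == 0:
                missed[slot] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(missed)
```

A non-zero return value would correspond to the nMatched:0 results described above. The report ties the problem to a cold-started mongod, so a warm server may well return zero.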
| Comments |
| Comment by Ronan Bohan [ 09/Feb/15 ] | |
|
Attaching mongod.snippet.log
i.e. nMatched:0 and nModified:0 with numYields:1 for an update that is expected to work. There do not appear to be any document moves occurring around this time. Note: this was captured with MongoDB 2.6.7 running on my Mac. | |
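To pick such entries out of a larger log, a small filter along these lines may help. It is a sketch that assumes the counters appear in each log line as `name:value` tokens, in the form quoted in this ticket (numYields:1, nMatched:0, nModified:0). The function names are hypothetical, and the exact log layout should be checked against a real mongod log before relying on it.

```python
import re
from typing import Dict, Iterable, List

# Matches the counters named in this ticket. "nscanned:" is followed directly
# by a colon, so "nscannedObjects:" is deliberately not captured.
_COUNTER = re.compile(r"\b(nscanned|numYields|nMatched|nModified):(\d+)")


def parse_counters(line: str) -> Dict[str, int]:
    """Extract the counters of interest from one log line."""
    return {name: int(value) for name, value in _COUNTER.findall(line)}


def suspicious_updates(lines: Iterable[str]) -> List[str]:
    """Return lines where a document was scanned and the operation yielded,
    yet nothing was matched or modified."""
    hits = []
    for line in lines:
        c = parse_counters(line)
        if (c.get("nscanned", 0) > 0
                and c.get("numYields", 0) > 0
                and c.get("nMatched") == 0
                and c.get("nModified") == 0):
            hits.append(line)
    return hits
```

Lines that yield but still match (the counter example in the description) are not reported, since the filter requires both nMatched and nModified to be zero.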
| Comment by Asya Kamsky [ 09/Feb/15 ] | |
|
Could you attach a log snippet showing a run with such a failure (at logLevel 1)? Preferably from latest available version, but 2.6 would be ok too. | |
| Comment by Scott Hernandez (Inactive) [ 04/Feb/15 ] | |
|
This sounds like expected behavior with respect to yielding and concurrency. Is there some reason you think this is a bug? There are many cases where yielding on a document while a write is happening to that same document will cause the cursor to move past that document, thereby "missing" it. This can happen because of a move or an index update, for example (an index update is a remove plus an insert in the btree). |
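The mechanism described in this comment can be illustrated with a toy model. The sketch below is not MongoDB's cursor or storage code. `ToyCollection`, `update_with_yield`, and the `writer` callback are hypothetical names, and the model simply treats "the cursor moves past the document" as "the remembered location no longer holds the document after the yield".

```python
from typing import Callable, Dict, Optional


class ToyCollection:
    """Documents stored at integer locations; a toy stand-in for real storage."""

    def __init__(self) -> None:
        self.slots: Dict[int, dict] = {}
        self.next_loc = 0

    def insert(self, doc: dict) -> int:
        loc = self.next_loc
        self.next_loc += 1
        self.slots[loc] = doc
        return loc

    def move(self, loc: int) -> int:
        """Relocate a document, as a concurrent write might."""
        return self.insert(self.slots.pop(loc))


def update_with_yield(coll: ToyCollection, doc_id: int,
                      writer: Optional[Callable[[ToyCollection, int], object]] = None) -> int:
    """Return how many documents the update matched (0 or 1)."""
    # Phase 1: scan and remember where the matching document lives.
    found = next((loc for loc in sorted(coll.slots)
                  if coll.slots[loc]["_id"] == doc_id), None)
    if found is None:
        return 0
    # Yield point: a concurrent writer may run while this operation is paused.
    if writer is not None:
        writer(coll, found)
    # Phase 2: resume at the remembered location without rescanning.
    doc = coll.slots.get(found)
    if doc is None or doc["_id"] != doc_id:
        return 0  # the document was scanned, but is no longer where it was
    doc["counter"] = doc.get("counter", 0) + 1
    return 1
```

In this model an update with no interfering writer matches one document, an update whose yield coincides with a `move` matches none, and the same update issued again afterwards matches, which mirrors the observation that a failed update can be applied later.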