[SERVER-18304] Duplicate "value" fields in the findAndModify command response
Created: 03/May/15  Updated: 05/Oct/15  Resolved: 04/May/15

Status: Closed
Project: Core Server
Component/s: Concurrency, Querying
Affects Version/s: 3.0.2
Fix Version/s: 3.0.3

Type: Bug
Priority: Major - P3
Reporter: Pep Martinez
Assignee: Max Hirschhorn
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File findAndModify_remove_queue.js    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-17902 findAndModify returns document with d... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

My code involves quite a lot of details and is written in C++; I have yet to build a simple test case in, for example, Python.

In short:

  • One producer, in a tight loop inserting 100K elements with
    db.coll.insert({d: data, t: now()})

  • 5-10 consumers, one per thread, each getting elements with
    db.coll.findAndModify({sort: {t: 1}, remove: true})

The consumers end up returning about 140-160K elements in total. Some elements are returned twice or even three times.

Sprint: Quint Iteration 3
Participants:

 Description   
Issue Status as of Sep 29, 2015

ISSUE SUMMARY
On MongoDB instances running with document-level concurrency storage engines, concurrent modifications of the same document by multiple clients may cause WriteConflictExceptions to be thrown. These write conflicts are handled internally by retrying the operation.

A bug that affects both findAndModify remove operations and findAndModify update operations with new=false may cause the "value" field to be included in the server's response multiple times. Each occurrence of the field is likely to be associated with different documents or different versions of the same document.

The server's response was mutated to include the old version of the document prior to the delete or update operation taking place. If faced with a write conflict, the operation would be retried on a potentially different document. Each attempt to perform the operation would cause that document to be appended to the server's response under a new "value" field. This means that the last "value" field present in the server's response is the actual document that was updated or removed by the client.
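
The retry behavior described above can be illustrated with a small simulation. This is only a sketch, not the server's code: the response is modeled as an ordered list of [fieldName, value] pairs (a BSON document can physically carry the same field name more than once), and the names buggyFindAndRemove and WriteConflict are hypothetical.

```javascript
// Simulation only. All names are hypothetical and chosen for illustration.
class WriteConflict extends Error {}

// Mimics the problematic ordering: the candidate document is appended to
// the response before the remove is attempted, so every retry adds
// another "value" entry.
function buggyFindAndRemove(candidates, conflictsBeforeSuccess) {
  const response = [];
  let attempt = 0;
  while (true) {
    const doc = candidates[attempt];
    response.push(["value", doc]); // response mutated too early
    try {
      if (attempt < conflictsBeforeSuccess) {
        throw new WriteConflict("another client modified _id " + doc._id);
      }
      response.push(["ok", 1]);
      return response;
    } catch (e) {
      if (!(e instanceof WriteConflict)) throw e;
      attempt += 1; // retry, possibly on a different document
    }
  }
}

// Two simulated conflicts, then success on the third candidate.
const response = buggyFindAndRemove([{ _id: 1 }, { _id: 2 }, { _id: 3 }], 2);
const values = response.filter(([k]) => k === "value").map(([, v]) => v._id);
console.log(values); // [ 1, 2, 3 ]
```

In this model only the last entry (_id 3) is the document the client actually removed; the earlier entries are leftovers from attempts that hit a conflict.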

USER IMPACT
How a driver handles duplicate fields in the server's response isn't standardized behavior. The most common scenarios are that:

  1. The driver may choose the first "value" field in the response. This corresponds to the document that was first attempted to be updated or deleted, but wasn't actually the one modified by this client.
  2. The driver may choose the last "value" field in the response. This corresponds to the document that was updated or deleted by this client.
  3. The driver may return an error because the "value" field was present in the response multiple times.
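
The three scenarios can be compared side by side with a small sketch. The pickValue helper and its strategy names are hypothetical and do not correspond to any real driver's API; the response is assumed to be already decoded into an ordered list of [fieldName, value] pairs.

```javascript
// Hypothetical helper, not taken from any real driver.
function pickValue(pairs, strategy) {
  const values = pairs.filter(([k]) => k === "value").map(([, v]) => v);
  if (strategy === "first") return values[0];
  if (strategy === "last") return values[values.length - 1];
  if (strategy === "strict") {
    if (values.length > 1) throw new Error("duplicate field: value");
    return values[0];
  }
  throw new Error("unknown strategy: " + strategy);
}

const pairs = [
  ["value", { _id: 1 }], // first attempt, which hit a write conflict
  ["value", { _id: 2 }], // retry, the document this client really removed
  ["ok", 1],
];

console.log(pickValue(pairs, "first")._id); // 1 (wrong document)
console.log(pickValue(pairs, "last")._id);  // 2 (correct document)
try {
  pickValue(pairs, "strict");
} catch (e) {
  console.log(e.message); // duplicate field: value
}
```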

For sufficiently large documents with many concurrent findAndModify operations, it is possible to trigger enough write conflicts that the server's response exceeds 16MB. This causes an error to be returned to the client saying that the BSONObj size is invalid.

WORKAROUNDS
None.

AFFECTED VERSIONS
Versions in the 3.0 release series up to and including 3.0.2 are affected by this issue. Only MongoDB instances running with document-level concurrency storage engines (e.g. WiredTiger and RocksDB) are affected. MMAPv1 doesn't have a notion of write conflicts and is not affected.

FIX VERSION
The fix is included in the 3.0.3 production release.

Original description

I'm using a MongoDB collection as a queue, on MongoDB v3.0.2 with WiredTiger; producers use simple inserts and consumers use findAndModify with a sort descending on a timestamp and the remove option.

I've noticed that when I use one producer and multiple consumers in separate threads, in quite a few cases more than one thread obtains the same element from findAndModify. I've reread the docs on the semantics, and this seems like a bug to me.



 Comments   
Comment by Githook User [ 04/May/15 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-18304 Add findAndModify FSM workloads as regression tests.
Branch: master
https://github.com/mongodb/mongo/commit/50922d9f626758855326f6b5a06e940269189e11

Comment by Githook User [ 04/May/15 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-18304 Only call _appendHelper() if update/remove succeeds.

Otherwise the response will contain the "value" field multiple times
because the command result is mutated even when the operation does
not succeed (due to a WriteConflictException). Depending on the client
logic, e.g. using the first occurrence, it may end up claiming to have
deleted a document that was actually deleted by another client.
Branch: v3.0
https://github.com/mongodb/mongo/commit/feef878308e0ee990a17826073d8e4db9ec1657b

Comment by Max Hirschhorn [ 04/May/15 ]

I was able to reproduce this with an FSM workload (attached) that repeatedly removes documents from a collection using the findAndModify command. Note that a similar issue exists for updates with new=false.

While prototyping the workload, I found that the collection was still empty at the end of the workload. That is to say, despite some threads claiming to have removed the same document and running for a fixed number of iterations, all documents had actually been removed.

From reading through the CmdFindAndModify implementation in 3.0.2, I realized that we would have already filled in the values for the result object prior to calling deleteObjects(), which may trigger a WriteConflictException. By moving the _appendHelper() call to after the if (found) statement, I can no longer reproduce the issue. (Note that the actual fix will be slightly more involved because we'll need to store a copy of the removed document for mmapv1.)
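
The reordering described in this comment can be sketched as a simulation. This is not the actual CmdFindAndModify change: simulatedDelete, SimulatedWriteConflict and findAndRemoveFixed are hypothetical names, and the response is again modeled as an ordered list of [fieldName, value] pairs. The point is only that the response is built after the delete succeeds, from a copy of the document taken beforehand.

```javascript
// Simulation only. All names are hypothetical and chosen for illustration.
class SimulatedWriteConflict extends Error {}

// Stand-in for the storage-layer delete; throws when told to conflict.
function simulatedDelete(doc, shouldConflict) {
  if (shouldConflict) {
    throw new SimulatedWriteConflict("conflict on _id " + doc._id);
  }
}

function findAndRemoveFixed(candidates, conflictsBeforeSuccess) {
  for (let attempt = 0; attempt < candidates.length; attempt++) {
    const copy = { ...candidates[attempt] }; // keep a copy before removing
    try {
      simulatedDelete(copy, attempt < conflictsBeforeSuccess);
    } catch (e) {
      if (e instanceof SimulatedWriteConflict) continue; // retry, response untouched
      throw e;
    }
    // Only reached when the delete succeeded.
    return [["value", copy], ["ok", 1]];
  }
  throw new Error("no candidate document could be removed");
}

// Two simulated conflicts, then success on the third candidate.
const fixedResponse = findAndRemoveFixed([{ _id: 1 }, { _id: 2 }, { _id: 3 }], 2);
const fixedValues = fixedResponse
  .filter(([k]) => k === "value")
  .map(([, v]) => v._id);
console.log(fixedValues); // [ 3 ]
```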

To summarize, each thread is still removing a distinct document, but the response that comes back from the server has the wrong value for what document was removed.

Comment by Pep Martinez [ 04/May/15 ]

I've also noticed that v3-mmapv1 and v2.6 are not affected by this issue

Comment by Ramon Fernandez Marina [ 04/May/15 ]

Thanks for the report, pep.martinez. We're able to observe the behavior you describe and we're investigating.
