[SERVER-41714] $facet operator duplicates documents in the pipeline when preceded by $addFields and $match operators (in this exact order) Created: 13/Jun/19 Updated: 29/Oct/23 Resolved: 28/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.0.10 |
| Fix Version/s: | 4.3.1, 4.2.20 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Łukasz Karczewski | Assignee: | Xin Hao Zhang (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | Bug, query-44-grooming |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Backport Requested: | v4.2 |
| Sprint: | Query 2019-07-01 |
| Participants: | |
| Description |
|
db.isMaster() result:
Such an operation:
On such collections:
returns such a result:
As we can see, the first document in the pipeline gets duplicated items. However, if we swap the $addFields and $match operators in the $lookup pipeline, everything works fine. The same holds when the $addFields operator is removed. If I exclude the document with _id equal to u1, the document with _id equal to u2 gets the duplicated items in the items_check field instead. If I remove the $facet operator from the $lookup pipeline, no documents are duplicated. |
| Comments |
| Comment by Githook User [ 27/Jun/19 ] |
|
Author: {'name': 'Xin Hao Zhang', 'username': 'xinhaoz', 'email': 'xinhao.zhang@mongodb.com'} Message: |
| Comment by James Wahlin [ 20/Jun/19 ] |
|
This behavior is caused by a difference in the aggregation getNext() behavior assumed by the TeeBuffer class (used by DocumentSourceFacet via DocumentSourceTeeConsumer) and by DocumentSourceSequentialDocumentCache.

When DocumentSourceSequentialDocumentCache is building its cache, it iterates over its source until it hits EOF. At that point it switches its SequentialDocumentCache mode from "building" to "serving" via a call to freeze() and returns EOF to the TeeBuffer.

The TeeBuffer loads documents in batches. When it retrieves a batch (via a call to TeeBuffer::loadNextBatch()), it continues to pull documents from its source until it hits EOF or reaches its maximum batch size. If EOF is encountered, that return status is swallowed. TeeBuffer::getNext() then relies on finding an empty buffer after calling loadNextBatch() in order to return EOF to its consumer. This second, post-EOF call to loadNextBatch() results in a call to DocumentSourceSequentialDocumentCache::getNext(), which, having switched from "building" to "serving", returns its cached document instead of EOF, so the same document is returned twice.

In order to fix this issue we can do one of two things: |
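The interaction described in this comment can be reproduced in a small toy model. The sketch below is written in Python rather than the server's C++, and it is only an illustration: the class names `SequentialCacheStage` and `TeeBufferSketch`, the `EOF` sentinel, the batch size, and the `remember_eof` flag are all hypothetical and do not correspond to the real server code or to the fix that was eventually committed. The flag simply shows that the duplication disappears once the buffer stops pulling from a source that has already reported EOF.

```python
EOF = object()  # sentinel standing in for the end-of-stream status


class SequentialCacheStage:
    """Toy stand-in for DocumentSourceSequentialDocumentCache (hypothetical)."""

    def __init__(self, source_docs):
        self._source = iter(source_docs)
        self._cache = []
        self._serving = False  # False: "building", True: "serving"
        self._serve_pos = 0

    def get_next(self):
        if self._serving:
            # Serving mode: replay the cached documents from the start.
            if self._serve_pos < len(self._cache):
                doc = self._cache[self._serve_pos]
                self._serve_pos += 1
                return doc
            return EOF
        doc = next(self._source, EOF)
        if doc is EOF:
            self._serving = True  # analogous to the freeze() call
            return EOF
        self._cache.append(doc)
        return doc


class TeeBufferSketch:
    """Toy stand-in for TeeBuffer; remember_eof is an assumed guard."""

    def __init__(self, source, max_batch=2, remember_eof=False):
        self._source = source
        self._max_batch = max_batch
        self._remember_eof = remember_eof
        self._source_exhausted = False
        self._buffer = []
        self._pos = 0

    def _load_next_batch(self):
        self._buffer = []
        self._pos = 0
        if self._remember_eof and self._source_exhausted:
            return  # guard: never pull from the source again after EOF
        while len(self._buffer) < self._max_batch:
            doc = self._source.get_next()
            if doc is EOF:
                # Without the guard above, this status is effectively swallowed.
                self._source_exhausted = True
                break
            self._buffer.append(doc)

    def get_next(self):
        if self._pos >= len(self._buffer):
            self._load_next_batch()
            if not self._buffer:
                return EOF  # EOF is inferred from an empty batch
        doc = self._buffer[self._pos]
        self._pos += 1
        return doc


def drain(stage):
    """Pull documents from a stage until it reports EOF."""
    out = []
    while (doc := stage.get_next()) is not EOF:
        out.append(doc)
    return out


buggy = drain(TeeBufferSketch(SequentialCacheStage(["u1"])))
guarded = drain(TeeBufferSketch(SequentialCacheStage(["u1"]), remember_eof=True))
print(buggy)    # ['u1', 'u1']
print(guarded)  # ['u1']
```

In the unguarded run, the first batch ends on EOF, which flips the cache stage to serving mode; the second batch load then receives the cached document again before an empty batch finally signals EOF.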
| Comment by James Wahlin [ 19/Jun/19 ] |
|
Here is a stripped down version of the above reproduction, which fails the assertion on line 37:
|
| Comment by Danny Hatcher (Inactive) [ 14/Jun/19 ] |
|
Hello, thank you for your report. Please note that unless a given issue refers only to a specific driver, it is best to open problems like these in our general SERVER project. I've forwarded this on to the Query team to take a look. |