[SERVER-17426] Aggregation framework query by _id returns duplicates in sharded cluster (orphan documents) Created: 02/Mar/15 Updated: 24/Mar/15 Resolved: 11/Mar/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | 2.6.8 |
| Fix Version/s: | 2.6.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Anil Kumar | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | |
| Participants: | |
| Description |
|
In a sharded cluster, an aggregation that matches on _id alone returns duplicate records when orphan records exist on other shards. If the record with _id = 1 exists on shard A (active) and on shard B (orphan), the following aggregate returns 2 records.
This happens to be the way the documentation suggests getting an accurate count in a sharded cluster (together with a $group stage). The incorrect behaviour is not observed in 2.4.x or 3.0.0-rc11. |
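The effect described above can be illustrated with a toy model. This is only a sketch in plain Python, not MongoDB code: the `shards` and `ownership` dicts and the `find_by_id` helper are hypothetical stand-ins, assumed here to show why an unfiltered _id lookup sees both the active copy and the orphan copy.

```python
# Toy model only: plain dicts stand in for shards; nothing here is MongoDB code.

def owns(shard_name, doc_id, ownership):
    """Return True if the (hypothetical) ownership map assigns doc_id to shard_name."""
    return ownership.get(doc_id) == shard_name


def find_by_id(shards, ownership, doc_id, filter_orphans):
    """Send an _id lookup to every shard and merge whatever each one returns."""
    results = []
    for shard_name, docs in shards.items():
        for doc in docs:
            if doc["_id"] != doc_id:
                continue
            if filter_orphans and not owns(shard_name, doc_id, ownership):
                continue  # orphan copy: stored on this shard but not owned by it
            results.append(doc)
    return results


shards = {
    "A": [{"_id": 1, "v": "x"}],  # active copy
    "B": [{"_id": 1, "v": "x"}],  # orphan copy
}
ownership = {1: "A"}  # assumption: shard A owns _id = 1

unfiltered = find_by_id(shards, ownership, 1, filter_orphans=False)
filtered = find_by_id(shards, ownership, 1, filter_orphans=True)
print(len(unfiltered), len(filtered))  # prints "2 1"
```

With orphan filtering skipped, the merged result holds two documents for the same _id, which is the duplicate the report describes; with filtering applied, only shard A's copy survives.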
| Comments |
| Comment by Githook User [ 11/Mar/15 ] |
|
Author: David Storch (username: dstorch, email: david.storch@10gen.com)
Message: |
| Comment by Randolph Tan [ 09/Mar/15 ] |
|
The problem with _idHack is that the decision of whether to filter documents is made inside getNext() while iterating over a cursor. This is problematic because getMores are currently unversioned. The problem doesn't exist in v3.0 because the decision of whether to filter is made in the planning stage. |
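One way to read the comment above is sketched below. It is a toy model under stated assumptions, not the server's implementation: the `PerCallCursor` and `PlannedCursor` classes and the `versioned` flag are hypothetical names. The sketch assumes that an unversioned request gives the cursor no basis for filtering, so a cursor that re-decides on every call stops filtering on a getMore, while a cursor that decided once at planning time keeps filtering.

```python
# Toy model only; class and parameter names are hypothetical, not MongoDB internals.

class PerCallCursor:
    """Re-decides whether to filter orphans on every get_next call."""

    def __init__(self, docs, owned_ids):
        self._docs = iter(docs)
        self._owned = owned_ids

    def get_next(self, versioned):
        for doc in self._docs:
            # Assumption: filtering only happens when the request is versioned.
            if versioned and doc["_id"] not in self._owned:
                continue
            return doc
        return None


class PlannedCursor:
    """Decides once, at construction ("planning") time, whether to filter."""

    def __init__(self, docs, owned_ids, versioned):
        self._docs = iter(docs)
        self._owned = owned_ids
        self._filter = versioned  # fixed for the lifetime of the cursor

    def get_next(self, versioned):
        # The per-call flag is deliberately ignored here.
        for doc in self._docs:
            if self._filter and doc["_id"] not in self._owned:
                continue
            return doc
        return None


docs = [{"_id": 10}, {"_id": 1}]  # _id 1 is an orphan on this shard
owned = {10}

per_call = PerCallCursor(docs, owned)
first = per_call.get_next(versioned=True)    # initial, versioned request
leaked = per_call.get_next(versioned=False)  # unversioned getMore

planned = PlannedCursor(docs, owned, versioned=True)
planned_first = planned.get_next(versioned=True)
planned_second = planned.get_next(versioned=False)

print(first, leaked, planned_first, planned_second)
```

In this model the per-call cursor returns the orphan on the unversioned call, while the planned cursor skips it and reaches the end of its input.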