[SERVER-4608] aggregation: allow binary data to pass through pipelines Created: 03/Jan/12 Updated: 24/Mar/17 Resolved: 11/Dec/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 2.3.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Pasette (Inactive) | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 19 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Description |
|
Support pass-through, $sort, and $group on Binary fields |
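For illustration, a minimal mongo shell sketch of what the description asks for; the collection and field names (`devices`, `deviceId`, `pings`) are hypothetical. Per the comments below, pipelines like this errored when BinData was present on the affected versions and work once the fix landed (2.3.2 dev series / 2.4).

```javascript
// Hypothetical collection: each document carries a binary (UUID subtype 4) field.
db.devices.insert({ deviceId: UUID("0123456789abcdef0123456789abcdef"), pings: 3 });
db.devices.insert({ deviceId: UUID("0123456789abcdef0123456789abcdef"), pings: 5 });

// Pass-through, $group, and $sort keyed on the BinData field itself.
db.devices.aggregate([
  { $group: { _id: "$deviceId", totalPings: { $sum: "$pings" } } },
  { $sort: { _id: 1 } }
]);
```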
| Comments |
| Comment by auto [ 28/May/13 ] |
|
Author: Asya Kamsky (asya999, asya999@gmail.com). Message: 2.4 removed restriction on BINARY. See https://jira.mongodb.org/browse/SERVER-4608 |
| Comment by auto [ 11/Dec/12 ] |
|
Author: Mathias Stearn (mathias@10gen.com), 2012-11-29T19:54:48Z. Message: Add at least minimal support for all types to agg. Minimal support means conversion to/from BSON, comparison and hashing.
|
| Comment by Mathias Stearn [ 03/Dec/12 ] |
|
Updating ticket to reflect fix. All BSON types will be supported regardless of type or size. |
| Comment by auto [ 12/Jul/12 ] |
|
Author: Mathias Stearn (mathias@10gen.com), 2012-06-29T16:49:56-07:00. Message: If there is an early simple $project, apply it before converting to Documents. This is a partial fix for … This also provides a workaround for objects with types that aren't [supported]. This commit will need some doc updates, in particular in the "Optimizing …" section. |
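A sketch of the workaround this commit describes, assuming a pre-fix server: a simple (inclusion-only) $project as the very first stage strips the unsupported binary field before the rest of the pipeline sees the documents. The collection and field names (`events`, `payload`, `type`) are hypothetical.

```javascript
// "payload" is a BinData field the pipeline can't handle on pre-fix servers.
// An inclusion-only first-stage $project keeps only what the pipeline needs,
// so "payload" is dropped before the documents are converted.
db.events.aggregate([
  { $project: { type: 1 } },
  { $group: { _id: "$type", count: { $sum: 1 } } }
]);
```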
| Comment by Chris Westin [ 01/Jun/12 ] |
|
@Paul van Brouwershaven: Yes, and I'm currently working on |
| Comment by Paul van Brouwershaven [ 01/Jun/12 ] |
|
The problem is that you can't use aggregation on a collection that contains even a few binary objects. I'm not interested in the binary object for the aggregation; I just want to use the $group functionality, and I can't simply delete these binary objects from the collection. In a simple group aggregation you only use a count and an identifier field, and for that query you are not interested in any other fields. The binary object should only be a problem if it is your identifier (group by) or if you want to do something else with it. Probably I'm thinking too simply, but shouldn't fields that are not used in a query be ignored? |
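For concreteness, a sketch of the kind of query described above, with hypothetical names (`docs`, `customerId`, and an unused BinData field such as `attachment`): a count grouped by an identifier. On the affected versions this failed even though the binary field is never referenced; the early-$project workaround sketched above (or upgrading to 2.4+) avoids that.

```javascript
// Group-and-count by an identifier; the BinData "attachment" field is never used.
db.docs.aggregate([
  { $group: { _id: "$customerId", count: { $sum: 1 } } }
]);
```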
| Comment by Chris Westin [ 27/Apr/12 ] |
|
@Victor Kabdebon: Thanks, I'll take a look at what you've got. Right now we're trying to lock down 2.2, so I'm not sure if this will make it in or not, but we should have something soon. We may also rely on a combination of features such as those discussed above. |
| Comment by Chris Westin [ 27/Apr/12 ] |
|
@Mathias: separate ticket please, marked related. I suspect I'm more likely to rely on |
| Comment by Mathias Stearn [ 25/Apr/12 ] |
|
There is also an issue with functions (codeWScope to be specific). I ran into it while trying to run a pivot aggregation on a sampling of db.currentOp() runs. It would be nice if you could pass small functions through, or perhaps limit it to just the function name and signature. Do you want me to make a separate ticket for that or would it be handled the same as this one? |
| Comment by Victor Kabdebon [ 23/Apr/12 ] |
|
@Chris: Hi Chris, using locally available information such as the subtype, I wrote a temporary fix for this problem and opened it as a pull request on GitHub (see [1]). The problem is that all the clients I am using (C# and Python) convert any UUID given to them into a binary array. UUID is an identifier standard that is used everywhere, and this prevents the use of the pipeline everywhere. [1] My attempt to fix this is located here: Best. |
| Comment by Chris Westin [ 22/Mar/12 ] |
|
@Mathias: in this case, subtype is serving as a rough proxy for size. We can't just pass or not pass documents because of their size, because this would give seemingly random and incorrect results. We have to have some kind of rule to either always do it, or never do it, depending on the locally available information. Given the schema-less nature of MongoDB, I suppose that the subtypes could vary from document to document anyway, and give the same (random, incorrect) result. I'm increasingly liking your other suggestion of using a dummy value that causes errors if it is referenced or makes it all the way to the end of the pipeline. That may be the best way out of this, other than |
| Comment by Paul Sanchez [ 21/Mar/12 ] |
|
I suppose either allowing Subtypes 3, 4, and 5, or anything that is either up to or exactly 16 bytes, or hell even a combination of both, would work for me. |
| Comment by Mathias Stearn [ 21/Mar/12 ] |
|
If you are going to do this, it should be based on size, not subtype. There is no guarantee that anything with a UUID subtype must be exactly 16 bytes. Equivalently there is no good reason not to pass a 4-byte binary string through. |
| Comment by Chris Westin [ 01/Mar/12 ] |
|
I've seen a few reports on GG of this being a problem for folks trying to pass UUIDs and MD5s through pipelines in order to get their primary keys out at the other end, as per FREE-5540. I disagree about grouping on any binary type, because the unbounded ones will consume a lot of memory to pass through the pipelines. However, we should at least support the smaller bounded types described above in the near-term. |
| Comment by auto [ 06/Jan/12 ] |
|
Author: cwestin (U-tellus\cwestin, cwestin@10gen.com). Message: prep for |
| Comment by Eliot Horowitz (Inactive) [ 04/Jan/12 ] |
|
There is no reason we shouldn't support group on any bindata type. |
| Comment by Chris Westin [ 03/Jan/12 ] |
|
Suggested by Scott. |