[DOCS-9120] Explain that using the aggregation pipeline instead of collection.distinct() is a better equivalent to SQL "SELECT DISTINCT x FROM .." Created: 09/Oct/16 Updated: 30/Oct/23 Resolved: 31/Jan/18 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | manual |
| Affects Version/s: | None |
| Fix Version/s: | Server_Docs_20231030 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Akira Kurogane | Assignee: | Allison Reinheimer Moore |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: | |
| Days since reply: | 6 years, 2 weeks ago |
| Story Points: | 0.25 |
| Description |
|
A customer who didn't know about or aggregation pipelines, or knew about it but maybe saw it as being too powerful, was trying to do the equivalent of "SELECT DISTINCT field1 FROM ...". The documentation he used was https://docs.mongodb.com/manual/reference/method/db.collection.distinct/. I suspect he may have also read https://docs.mongodb.com/manual/reference/sql-comparison/, which gives the example of "SELECT DISTINCT(status) FROM users" == "db.users.distinct( "status" )" He ran into the problem that he had hundreds of thousands of distinct values in the collection he was examining and this lead to "BSONObj size: 25448908 (0x18451CC) is invalid. Size must be between 0 and 16793600(16MB)" when he tried to use the distinct command. That is the distinct values weren't a small set, so the result size exceeded 16MB, so the packing of array result value threw an exception when it reached max BSON object size. Neither the db.collection.distinct page or https://docs.mongodb.com/manual/reference/command/distinct/#dbcmd.distinct for the underlying db command highlight that the single array result means a 16mb result size limit. Nor do they explain to avoid that and get a cursor instead (like a find command, which is the equivalent of the SQL SELECT command) you should use an aggregation pipeline per the example at https://docs.mongodb.com/manual/reference/operator/aggregation/group/#retrieve-distinct-values. I think it's time we start describing the distinct command / wrapper method as a convenience method with limited result size and point to the aggregation pipeline method ( "db.myCollection.aggregate( [ { $group : { _id : "$myField" } } ] )" ) as the fundamental way of fetching distinct values. |
| Comments |
| Comment by Githook User [ 01/Feb/18 ] |
|
Author: {'email': 'allison.moore@10gen.com', 'name': 'Allison Moore', 'username': 'schmalliso'}Message: |
| Comment by Akira Kurogane [ 30/Jan/18 ] |
|
Thanks Allison. Please excuse the nitpicking. Code review LGTM'ed just now. |
| Comment by Akira Kurogane [ 26/Jan/18 ] |
|
Hi allison.moore. Code review update for https://mongodbcr.appspot.com/177940001/ made. |