[GODRIVER-2459] Proposal: Improve a way of making filters and aggregation pipilines Created: 15/Jun/22 Updated: 11/Oct/23 |
|
| Status: | Backlog |
| Project: | Go Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | hummerd N/A | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Description |
|
For now to write filter (or aggregation) in golang we must construct bson.D object, and it is not handy for 2 reasons: Example:
But what if we will use some helpers to work with text filters instead:
Now filter looks more developer-friendly and you can copy and paste it between tools. We can cache compiled queries on client side so there will be almost no overhead. Example of implementation: |
| Comments |
| Comment by hummerd N/A [ 28/Jun/22 ] | |||||||||
|
Hi Benji! | |||||||||
| Comment by Benji Rewis (Inactive) [ 28/Jun/22 ] | |||||||||
|
The caching solution certainly has some benefits; I'm still of the opinion that we just want a better way to declare and build bson.D documents.
Ah interesting; I had not expected the M type to differ significantly in number of allocations as compared to the D type. Can I ask what memory usage you're seeing with D?
Fair point; sounds like we'll want to consider a JSON-like declaration syntax that can be either directly used or minimally transformed in these other applications. I'm opening this ticket for now and we've marked it as related to GODRIVER-2271. | |||||||||
| Comment by hummerd N/A [ 26/Jun/22 ] | |||||||||
|
You are right that there are another use cases where we need to build dynamic queries, and in this case we will have to recompile the query. I can not say how many of them are really have high cardinality. For low cardinality dynamic queries application sooner or later will end up using a compiled query. I also introduced parameters for queries to reduce amount of unique queries. And we can make MRU cache that will cover most frequent cases. Using bson.M is also not good for performance - because bson.M{"some": value} makes 2 allocations and consumes 336 bytes (oh that's a lot And still there is a problem with bson.M that you can not copy paste it to Atlas/Compass/Mongosh (for testing purpose) or to another language driver (for example C#). And just wanted to mention that for me as for engineer that works with mongo drivers every day this is the most wanted feature. | |||||||||
| Comment by Benji Rewis (Inactive) [ 22/Jun/22 ] | |||||||||
Makes sense to me haha. I understand the desire not to suppress lint warnings.
That certainly makes sense if you're building the same query over and over again. I am not convinced, however, that all users will have a use-case like that and may in fact be building highly variable queries throughout the lifetime of the application. I think this is certainly an orthogonal improvement that we'll want to make, so I'll backlog the ticket for now as it will be non-trivial to design the new API and the team does not have a lot of availability at the moment. I'll also say that using bson.M where you can (it does not preserve order of keys in the map) could make long queries and pipelines a little cleaner. As in:
| |||||||||
| Comment by hummerd N/A [ 22/Jun/22 ] | |||||||||
|
Hi @Benji Rewis! Yep I can disable "field name" linters (but I think it is not cool As for additional marshaling it only happens once (because of caching) during the application life time. Not even once per query, I mean once until restart. So for real life applications you will not feel the difference - we marshal and unmarshall a loooooot of documents per second. +1 document wont make the difference =). | |||||||||
| Comment by Benji Rewis (Inactive) [ 22/Jun/22 ] | |||||||||
|
Hello hummer.dk@gmail.com, and thanks for your improvement suggestion! Firstly, this is definitely an issue we’re aware of and one that’s been brought up before. Long documents and pipelines are hard to write and perhaps even harder to read in the Go driver. My only suggestions currently are to maybe use the bsoncore document builder syntax if that proves more readable (it’s also more performant, as marshaling is not required). Or, to stop keying the fields of your primitive.E values. As in:
We don’t key the fields internally when we create bson.D objects. You may have to disable the “unkeyed fields” linter message in your IDEs, though. A cleaner document builder would be an excellent improvement to the driver. I gather the main pain points now are the extra opening and closing brackets and the need to write bson.D or bson.A at the start of every object. Your implementation certainly looks more friendly, and the addition of caching is great. But to clarify, are you accepting extended JSON documents, unmarshaling them to bson.D instances, and then using the driver to marshal the bson.D instances to bytes that are passed over the wire? I would love a solution where we can avoid the extra unmarshaling from extended JSON to bson.D instances and still provide a cleaner API. | |||||||||
| Comment by Esha Bhargava [ 17/Jun/22 ] | |||||||||
|
hummer.dk@gmail.com Thank you for reporting this issue. We'll look into it and get back to you soon. |