[GODRIVER-2459] Proposal: Improve a way of making filters and aggregation pipilines Created: 15/Jun/22  Updated: 11/Oct/23

Status: Backlog
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: hummerd N/A Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to GODRIVER-3012 Create a CRUD query filters helper Backlog
related to GODRIVER-2889 Create aggregation pipeline helpers Backlog
is related to GODRIVER-2271 Update "bson.D" BSON document literal... Backlog

 Description   

For now to write filter (or aggregation) in golang we must construct bson.D object, and it is not handy for 2 reasons:
1. This looks terrible
2. You can not easily copy and paste query between mongo-compass/atlas, studio-3t or mongosh and your golang code.

Example:

 

 

f := bson.D{
    {Key: "id", Value: "abc"},
    {Key: "start", Value: bson.D{{Key: "$lte", Value: ts}}},
    {Key: "$or", Value: bson.A{
        bson.D{{Key: "end", Value: bson.D{{Key: "$exists", Value: false}}}},
        bson.D{{Key: "end", Value: nil}},
        bson.D{{Key: "end", Value: bson.D{{Key: "$gte", Value: ts}}}},
    }},
}

 

But what if we will use some helpers to work with text filters instead:

f := mongo.MustCompileQuery(`{
        "id" : "$1",
        "start": { "$lte": "$2" },
        "$or": [
            { "end": { "$exists": false } },
            { "end": null },
            { "end": { "$gte": "$2" } }
        ]}`,
    "$1", id,
    "$2", date,
)

Now filter looks more developer-friendly and you can copy and paste it between tools. We can cache compiled queries on client side so there will be almost no overhead.

Example of implementation:
https://gist.github.com/hummerd/0ef2991443e8ab064ef9bba168f97abb



 Comments   
Comment by hummerd N/A [ 28/Jun/22 ]

Hi Benji!
With bson.D it is  552 B/op 17 allocs/op ~4.4 times less memory
And also I think this problem is relevant to C# and all other drivers ( except lucky JS).

Comment by Benji Rewis (Inactive) [ 28/Jun/22 ]

The caching solution certainly has some benefits; I'm still of the opinion that we just want a better way to declare and build bson.D documents.

Using bson.M is also not good for performance

Ah interesting; I had not expected the M type to differ significantly in number of allocations as compared to the D type. Can I ask what memory usage you're seeing with D?

you can not copy paste it to Atlas/Compass/Mongosh

Fair point; sounds like we'll want to consider a JSON-like declaration syntax that can be either directly used or minimally transformed in these other applications.

I'm opening this ticket for now and we've marked it as related to GODRIVER-2271.

Comment by hummerd N/A [ 26/Jun/22 ]

You are right that there are another use cases where we need to build dynamic queries, and in this case we will have to recompile the query. I can not say how many of them are really have high cardinality. For low cardinality dynamic queries application sooner or later will end up using a compiled query. I also introduced parameters for queries to reduce amount of unique queries.  And we can make MRU cache that will cover most frequent cases.

Using bson.M is also not good for performance - because bson.M{"some": value} makes 2 allocations and  consumes 336 bytes (oh that's a lot ).  Example above uses 2472 B/op and 18 allocs/op (ouch!) if we use bson.M.

And still there is a problem with bson.M that you can not copy paste it to Atlas/Compass/Mongosh (for testing purpose) or to another language driver (for example C#).
While text queries is really interoperable apporach  .

And just wanted to mention that for me as for engineer that works with mongo drivers every day this is the most wanted feature.
 
Anyway, thanks for your time benji.rewis@mongodb.com!

Comment by Benji Rewis (Inactive) [ 22/Jun/22 ]

But I think it is not cool though

Makes sense to me haha. I understand the desire not to suppress lint warnings.

As for additional marshaling it only happens once (because of caching) during the application life time.

That certainly makes sense if you're building the same query over and over again. I am not convinced, however, that all users will have a use-case like that and may in fact be building highly variable queries throughout the lifetime of the application.

I think this is certainly an orthogonal improvement that we'll want to make, so I'll backlog the ticket for now as it will be non-trivial to design the new API and the team does not have a lot of availability at the moment. I'll also say that using bson.M where you can (it does not preserve order of keys in the map) could make long queries and pipelines a little cleaner. As in:

f := bson.M{
	"id":    "abc",
	"start": bson.M{"$lte": ts},
	"$or": bson.A{
		bson.M{"end": bson.M{"$exists": false}},
		bson.M{"end": nil},
		bson.M{"end": bson.M{"$gte": ts}},
	},
}

Comment by hummerd N/A [ 22/Jun/22 ]

Hi @Benji Rewis! Yep I can disable "field name" linters (but I think it is not cool though). But event then it does not make things much better. It is still long and hard to read. And the worth thing is that my frequent use case is to write query or pipeline, but than I need to modify it, because users always want new features. To do that I must turn this monster back in MQL query - test my modifications in mongo-compass/atlas/mongosh and then turn my new clauses/pipeline steps into bson.D. Yes, Golang code for pipeline can be generated from compass but not vise versa. String queries can be copied and pasted to and from mongo-compass/atlas/mongosh.

As for additional marshaling it only happens once (because of caching) during the application life time. Not even once per query, I mean once until restart. So for real life applications you will not feel the difference - we marshal and unmarshall a loooooot of documents per second. +1 document wont make the difference =).

Comment by Benji Rewis (Inactive) [ 22/Jun/22 ]

Hello hummer.dk@gmail.com, and thanks for your improvement suggestion!

Firstly, this is definitely an issue we’re aware of and one that’s been brought up before. Long documents and pipelines are hard to write and perhaps even harder to read in the Go driver. My only suggestions currently are to maybe use the bsoncore document builder syntax if that proves more readable (it’s also more performant, as marshaling is not required). Or, to stop keying the fields of your primitive.E values. As in:

f := bson.D{
    {"id", "abc"},
    {"start", bson.D{{"$lte", ts}}},
    {"$or", bson.A{
        bson.D{{"end", bson.D{{"$exists", false}}}},
        bson.D{{"end", nil}},
        bson.D{{"end", bson.D{{"$gte", ts}}}},
    }},
}

We don’t key the fields internally when we create bson.D objects. You may have to disable the “unkeyed fields” linter message in your IDEs, though.

A cleaner document builder would be an excellent improvement to the driver. I gather the main pain points now are the extra opening and closing brackets and the need to write bson.D or bson.A at the start of every object. Your implementation certainly looks more friendly, and the addition of caching is great. But to clarify, are you accepting extended JSON documents, unmarshaling them to bson.D instances, and then using the driver to marshal the bson.D instances to bytes that are passed over the wire? I would love a solution where we can avoid the extra unmarshaling from extended JSON to bson.D instances and still provide a cleaner API.

Comment by Esha Bhargava [ 17/Jun/22 ]

hummer.dk@gmail.com Thank you for reporting this issue. We'll look into it and get back to you soon.

Generated at Thu Feb 08 08:38:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.