[SERVER-50092] [FLE] with encryption on collection and $lookup with two non-encrypted collections fails Created: 04/Aug/20  Updated: 31/Jan/24

Status: Backlog
Project: Core Server
Component/s: Field Level Encryption
Affects Version/s: 4.2.8
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 23
Labels: FLE, qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-68371 Enabling CSFLE in your MongoClient ca... Closed
is related to JAVA-3859 aggregation with lookup fails with 2 ... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

if you perform an aggregation pipeline $lookup from a MongoClient with encryption, where both of collections involved are not encrypted you'll get Pipeline over an encrypted collection cannot reference additional collections 

For example, encrypted collection A, non-encrypted collection B and C and you have pipeline from B : [\{$lookup:{from:"C", localField:"Foo", foreignField:"Bar", as:"Baz"}}] you still get the error above.



 Comments   
Comment by Sujan Maharjan [ 31/Jan/24 ]

Any updates so far or any alternative solutions?  This is really a BIG issue. 

Comment by Angus Ryer [ 15/Nov/23 ]

This has been an issue for such a long time. Is everyone that uses CSFLE simply tolerating managing two connections? Have they all moved to some other database provider or encryption mechanism? Or is everyone just encrypting fields explicitly?

To me this seems like a bug, not an enhancement. Why should querying non-encrypted collections result in an error when CSFLE is enabled? No encryption mechanisms should be engaged after the query is parsed, since no encrypted fields would be found. 

Comment by Ben Luk [ 25/Oct/22 ]

Hello, any update on this?

Comment by Anthony LaMartina [ 09/Mar/22 ]

I figured out a work around. Essentially I created a collection that is dedicated to encrypting single string via CSFLE that I want to use as a matching comparison in a Lookup. I insert the string with an auto-encryption client into the collection and then query it with a non-encryption client to get it in its encrypted state (Bson Binary). I then can join a collection in a Lookup that has encrypted strings with another collection that does not, by using a non-encrypted client for both. Then if I want to query the Lookup by an encrypted string I just need to match the Bson Binary data I created in the first collection mentioned above to the encrypted field in the collection that is in the lookup. This way I don't need the framework to compare the decrypted string with a search string. I just compare an encrypted string field with another temporarily encrypted string.  

Comment by Anthony LaMartina [ 07/Mar/22 ]

Are there any plans to release a fix for this in the near future, or is this backlogged indefinitely?

I needed to convert over various fields in my database to be encrypted in order to be compliant with Facebook's Platform data policies. I went the CSFLE route only to find out in the end that I get this bug on my existing queries that are using lookups/joins because it appears that collections being accessed through an encryption client can not do this. So know I need to perform a refactor on my code to perform any encrypted collections separately from lookups.

I am pretty disappointed to see that this was brought up two years ago and hasn't been fixed. Has anyone figured out a better alternative to MongoDB's CSFLE that plays nice with server side mongo queries (including joins)?

Comment by Tyler Elkink [ 16/Dec/20 ]

I can understand that from the perspective of a successful NOSQL development team, joins being a "convenience" seems a reasonable position. I can understand that smoothing out gaps between joins and encryption is a high-effort, low-reward option.

The problem from the user perspective is, I think, that MongoDB wrote $lookup, made it available in early 2018, and is trying to establish itself in an environment thoroughly dominated by SQL systems that use joins- and developers that are used to joins. And use joins, as they presume they are able to do so from the MongoDB documentation. If $lookup is a "convenience" that's not part of the core MongoDB offering, then so- as 3.2 releases- are basic math operations on aggregations. My company just ran head-first into this problem while trying to protect PII on tables we reference in aggregation.

We now get to decide whether to rip $lookup out of all of our code and refactor, or to substantially increase the number of connections to a production environment (which, due to MongoDB's architectural decisions, are more limited than other DBs), or to hope this bug comes out of backlog and into active development soon. Like Tom says, handling connection logic based on field encryption is a huge pain, and like Tal says there's no good solution with this bug in place. I'm going to recommend we refactor, since developer time is more under our control and less firmly limited than the other options, but I hope this description illuminates to the devs why MongoDB users may have a perspective that doesn't include joins (and joins working with encryption) as a "convenience."

Comment by Tal Bar [ 24/Nov/20 ]

Hello, any update on this?

If this issue isn't going to be resolved in the near future, we are going to have to stop using CSFLE altogether. 
The workaround of using two separate db connections (one with CSFLE and another without) is causing us trouble, so we have to stop using it, and get back to normal workflow. 

I have to say I'm surprised this issue isn't given a much higher priority.. it is a very serious bug IMHO. 

Comment by Anu Madan [ 20/Oct/20 ]

Hi @Asya,

Yes, I understand it is complex if both collections require encryption.
We are looking for the solution when both collections do not require encryption. This is needed to have no impact on the existing applications/implementation if we introduce CSFLE on some collections and use the same connection pool (with encryption configuration) to work on both set of encrypted and non-encrypted collections in a database.
Hope this gives some insight on why this feature is important.

Thanks,
Anu

Comment by Tom Gabay [ 16/Oct/20 ]

Hey @asya,
The issue is about the case of both* collections are not encrypted.

Comment by Asya Kamsky [ 16/Oct/20 ]

anu.madan@idemia.com this is not a simple problem to address as a second collection requires knowing a different schema for encryption.

$lookup is not really part of "basic" functionality in normal MongoDB querying - it's more a convenience added to allow you to avoid a second query in certain cases. I would say that if this is a blocker you should consider a different approach to querying the two collections (for instance performing two queries rather than doing a single aggregation with $lookup).

Comment by Anu Madan [ 15/Oct/20 ]

Hi,

Facing the same issue.
This one is a blocker for our applications as well. Identifying which connection to use just when the query requires aggregation and then switch to a non-encrypted client is a big performance impact. 'lookup' is a very basic functionality that needs to be supported and will impact almost all mongodb customers who plan to choose CSFLE.

Is there an expected date for the resolution of this issue?

Thanks.

Comment by Tom Gabay [ 05/Aug/20 ]

Hey, Also handling this issue,

Just wanted to say i think its a big issue,

in every real life scenario doing CSFLE require us to handle some complex logic to decide what connection to use, 
or use non encrypted connection on error

very un-natural way to use db connection

Comment by Asya Kamsky [ 04/Aug/20 ]

This is a current limitation where using encrypted client assumes that it's necessary and doesn't allow $lookup into another collection.

Current workaround would be to use a non-encrypted client for such aggregations.

Generated at Thu Feb 08 05:21:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.