[SERVER-38264] Dynamic data masking at query time Created: 27/Nov/18  Updated: 03/Jun/21  Resolved: 05/Dec/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Kevin Ha Assignee: Eric Sedor
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

OracleDB supports data redaction for data masking.

Apache Ranger supports data masking for Hive.

Is it possible to set up a filter function for a collection so that the records returned by every query against that collection pass through this function for dynamic data masking?

This is an important feature when we need to prepare the data source for exploration by data scientists, but we cannot show them the sensitive data.

Cloning the data for extra preprocessing for this purpose is too painful.



 Comments   
Comment by Paul Done [ 03/Jun/21 ]

Also covered now in the Practical MongoDB Aggregations book in the chapter "Mask Sensitive Fields" at: https://www.practical-mongodb-aggregations.com/examples/intricate-examples/mask-sensitive-fields.html

Comment by Paul Done [ 13/Feb/21 ]

In case it is useful, more examples of Data Masking in MongoDB are here: https://pauldone.blogspot.com/2021/02/mongdb-data-masking.html and https://github.com/pkdone/mongo-data-masking

Comment by George Mihailov [ 22/Jul/20 ]

The new $function operator (https://docs.mongodb.com/master/reference/operator/aggregation/function/) allows even more flexibility for this.
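A minimal sketch of masking with `$function`, assuming a hypothetical string field `card` and hypothetical helper names (`build_function_mask_stage`, `mask_string`). The Python below only builds the aggregation stage as a dict; the JavaScript in `MASK_JS` is what the server would run. `$function` needs MongoDB 4.4 or later with server-side JavaScript enabled. `mask_string` is a pure-Python mirror of the same rule, included only to check expected output locally.

```python
from typing import Any, Dict

# JavaScript executed server-side by $function: keep the last `visible`
# characters of a string and replace the rest with "*". Non-string values
# are returned unchanged.
MASK_JS = """function(value, visible) {
  if (typeof value !== "string") { return value; }
  const keep = Math.min(visible, value.length);
  return "*".repeat(value.length - keep) + value.slice(value.length - keep);
}"""


def build_function_mask_stage(field: str, visible: int = 4) -> Dict[str, Any]:
    """Return an $addFields stage that overwrites `field` with a masked copy."""
    return {
        "$addFields": {
            field: {
                "$function": {
                    "body": MASK_JS,
                    "args": [f"${field}", visible],  # "$card" refers to the field
                    "lang": "js",
                }
            }
        }
    }


def mask_string(value: str, visible: int = 4) -> str:
    """Pure-Python mirror of MASK_JS, for checking expected results locally."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


if __name__ == "__main__":
    print(build_function_mask_stage("card"))
    print(mask_string("4111111111111111"))
```

The stage could be placed in an aggregation pipeline, for example the pipeline of a view. For a rule this simple, native operators such as `$concat` with `$substrCP` would avoid server-side JavaScript entirely; `$function` is most useful when the masking rule is hard to express with built-in operators.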

Comment by Eric Sedor [ 28/Nov/18 ]

Hi Kevin, thanks for your patience. This sort of security is supported by MongoDB in a few ways. The most straightforward might be creating a View using $project to omit sensitive fields.
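As a minimal sketch of the view approach, assuming a hypothetical `customers` collection with hypothetical sensitive fields `ssn`, `email` and `address.street`, the Python below only builds the `create` command document (`create`, `viewOn`, `pipeline`); it does not connect to a server. With a driver such as PyMongo, the resulting dict could be passed to `db.command(...)`. The helper name `build_masked_view_command` is hypothetical.

```python
from typing import Any, Dict, List


def build_masked_view_command(
    view_name: str, source_collection: str, sensitive_fields: List[str]
) -> Dict[str, Any]:
    """Return a `create` command document defining a read-only view over
    `source_collection` whose pipeline drops every field in `sensitive_fields`."""
    # Exclusion-style $project: a value of 0 removes that field (dotted paths
    # such as "address.street" remove a nested field) from each output document.
    exclusions = {field: 0 for field in sensitive_fields}
    return {
        "create": view_name,  # the command name must be the first key
        "viewOn": source_collection,
        "pipeline": [{"$project": exclusions}],
    }


if __name__ == "__main__":
    cmd = build_masked_view_command(
        "customers_masked", "customers", ["ssn", "email", "address.street"]
    )
    print(cmd)
    # With PyMongo, this could be sent as: db.command(cmd)
```

An exclusion projection passes through any field it does not list, including fields added to the collection later. If new sensitive fields may appear, an allow-list projection (`{field: 1}` for each permitted field) is the safer choice.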

Then, you can offer read permissions for this view to your data scientist users without offering read permissions to the backing collection.
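A sketch of the permission side, assuming a hypothetical `analytics` database, a view named `customers_masked`, a role `maskedReader` and an existing user `alice` (the helper `build_view_reader_commands` is also hypothetical). It builds `createRole` and `grantRolesToUser` command documents that grant the `find` action on the view's namespace only, with no privilege on the backing collection. Nothing is sent to a server here.

```python
from typing import Any, Dict, Tuple


def build_view_reader_commands(
    db_name: str, view_name: str, role_name: str, user_name: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (createRole, grantRolesToUser) command documents.

    The role grants only the `find` action on the view's namespace; no
    privilege is granted on the collection that backs the view.
    """
    create_role = {
        "createRole": role_name,  # the command name must be the first key
        "privileges": [
            {
                "resource": {"db": db_name, "collection": view_name},
                "actions": ["find"],
            }
        ],
        "roles": [],  # inherit nothing, e.g. not the built-in "read" role
    }
    grant_role = {
        "grantRolesToUser": user_name,
        "roles": [{"role": role_name, "db": db_name}],
    }
    return create_role, grant_role


if __name__ == "__main__":
    role_cmd, grant_cmd = build_view_reader_commands(
        "analytics", "customers_masked", "maskedReader", "alice"
    )
    print(role_cmd)
    print(grant_cmd)
    # With PyMongo, run against the "analytics" database as a user admin:
    #   db.command(role_cmd); db.command(grant_cmd)
```

Leaving `roles` empty matters: inheriting a broad role such as `read` on the database would also expose the backing collection.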

To explore this in more detail, we'd recommend posting on the mongodb-user group or on Stack Overflow with the mongodb tag; a question like this, which involves more discussion, is best suited to the mongodb-user group.

Comment by Kevin Ha [ 27/Nov/18 ]

Just got an idea:


MongoDB is open source.

If I knew which source files are responsible for touching the data right below the query layer, maybe I could inject some code that checks the field paths/hierarchy a query requires, add my data masking code there, and then build my own customized version of MongoDB?


If the above idea works, I could set up this customized MongoDB instance as a member of the production MongoDB replica set and let the data scientists query only this member, which has the data masking code applied.


Generated at Thu Feb 08 04:48:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.