[DRIVERS-2092] Drivers Spec : Specify object, collection, and database name validation rules Created: 15/Jun/16  Updated: 25/Jul/22  Resolved: 25/Jul/22

Status: Closed
Project: Drivers
Component/s: None
Fix Version/s: None

Type: Spec Change Priority: Minor - P4
Reporter: Rathi Gnanasekaran Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to JAVA-1644 NPE on createIndex when the database ... Closed
related to SERVER-26431 Validate the collection name for comm... Closed
is related to PYTHON-1708 Add helpers for validating collection... Closed
Driver Changes: Not Needed

 Description   

We need to clearly define what characters are allowed/disallowed in:

  • top-level keys in documents
  • keys in sub documents
  • database names
  • collection names

and make sure MongoDB drivers and shell enforce these rules.

For reference, here's the server documentation on naming restrictions for database and collection names: https://docs.mongodb.com/manual/reference/limits/#naming-restrictions



 Comments   
Comment by Jeremy Mikola [ 26/Jun/17 ]

rathi.gnanasekaran: Why is this linked to the CRUD spec component? IIRC, this began as a ticket to enforce names for collections and databases, which would affect driver methods for selecting those objects. Somewhere down the line, document validation was added to the issue description.

IMO, collection and database name validation would be a separate spec (it doesn't come up at all in the CRUD spec). Validating field names for documents can rightly fall under the CRUD spec.

I'd suggest splitting this into two issues. If you want to create a separate spec ticket for collection/database name validation, we can leave this categorized as CRUD spec and then modify the title/description to only refer to document field validation.

Comment by Jeffrey Yemin [ 28/Oct/16 ]

Lack of database name validation causes a couple of tough-to-diagnose issues.

A database name that's the empty string, combined with a collection name with a dot in it, like "a.b", will result in the namespace "a.b". This will be encoded as the namespace "a.b" in OP_QUERY, which the server will interpret as a database of "a" and a collection of "b". Not what the user intended.

Similarly, a database name with a dot in it, like "a.b", combined with a collection name like "c" will result in the namespace "a.b.c". This will be encoded as the namespace "a.b.c" in OP_QUERY, which the server will interpret as database of "a" and collection of "b.c". Also not what the user intended.
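
A minimal Python sketch of the two cases above, assuming the namespace is split at the first dot as described. The helpers split_namespace and check_database_name are hypothetical names used for illustration, not part of any driver's API, and the check covers only the two problem names from this comment rather than the full server naming restrictions.

```python
def split_namespace(namespace: str) -> tuple[str, str]:
    """Split a namespace at the first dot into (database, collection)."""
    database, _, collection = namespace.partition(".")
    return database, collection


def check_database_name(name: str) -> None:
    """Hypothetical driver-side check that rejects the two names discussed above."""
    if not name:
        raise ValueError("database name must not be empty")
    if "." in name:
        raise ValueError(f"database name must not contain '.': {name!r}")


# Both namespaces from the comment split differently from what the user meant.
print(split_namespace("a.b"))    # ('a', 'b')
print(split_namespace("a.b.c"))  # ('a', 'b.c')
```

Rejecting the empty name and the dotted name before the namespace is built would surface both mistakes as an error on the client instead of a silent write to the wrong collection.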

Comment by Christian Amor Kvalheim [ 17/Jun/16 ]

That sounds like what the node driver does now. However, I read this ticket as requiring even stricter validation. If it does not, then we should clarify it.

Comment by Bernie Hackett [ 17/Jun/16 ]

Python does the exact same key checking for replace, but skips checking keys for update (you can't check keys for update). Remove is irrelevant. Aggregation $out is a server issue, similar to mapReduce $out.

Comment by Christian Amor Kvalheim [ 17/Jun/16 ]

Yeah, most drivers do this for insert documents on serialization today. However, it's not as trivial on update, remove and aggregations.

Before adding additional latency to the drivers I think it's not unreasonable to ask for a pro/con assessment and to talk to Andreas.

Comment by Bernie Hackett [ 17/Jun/16 ]

It shouldn't be a huge performance hit. PyMongo has been checking keys forever. When we encode a key to cstring we check if it starts with "$" or includes ".".
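
As a rough illustration of the check described above (a sketch, not PyMongo's actual encoder), the function below does the key check in the same step that produces the BSON cstring. The name encode_key and the check_keys flag are assumptions made for this example.

```python
def encode_key(key: str, check_keys: bool = True) -> bytes:
    """Encode a field name as a BSON cstring: UTF-8 bytes plus a NUL terminator."""
    if "\x00" in key:
        # A cstring cannot contain an embedded NUL, whatever check_keys says.
        raise ValueError("key must not contain a NUL character")
    if check_keys:
        if key.startswith("$"):
            raise ValueError(f"key must not start with '$': {key!r}")
        if "." in key:
            raise ValueError(f"key must not contain '.': {key!r}")
    return key.encode("utf-8") + b"\x00"
```

Because every key already has to be encoded, the added work in this sketch is one prefix test and one substring scan per key, with no separate pass over the document.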

Comment by Christian Amor Kvalheim [ 17/Jun/16 ]

A couple of things:

The server is holding based on perf evaluation. I do not think we should have any lower standard than them on it. Taking a huge performance hit is not really an acceptable situation for most of our users.

Now, if we need to do this for security reasons, we need to be told to do so by our security team. We also need to do a POC and bench to see the real cost of doing this.

Before mandating a drastic change like this I would like all pro/cons laid out so any decision is a fully informed decision.

I do think we should validate collection and database names, however, as those checks are close to zero cost. It's traversal that is expensive.

Comment by Bernie Hackett [ 17/Jun/16 ]

We have to enforce this at the driver level. Even if the server starts throwing errors, we still have w:0 to think about. That option isn't going away anytime soon. This is a painful situation to get yourself into.

Comment by David Golden [ 16/Jun/16 ]

[Copied from DRIVERS-308]:

Some of the cases I can think of:

  • Insert/replace documents – can't have dotted keys; can't have $ prefixed keys (except for documents with $ref + $id and optionally $db and other fields)
  • Query filters – can have dotted keys and maybe can have $ prefixed keys (OP_QUERY vs find command, or maybe searching on $ref/$id?)
  • Update documents – can have $ keys for update operators; can have dotted keys for fields within update operators; can't have $ keys for field names in a $set update
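
The insert/replace and $set cases above could be sketched in Python roughly as follows. This is an illustration under assumptions, not any driver's implementation: the function names are hypothetical, the DBRef exception is applied at every nesting level, and query filters are left unchecked because the list above allows dotted keys there and is unsure about $ keys.

```python
from collections.abc import Mapping
from typing import Any

DBREF_KEYS = {"$ref", "$id", "$db"}


def validate_stored_document(doc: Mapping[str, Any]) -> None:
    """Insert/replace: no dotted keys; $ keys only in $ref + $id documents."""
    is_dbref = "$ref" in doc and "$id" in doc
    for key, value in doc.items():
        if "." in key:
            raise ValueError(f"dotted key not allowed: {key!r}")
        if key.startswith("$") and not (is_dbref and key in DBREF_KEYS):
            raise ValueError(f"$-prefixed key not allowed: {key!r}")
        _validate_value(value)


def _validate_value(value: Any) -> None:
    """Recurse into sub documents and arrays; this traversal is the costly part."""
    if isinstance(value, Mapping):
        validate_stored_document(value)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _validate_value(item)


def validate_update_document(update: Mapping[str, Any]) -> None:
    """Update: $ operators are allowed; field names inside $set may be dotted
    but must not start with '$'."""
    fields = update.get("$set")
    if isinstance(fields, Mapping):
        for field in fields:
            if field.startswith("$"):
                raise ValueError(f"$-prefixed field in $set: {field!r}")
```

For example, validate_stored_document({"link": {"$ref": "users", "$id": 1}}) passes, while validate_stored_document({"a.b": 1}) and validate_update_document({"$set": {"$a": 1}}) each raise ValueError.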