- Type: Bug
- Resolution: Unresolved
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
- Python Drivers
_DocumentType is defined as a TypeVar bound to Mapping[str, Any], which seems to confuse interface inheritance with class inheritance in some pymongo methods. For example, insert_one may modify documents that are typed as Mapping, which contradicts the Mapping protocol, because that protocol provides no way to modify the underlying object.
More specifically, a derived interface is supposed to add new methods, as MutableMapping adds methods that modify the underlying object. However, an object implementing the base interface need not implement the derived one: a plain mapping may use read-only storage, such as an array, so a mutable mapping could not even be implemented on top of it. (This is an example to illustrate the incorrect use of Mapping in bound=, not a specific implementation of either of these mappings.)
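The distinction can be seen at runtime with a small sketch. It uses the standard library's MappingProxyType as a stand-in for a mapping backed by read-only storage; the variable names are illustrative only.

```python
from collections.abc import Mapping, MutableMapping
from types import MappingProxyType

# A read-only view over a dict: it implements Mapping but not MutableMapping.
read_only_doc = MappingProxyType({"name": "Ada"})

is_mapping = isinstance(read_only_doc, Mapping)
is_mutable = isinstance(read_only_doc, MutableMapping)

try:
    # A function that only promises to accept Mapping has no right to do this.
    read_only_doc["_id"] = 1  # type: ignore[index]
    write_failed = False
except TypeError:
    # mappingproxy objects do not support item assignment
    write_failed = True

print(is_mapping, is_mutable, write_failed)
```

Any code that accepts Mapping[str, Any] and then writes to the argument would fail on an object like this one, even though the object satisfies the declared bound.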
What pymongo effectively does with incoming documents is trick static typing into ignoring this modification. This happens in the method Collection._insert_one, which puts the incoming document (doc) into a command here:
command = {"insert": self.name, "ordered": ordered, "documents": [doc]}
This allows pymongo to escape the Mapping protocol, treat the document as a plain dictionary, and modify it.
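The mechanism described above can be imitated without pymongo. The following is only a sketch of the pattern, not pymongo's actual code: fake_insert_one and _send are hypothetical names, and the generated _id is a UUID string rather than an ObjectId.

```python
import uuid
from typing import Any, Dict, Mapping


def _send(command: Dict[str, Any]) -> None:
    # Hypothetical helper. The command values are typed as Any, so a type
    # checker no longer knows that the documents were declared as Mapping.
    for doc in command["documents"]:
        if "_id" not in doc:
            doc["_id"] = uuid.uuid4().hex


def fake_insert_one(name: str, doc: Mapping[str, Any]) -> None:
    # The signature promises read-only access, but the document is passed on
    # inside a loosely typed command and modified there.
    command: Dict[str, Any] = {"insert": name, "ordered": True, "documents": [doc]}
    _send(command)


user = {"name": "Ada"}
fake_insert_one("users", user)
print("_id" in user)  # the caller's dictionary was modified in place
```

A type checker is expected to accept this code without complaint, which is the problem: the annotation says Mapping while the behavior requires MutableMapping.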
The side effect in client code is that documents are silently modified within such calls. To account for this, I have to use MutableMapping in the code wrapping pymongo, but that prevents TypedDict instances from working with pymongo: typed dictionaries cannot be assigned to MutableMapping variables, because the latter may add or remove fields.
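A minimal sketch of that conflict follows. The functions read_doc and write_doc are hypothetical wrappers, and the comments describe what type checkers such as mypy are expected to report under the TypedDict rules; at runtime a TypedDict instance is an ordinary dict, so the code itself runs.

```python
from typing import Any, Mapping, MutableMapping, TypedDict


class User(TypedDict):
    name: str


def read_doc(doc: Mapping[str, object]) -> int:
    # Read-only access: a TypedDict is compatible with Mapping[str, object].
    return len(doc)


def write_doc(doc: MutableMapping[str, Any]) -> None:
    # Mutating access: this may add keys that the TypedDict does not declare.
    doc["_id"] = 1


user: User = {"name": "Ada"}
field_count = read_doc(user)  # accepted by type checkers

# Expected to be rejected statically, because MutableMapping could add or
# remove fields and so break the declared shape. It still runs.
write_doc(user)  # type: ignore[arg-type]
```

This is why a wrapper that is honest about mutation, and therefore uses MutableMapping, cannot accept typed dictionaries without suppressing the check.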
I will also note that TypedDict is the perfect vehicle for MongoDB documents because it follows the same logic for existing and non-existing fields as the database does, unlike data classes, which track non-existing fields with None values that end up as nulls in the database.
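The difference can be illustrated with a short sketch; the UserDoc and UserData classes and the email field are made up for this example.

```python
from dataclasses import asdict, dataclass
from typing import Optional, TypedDict


class _UserRequired(TypedDict):
    name: str


class UserDoc(_UserRequired, total=False):
    # Optional key: it may be absent from the dictionary altogether.
    email: str


@dataclass
class UserData:
    name: str
    # A data class has to keep the attribute, so absence becomes None.
    email: Optional[str] = None


typed_doc: UserDoc = {"name": "Ada"}
data_doc = asdict(UserData(name="Ada"))

print(typed_doc)  # no "email" key at all
print(data_doc)   # "email" is present with a None value
```

If both dictionaries were inserted as they are, the first would produce a document without the email field and the second a document with email set to null.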
What pymongo should do instead is define _DocumentType as a typed dictionary, like this:
class _DocumentType(TypedDict):
    _id: NotRequired[Any]
This can be further improved by making it a generic type, so that the _id type can vary.
Client code using pymongo won't have to derive from this class, which can remain private. Typed dictionaries just define the document shape, similarly to how TypeScript does it, so client code will define its own document classes, with or without _id.
Python's static typing is not as advanced as TypeScript's, so compared to the Node.js driver, it may require defining two of these types, one with an optional _id and one with a mandatory one. The TypeScript driver solves this with a type that treats _id as optional unless it is required, which sounds like it could be handled in Python via a union of the types with and without _id.
For methods like insert_one, this would clearly show that _id may be added by the method. I am not sure how this would affect ad-hoc client code that doesn't care about JSON schema or static typing, though.
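One possible shape of this idea is sketched below. It is not a proposal for pymongo's actual signatures: insert_user, UserWithoutId and UserWithId are hypothetical names, the _id is a UUID string rather than an ObjectId, and the function returns a new dictionary instead of modifying its argument.

```python
import uuid
from typing import Any, TypedDict, Union


class _UserFields(TypedDict):
    name: str


class UserWithoutId(_UserFields, total=False):
    # _id is optional before the document is stored.
    _id: Any


class UserWithId(_UserFields):
    # _id is mandatory once the document has been stored.
    _id: Any


def insert_user(doc: Union[UserWithoutId, UserWithId]) -> UserWithId:
    # The return type states that _id is always present afterwards.
    stored: UserWithId = {
        "name": doc["name"],
        "_id": doc.get("_id", uuid.uuid4().hex),
    }
    return stored


new_user: UserWithoutId = {"name": "Ada"}
stored_user = insert_user(new_user)
print("_id" in new_user, "_id" in stored_user)
```

With signatures of this kind, a reader or a type checker can tell from the annotations alone which calls may introduce the _id field.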
Definition of done: what must be done to consider the task complete?
When this is working, pymongo will clearly show which methods may add the _id field and which will not. As static typing in Python evolves, it may become possible to extend this to tying filters, etc., to the allowed document fields.
The exact Python version used, with patch level:
$ python -c "import sys; print(sys.version)"
3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
The exact version of PyMongo used, with patch level:
$ python -c "import pymongo; print(pymongo.version); print(pymongo.has_c())"
4.9.2
True
Describe how MongoDB is set up. Local vs Hosted, version, topology, load balanced, etc.
This is not relevant for this issue.
The operating system and version (e.g. Windows 7, OSX 10.8, ...)
Windows 10, 11 and Ubuntu 22.04
Web framework or asynchronous network library used, if any, with version (e.g. Django 1.7, mod_wsgi 4.3.0, gevent 1.0.1, Tornado 4.0.2, ...)
This is not relevant for this issue.
- related to: PYTHON-5257 Type hints for "let" are missing generic param for Mapping
- Backlog