Priority: Major - P3
Affects Version/s: None
Fix Version/s: 0.0.4
Epic Name: BSON usability
Detailed Project Statuses:
2018-08-14 : Target date 2018-09-11 (4weeks)
2018-08-28 : Updated target date 2018-09-28 (6weeks)
Added two weeks as Jeremy is still OOO and lots of code reviews blocked. Also, Kaitlin is in Berlin for a week.
2018-09-25 : Work stalled. Will split the Epic in half, all the unresolved issues will be moved to a followup work epic.
The Document class currently tracks its state through a single bson_t (wrapped by a DocumentStorage instance). This may be problematic for several reasons.
When a Document is initialized from Swift types, we can reasonably assume that its bson_t is valid. For instance, we should never encounter an unknown type (see: SWIFT-132). However, when a Document is initialized from a bson_t, we cannot assume that it is valid. That is a scenario where we can absolutely expect invalid data or unknown types. Since Document only decodes BSON data lazily as keys are accessed, we could (a) easily ignore invalid BSON data and (b) find ourselves unable to throw an exception at access time (for reasons cited in SWIFT-132).
One way to mitigate this may be to call bson_validate_with_error() when initializing a Document from a bson_t; however, this may be expensive as it entails recursively crawling the document (albeit with fewer visitors).
Document instances are mutated via the subscript setter. When Documents are assigned, the private copyStorageIfRequired() function is used to guarantee copy-on-write semantics. When a BsonValue is assigned to a Document subscript, it is always appended to the underlying bson_t. This can easily lead to duplicate keys being created within the same BSON document. Consider the following example:
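A minimal sketch of the append-on-set behavior, using a toy stand-in rather than the driver's actual Document type. The AppendOnlyDocument name, its Int-only values, and the choice to return the first match on lookup are all assumptions made for illustration:

```swift
// Toy stand-in for a document backed by an append-only buffer.
// This is NOT the driver's Document; every name here is hypothetical.
struct AppendOnlyDocument {
    private(set) var elements: [(key: String, value: Int)] = []

    subscript(key: String) -> Int? {
        get {
            // Linear scan that returns the first match (an assumption),
            // so any later duplicate of the same key stays hidden.
            return elements.first(where: { $0.key == key })?.value
        }
        set {
            // Always append, never replace an existing key.
            guard let value = newValue else { return }
            elements.append((key: key, value: value))
        }
    }
}

var doc = AppendOnlyDocument()
doc["a"] = 1
doc["a"] = 2

print(doc.elements.count)  // 2: the key "a" is now stored twice
print(doc["a"] == 1)       // true: the getter still sees the first value
```

In this model the second assignment does not overwrite the first, which is the surprise a dictionary-minded user would run into.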
Since Document.subscript.get returns a BsonValue?, it is not possible to chain subscript operators (e.g. doc["a"]["b"]). This makes it less likely for users to be confused by this behavior, provided we clearly document that Document.subscript.set always appends and may create duplicate keys.
Users often expect BSON documents to function like dictionaries. In the Swift driver, the Document class does not implement any protocols except those that allow it to be initialized from a dictionary or array literal. We should see if there is value in implementing the Collection protocol or others.
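As a rough illustration of what protocol conformance could buy, the sketch below conforms a toy ordered container to Sequence, which is a smaller step than Collection. OrderedDocument, Pair, and the Int-only values are hypothetical and are not part of the driver:

```swift
// Hypothetical ordered key-value container; not the driver's Document.
struct OrderedDocument: Sequence, ExpressibleByDictionaryLiteral {
    typealias Pair = (key: String, value: Int)
    private var storage: [Pair] = []

    // Keeps the pairs in the order they appear in the literal.
    init(dictionaryLiteral elements: (String, Int)...) {
        storage = elements.map { (key: $0.0, value: $0.1) }
    }

    // Sequence only needs an iterator; reuse the array's.
    func makeIterator() -> IndexingIterator<[Pair]> {
        return storage.makeIterator()
    }
}

let doc: OrderedDocument = ["x": 1, "y": 2, "z": 3]

// for-in, map, filter, reduce, etc. now come for free.
let keys = doc.map { $0.key }
let total = doc.reduce(0) { $0 + $1.value }
print(keys, total)
```

Collection would additionally require indices and a positional subscript, which is where the cost of walking the underlying buffer would need to be considered.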
Kaitlin Mahar explained that the current design of Document leads to certain concessions. For example, Dictionary key lookups are O(1) on average, while a key lookup in a bson_t is O(n) because it has to iterate over the document's elements.
While bson_t may be suitable for a C API, it may prove too limiting for a high-level language like Swift. We should compare the current BSON implementation to those in other typed languages such as Java, which I expect maintain some intermediary representation of BSON. If we adopted that in Swift, Documents may very well be represented by Dictionaries and only converted to bson_t at encode time.
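A minimal sketch of the "encode only at the end" idea, assuming a Dictionary-backed intermediate representation. ToyValue and encodeToyDocument are hypothetical names, only the int32 and string types are handled, and the output is a plain byte array rather than a bson_t. The byte layout follows the BSON specification (int32 total length, elements, trailing 0x00):

```swift
// Hypothetical intermediate representation; not the driver's actual types.
enum ToyValue {
    case int32(Int32)
    case string(String)
}

// Little-endian bytes of a 32-bit integer, as BSON requires.
func littleEndianBytes(_ value: Int32) -> [UInt8] {
    let bits = UInt32(bitPattern: value)
    return [
        UInt8(bits & 0xFF),
        UInt8((bits >> 8) & 0xFF),
        UInt8((bits >> 16) & 0xFF),
        UInt8((bits >> 24) & 0xFF)
    ]
}

// Converts the dictionary to BSON bytes only when encoding is requested.
// Keys containing NUL bytes are not handled in this sketch.
func encodeToyDocument(_ fields: [String: ToyValue]) -> [UInt8] {
    var body: [UInt8] = []
    // Dictionary order is unspecified, so sort keys for deterministic output.
    for key in fields.keys.sorted() {
        guard let value = fields[key] else { continue }
        switch value {
        case .int32(let number):
            body.append(0x10)                    // type tag: int32
            body += Array(key.utf8) + [0x00]     // key as cstring
            body += littleEndianBytes(number)
        case .string(let text):
            let utf8 = Array(text.utf8)
            body.append(0x02)                    // type tag: UTF-8 string
            body += Array(key.utf8) + [0x00]     // key as cstring
            body += littleEndianBytes(Int32(utf8.count + 1))
            body += utf8 + [0x00]
        }
    }
    // Total length covers the 4-byte prefix, the elements, and the terminator.
    let total = Int32(4 + body.count + 1)
    return littleEndianBytes(total) + body + [0x00]
}

let bytes = encodeToyDocument(["hello": .string("world")])
print(bytes.count)  // 22
```

With this shape, reads and writes stay O(1) dictionary operations and duplicate keys cannot occur, at the cost of losing insertion order unless an ordered structure is used instead of a plain Dictionary.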