[CDRIVER-2018] Introduce bulk write options to disable BSON validation Created: 02/Feb/17  Updated: 03/May/17  Resolved: 23/Mar/17

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.6.0
Fix Version/s: 1.7.0

Type: New Feature Priority: Major - P3
Reporter: Jeremy Mikola Assignee: Backlog - C Driver Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by PHPC-712 Driver should validate BSON documents... Closed
Related
is related to CDRIVER-2017 Inconsistent error reporting for inse... Closed
is related to PHPC-712 Driver should validate BSON documents... Closed

 Description   

When CDRIVER-1341 added logic to validate insert, replacement, and update documents (4b37e65), it subtly changed some conventions with bulk write execution. I was alerted to this through the PHP driver's manager-executeBulkWrite_error-005.phpt test. Validation errors are not reported until the bulk write is executed (here), which changes a previous assumption that execution would always ensure the server stream was initialized. The existing checks in mongoc_bulk_operation_execute() for valid client/database/collection objects could theoretically have returned within initializing a stream, but it wasn't a concern for us.

As it relates to the PHP driver, this means that we'd only be able to throw a BulkWriteException at execution time (from Manager::executeBulkWrite()) instead of an InvalidArgumentException from the offending method (e.g. BulkWrite::insert()). We actually discovered this because BulkWriteException attempts to refer to the server handling execution, and the selected server was null.

This may only be an issue for mongoc_bulk_operation_insert(), which may be at a disadvantage because it is void and cannot return a boolean indicating success or failure (unlike the update and replace "with_opts" functions). For this reason, I think we either need a new insert function that can return a boolean or some public API that can expose the mongoc_bulk_operation_t's error so that we can check it after each attempted insert. The update and replace functions have "with_opts" variants, which return a boolean indicating success or failure, as well as a bson_error_t * output argument to store any error encountered. A similar API for insert would be appreciated.



 Comments   
Comment by Githook User [ 23/Mar/17 ]

Author:

{u'username': u'ajdavis', u'name': u'A. Jesse Jiryu Davis', u'email': u'jesse@mongodb.com'}

Message: CDRIVER-2018 add mongoc_bulk_operation_insert_with_opts
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/8c0e06fe2059bed372d2d13b487e222125069952

Comment by Jeremy Mikola [ 13/Feb/17 ]

Summarizing our discussion earlier today:

mongoc_collection_insert() and mongoc_collection_insert_bulk() both support an undocumented MONGOC_INSERT_NO_VALIDATE flag, which disables all validation. mongoc_collection_insert() and MONGOC_INSERT_NO_VALIDATE is used internally by libmongoc for legacy index creation, but there is no evidence of the option being used with mongoc_collection_insert_bulk(). If a new option is implemented per this issue, we'd suggest a name related to legacy index creation rather than a generic name that might encourage users to use it for other situations.

The PHP driver accumulates all documents to insert in a mongoc_bulk_operation_t via the MongoDB\Driver\BulkWrite methods prior to the object being passed to MongoDB\Driver\Manager::executeBulkWrite() with a target namespace. There is no public API in libmongoc to access data that has been added to mongoc_bulk_operation_t, otherwise we might be able to check the namespace when executeBulkWrite() is called, pull insert documents out, and defer to mongoc_collection_insert() or mongoc_collection_insert_bulk() with MONGOC_INSERT_NO_VALIDATE.

We also noted that the introduction of document validation in libmongoc 1.6.0 (CDRIVER-1341) may already break behavior for PHP driver and library users today if they are using libbson and libmongoc as shared libraries.

I pinged derick to ask if he had any other ideas for working around this short of the API requested in this ticket.

Comment by Jeremy Mikola [ 08/Feb/17 ]

We discussed creating a bool mongoc_bulk_operation_insert_with_opts() function that also accepts const bson_t *opts and bson_error_t *error arguments. Rather than introduce an option for disabling "dot keys" validation, we discussed having a general "validateBson" option for all of the "with_opts" methods that allows libmongoc to skip its BSON validation introduced in CDRIVER-1341. That seems more justified if folks are passing pre-validated documents to the bulk write methods, as it allows them to avoid potentially expensive checking (e.g. UTF8).

Comment by Jeremy Mikola [ 03/Feb/17 ]

"insert" is your only API for insertions?

Correct. All writes are issued through the MongoDB\Driver\Manager::executeBulkWrite() method. The base driver has three basic methods for executing commands, queries, and bulk writes.

Can you use mongoc_collection_create_index_with_opts?

I'm aware of the logic this function and others have to fall back for older server versions, but that is something we handle ourselves in PHPLIB after selecting a server is checking it's min/max wire versions. This was all done purposefully by design to keep our C layer as slim as possible.

The only way I can imagine we'd be able to use mongoc_collection_create_index_with_opts is if, at the time a bulk write was executed, we checked if the target collection was "system.indexes" on a pre-2.6 server and then used some API, which does not yet exist, to dissect it and obtain the raw insert statements. We'd then have to throw exceptions if the bulk write consisted of anything more than a series of insert statements. That all seems quite a bit more hacky and fragile than a mechanism to bypass the "dot" validation.

Comment by A. Jesse Jiryu Davis [ 03/Feb/17 ]

"insert" is your only API for insertions? Can you use mongoc_collection_create_index_with_opts? That function tries the modern "createIndexes" command and falls back to inserting into the system.indexes collection with MONGOC_INSERT_NO_VALIDATE set. The driver does all the work for you there.

Comment by Jeremy Mikola [ 02/Feb/17 ]

jesse: See my comment in PHPC-712:

Note: libmongoc uses a MONGOC_INSERT_NO_VALIDATE constant to disable validation for insert documents. This appears to only be used for legacy index creation.

If we add validation for BulkWrite::insert() object, we should consider either supporting such an option or inspecting the input argument and skipping validation if it looks like a legacy index specification.

With libmongoc 1.6.0, there is no way for PHPLIB to create indexes on MongoDB 2.4 and earlier, since MongoDB\BulkWrite::insert() is our only API for insertions. I would propose a mongoc_bulk_operation_insert_with_opts() function that accepts an option to exclude BSON_VALIDATE_DOT_KEYS from the validation flags. In effect, this would force the server to validate that aspect of the document.

Generated at Wed Feb 07 21:13:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.