[SERVER-22167] Failed to insert document larger than 256k Created: 13/Jan/16  Updated: 21/Aug/20  Resolved: 02/Feb/16

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.2.0
Fix Version/s: 3.2.3, 3.3.2

Type: Bug Priority: Critical - P2
Reporter: Ming Li Assignee: Martin Bligh
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File bigdoc.cpp    
Issue Links:
Duplicate
is duplicated by SERVER-22637 OP_INSERT can fail when inserted docu... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

The problem cannot be reproduced with the Node.js MongoDB driver; bulk writes appear to work fine.

The problem can be reproduced by using the POCO MongoDB connector to insert a document larger than 256k.

Sprint: Integration 10 (02/22/16)
Participants:

 Description   

Error:

Collection::insertDocument got document without _id for ns...

We are using the POCO MongoDB connector to read from and write to MongoDB:
http://pocoproject.org/docs-1.6.0/Poco.MongoDB.html

It works fine with the 2.x, 3.0, and 3.1 versions, but inserting a document larger than 256k fails as of MongoDB 3.2.

The problem we found is that mongod does not generate an _id, so the exception is raised in Collection::insertDocuments:

for (auto it = begin; it != end; it++) {
    // eoo() ("end of object") is true when the document has no _id element.
    if (hasIdIndex && (*it)["_id"].eoo()) {
        return Status(ErrorCodes::InternalError,
                      str::stream() << "Collection::insertDocument got "
                                       "document without _id for ns:" << _ns.ns());
    }

    auto status = checkValidation(txn, *it);
    if (!status.isOK())
        return status;
}

From debugging we found that the insert path in the insertMulti function goes directly to insertMultiSingletons, which calls insertDocument without generating an _id. That does not seem right. Only documents larger than 256k hit this problem because insertVectorMaxBytes is 256k.
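
For illustration, here is a compilable toy model of that dispatch, not the actual server code: only the names insertVectorMaxBytes, fixDocumentForInsert, insertMulti, and insertMultiSingletons come from the server source, and the Doc struct stands in for a real BSON document.

#include <cstddef>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for a BSON document: a size plus an optional _id.
struct Doc {
    std::size_t bytes;
    std::optional<std::string> id;
};

constexpr std::size_t insertVectorMaxBytes = 256 * 1024;

void fixDocumentForInsert(Doc& d) {
    if (!d.id)
        d.id = "generated";  // the real server generates an ObjectId here
}

void insertMultiSingletons(const Doc& d) {
    // Mirrors the storage-layer check quoted above.
    if (!d.id)
        throw std::runtime_error("Collection::insertDocument got document without _id");
    std::cout << "inserted " << d.bytes << "-byte document, _id=" << *d.id << "\n";
}

void insertMulti(std::vector<Doc> docs) {
    for (Doc& d : docs) {
        if (d.bytes > insertVectorMaxBytes) {
            // The bug: oversized documents skipped the batched path and went
            // straight to the storage layer, so fixDocumentForInsert() never
            // ran and no _id was generated.
            insertMultiSingletons(d);
        } else {
            fixDocumentForInsert(d);  // batched path sanitized first
            insertMultiSingletons(d);
        }
    }
}

int main() {
    insertMulti({{100, std::nullopt}});  // small document: _id generated, insert succeeds
    try {
        insertMulti({{300 * 1024, std::nullopt}});  // > 256k: fails as in this ticket
    } catch (const std::exception& e) {
        std::cout << "error: " << e.what() << "\n";
    }
}

The fix referenced in the commits below ("Move fixDocuments up earlier in the insert path") corresponds, in this model, to calling fixDocumentForInsert before the size-based branch.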

I have attached a test file, written with the POCO C++ libraries, that reproduces this problem.



 Comments   
Comment by Githook User [ 02/Feb/16 ]

Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>

Message: SERVER-22167: Move fixDocuments up earlier in the insert path to ensure it runs

(cherry picked from commit 370634f46633f0fd6d626822d7d75752a34d4f50)
Branch: v3.2
https://github.com/mongodb/mongo/commit/3d236611718ccd164335c0edc649f34868d0072c

Comment by Githook User [ 02/Feb/16 ]

Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>

Message: SERVER-22167: Move fixDocuments up earlier in the insert path to ensure it runs
Branch: master
https://github.com/mongodb/mongo/commit/370634f46633f0fd6d626822d7d75752a34d4f50

Comment by J Rassi [ 14/Jan/16 ]

> I tried inserting an _id field to bypass the problem, but a bigger problem appeared: all requests to MongoDB seem to be blocked after inserting a document this way. The _id I inserted is a normal UUID; is that the problem?

No, that is not expected. The server should be able to handle an insert of a single 256k document without any issue. Are you able to open a shell and run db.currentOp() successfully?

Comment by Ming Li [ 14/Jan/16 ]

Hi All,

I tried inserting an _id field to bypass the problem, but a bigger problem appeared: all requests to MongoDB seem to be blocked after inserting a document this way. The _id I inserted is a normal UUID; is that the problem?

Thanks,
Ming

Comment by Ming Li [ 14/Jan/16 ]

Thanks Jason, looking forward to the fix. I will insert the _id manually in the meantime.
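
For reference, a minimal sketch of that manual-_id workaround against POCO 1.6's MongoDB API, assuming a mongod on localhost:27017; the collection name, field names, and payload size are illustrative, and the random UUID string matches the approach described in the comment above.

#include <string>
#include <Poco/UUIDGenerator.h>
#include <Poco/MongoDB/Connection.h>
#include <Poco/MongoDB/Database.h>
#include <Poco/MongoDB/InsertRequest.h>

int main() {
    Poco::MongoDB::Connection connection("localhost", 27017);
    Poco::MongoDB::Database db("test");

    auto request = db.createInsertRequest("foo");
    Poco::MongoDB::Document& doc = request->addNewDocument();

    // Workaround: supply _id ourselves so the unsanitized server path
    // still finds one. The server only requires _id to be present and
    // unique, so a random UUID string is sufficient.
    doc.add("_id", Poco::UUIDGenerator::defaultGenerator().createRandom().toString());
    doc.add("payload", std::string(300 * 1024, 'x'));  // > 256k to hit the affected path

    connection.sendRequest(*request);
    return 0;
}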

Comment by J Rassi [ 13/Jan/16 ]

I am able to reproduce this issue against 3.2.0 with the following Python script.

import pymongo

client = pymongo.MongoClient()
# manipulate=False stops the driver from adding an _id client-side, and w=0
# sends the legacy insert unacknowledged; getLastError then reports the failure.
client['test']['foo'].insert({'a': 'x'*256*1024}, manipulate=False, w=0)
print(client['test'].command({'getLastError': 1}))

rassi@rassi:~/work/mongo$ python repro.py
{u'code': 1, u'connectionId': 1, u'ok': 1.0, u'err': u'Collection::insertDocument got document without _id for ns:test.foo', u'n': 0}

This is a regression introduced in 3.1.9 by 6d3c42c8 (SERVER-19564). The issue is that the sanitization function fixDocumentForInsert() is never called when performing an OP_INSERT legacy insert of a single document whose size exceeds 256k. As a result, users are able to store invalid documents this way, and an insert of a document without an _id field will cause an exception to be thrown at the storage layer.

One reason that this issue is reproducible with some drivers but not others is that many drivers automatically add an _id field to the documents to be inserted before sending the insert off to the server. This will cause the _id check to pass at the storage layer, so the "got document without _id" exception is never thrown.
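
For illustration, the fix-up such drivers perform amounts to something like this hypothetical helper; ensureId is not part of POCO and is sketched here only to show what client-side _id injection does.

#include <Poco/UUIDGenerator.h>
#include <Poco/MongoDB/Document.h>

// Hypothetical sketch of driver-side _id injection (not a POCO API): run
// before the insert message is serialized, it guarantees the storage-layer
// _id check passes even on the unsanitized server path.
void ensureId(Poco::MongoDB::Document& doc) {
    if (!doc.exists("_id"))
        doc.add("_id", Poco::UUIDGenerator::defaultGenerator().createRandom().toString());
}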

Comment by Kelsey Schubert [ 13/Jan/16 ]

Hi mli,

Thank you for the investigation. Since this behavior is not reproducible with other drivers, there does not appear to be a bug in the MongoDB server. I would recommend raising an issue on the POCO issue tracker.

Regards,
Thomas
