Core Server / SERVER-22167

Failed to insert document larger than 256k

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.3, 3.3.2
    • Component/s: Storage
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:
      The problem cannot be reproduced with the Node.js MongoDB driver; inserting via bulkWrite appears to work fine.
      Inserting a document larger than 256k with the POCO MongoDB connector reproduces the problem.
    • Sprint:
      Integration 10 (02/22/16)

      Description

      Error:

      Collection::insertDocument got document without _id for ns...

      We are using the POCO MongoDB connector to read from and write to MongoDB.
      http://pocoproject.org/docs-1.6.0/Poco.MongoDB.html

      It works fine with MongoDB 2.x, 3.0, and 3.1, but fails to insert documents larger than 256k starting with MongoDB 3.2.

      The problem we found is that the server does not generate the _id field, so the exception is thrown in Collection::insertDocuments:

      for (auto it = begin; it != end; it++) {
          if (hasIdIndex && (*it)["_id"].eoo()) {
              return Status(ErrorCodes::InternalError,
                            str::stream() << "Collection::insertDocument got "
                                             "document without _id for ns:" << _ns.ns());
          }

          auto status = checkValidation(txn, *it);
          if (!status.isOK())
              return status;
      }

      From debugging, we found that the insert path in the insertMulti function goes directly to insertMultiSingletons, which calls insertDocument without generating an _id. That does not seem right. Only documents larger than 256k hit this problem because insertVectorMaxBytes is 256k.
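
      To make that concrete, here is a small self-contained sketch of the dispatch just described. It is not the actual server source: apart from the function names and the 256k insertVectorMaxBytes threshold mentioned above, every type and body is a simplified stand-in.

      // Toy model of the 3.2.0 legacy-insert dispatch described above, not server code.
      // Only the names (insertMulti, insertMultiSingletons, fixDocumentForInsert,
      // insertVectorMaxBytes) and the 256k threshold come from this ticket.
      #include <cstddef>
      #include <iostream>
      #include <vector>

      const std::size_t insertVectorMaxBytes = 256 * 1024;

      struct Doc {
          std::size_t sizeBytes;
          bool hasId;
      };

      // Stand-in for the sanitization step: validate and add an _id when missing.
      void fixDocumentForInsert(Doc& d) {
          d.hasId = true;
      }

      // Stand-in for the storage-layer check in Collection::insertDocuments().
      void insertDocument(const Doc& d) {
          if (!d.hasId)
              std::cout << "Collection::insertDocument got document without _id\n";
      }

      // Stand-in for insertMultiSingletons(): documents go to the storage layer one by one.
      void insertMultiSingletons(std::vector<Doc>& docs) {
          for (Doc& d : docs)
              insertDocument(d);        // no fixDocumentForInsert() on this path
      }

      // Stand-in for insertMulti(): a batch larger than insertVectorMaxBytes is
      // diverted to the singleton path, skipping sanitization (the bug).
      void insertMulti(std::vector<Doc>& docs) {
          std::size_t total = 0;
          for (const Doc& d : docs)
              total += d.sizeBytes;
          if (total > insertVectorMaxBytes) {
              insertMultiSingletons(docs);
              return;
          }
          for (Doc& d : docs)
              fixDocumentForInsert(d);  // normal path sanitizes before inserting
          insertMultiSingletons(docs);
      }

      int main() {
          std::vector<Doc> big{{300 * 1024, false}};    // >256k, no _id: prints the error
          std::vector<Doc> small{{10 * 1024, false}};   // <=256k, no _id: gets an _id, inserts
          insertMulti(big);
          insertMulti(small);
      }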

      I have attached a test file written with POCO C++ to reproduce this problem.

      1. bigdoc.cpp
        1 kB
        Ming Li
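
      The attached bigdoc.cpp is not reproduced in this ticket, but a minimal sketch of such a test, assuming the Poco::MongoDB API as documented for 1.6, would look roughly like this:

      // Hypothetical minimal repro sketch (not the actual attachment): send a single
      // document larger than 256k, with no _id field, through the POCO MongoDB
      // connector, which uses a legacy OP_INSERT message.
      #include <string>

      #include "Poco/MongoDB/Connection.h"
      #include "Poco/MongoDB/Database.h"
      #include "Poco/MongoDB/InsertRequest.h"

      int main() {
          Poco::MongoDB::Connection connection("localhost", 27017);
          Poco::MongoDB::Database db("test");

          Poco::SharedPtr<Poco::MongoDB::InsertRequest> insert =
              db.createInsertRequest("foo");

          // One field carrying a ~300 KB string payload and no _id; against 3.2.0
          // the server reports "Collection::insertDocument got document without _id".
          insert->addNewDocument().add("a", std::string(300 * 1024, 'x'));

          connection.sendRequest(*insert);
          return 0;
      }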

        Issue Links

          Activity

          thomas.schubert Thomas Schubert added a comment -

          Hi Ming Li,

          Thank you for the investigation. Since this behavior is not reproducible with other drivers, there does not appear to be a bug in the MongoDB server. I would recommend raising an issue on the POCO issue tracker.

          Regards,
          Thomas

          rassi J Rassi added a comment -

          I am able to reproduce this issue against 3.2.0 with the following Python script.

          import pymongo
          client = pymongo.MongoClient()
          # manipulate=False keeps the driver from adding an _id client-side;
          # w=0 makes this an unacknowledged write, sent as a legacy OP_INSERT.
          client['test']['foo'].insert({'a': 'x'*256*1024}, manipulate=False, w=0)
          # getLastError surfaces the error from the preceding unacknowledged insert.
          print(client['test'].command({'getLastError': 1}))

          rassi@rassi:~/work/mongo$ python repro.py
          {u'code': 1, u'connectionId': 1, u'ok': 1.0, u'err': u'Collection::insertDocument got document without _id for ns:test.foo', u'n': 0}
          

          This is a regression introduced in 3.1.9 by 6d3c42c8 (SERVER-19564). The issue is that the sanitization function fixDocumentForInsert() is never called when performing an OP_INSERT legacy insert of a single document whose size exceeds 256k. As a result, users are able to store invalid documents this way, and an insert of a document without an _id field will cause an exception to be thrown at the storage layer.

          One reason that this issue is reproducible with some drivers but not others is that many drivers automatically add an _id field to the documents to be inserted before sending the insert off to the server. This will cause the _id check to pass at the storage layer, so the "got document without _id" exception is never thrown.
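
          For illustration only, here is a sketch of doing that manually with the POCO connector, assuming the Poco::MongoDB 1.6 API and using a generated UUID string as the _id value:

          // Hypothetical sketch of the client-side workaround: supply an _id yourself,
          // mirroring what many drivers do automatically, so the storage-layer check
          // passes even on the unsanitized >256k path.
          #include <string>

          #include "Poco/MongoDB/Connection.h"
          #include "Poco/MongoDB/Database.h"
          #include "Poco/MongoDB/Document.h"
          #include "Poco/MongoDB/InsertRequest.h"
          #include "Poco/UUIDGenerator.h"

          int main() {
              Poco::MongoDB::Connection connection("localhost", 27017);
              Poco::MongoDB::Database db("test");

              Poco::SharedPtr<Poco::MongoDB::InsertRequest> insert =
                  db.createInsertRequest("foo");

              Poco::MongoDB::Document& doc = insert->addNewDocument();
              doc.add("_id", Poco::UUIDGenerator::defaultGenerator().createRandom().toString());
              doc.add("a", std::string(300 * 1024, 'x'));

              connection.sendRequest(*insert);
              return 0;
          }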

          mli Ming Li added a comment -

          Thanks Jason, looking forward to the fix. I will try to insert an _id manually for the moment.

          mli Ming Li added a comment -

          Hi All,

          I tried to insert an _id field to bypass the problem, but a bigger problem has appeared: all requests to MongoDB seem to be blocked somehow after inserting the document this way. The _id I inserted is a normal UUID; is this the problem?

          Thanks,
          Ming

          rassi J Rassi added a comment -

          I tried to insert an _id field to bypass the problem, but a bigger problem has appeared: all requests to MongoDB seem to be blocked somehow after inserting the document this way. The _id I inserted is a normal UUID; is this the problem?

          No, that is not expected. The server should be able to handle an insert of a single 256k document without any issue. Are you able to open a shell and run db.currentOp() successfully?

          xgen-internal-githook Githook User added a comment -

          Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>

          Message: SERVER-22167: Move fixDocuments up earlier in the insert path to ensure it runs
          Branch: master
          https://github.com/mongodb/mongo/commit/370634f46633f0fd6d626822d7d75752a34d4f50

          xgen-internal-githook Githook User added a comment -

          Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>

          Message: SERVER-22167: Move fixDocuments up earlier in the insert path to ensure it runs

          (cherry picked from commit 370634f46633f0fd6d626822d7d75752a34d4f50)
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/3d236611718ccd164335c0edc649f34868d0072c


            People

            • Votes: 0
            • Watchers: 10
