Core Server
SERVER-26182

3.2 node syncing from a 3.0 node can crash due to too-large BSON during upconversion to find command reply

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:
      Upgrade member to 3.2
    • Sprint:
      Query 2016-10-10, Query 2016-11-21

      Description

      When a 3.2 replica set member is syncing from a 3.0 node, it downconverts find command requests to legacy OP_QUERY find requests and upconverts legacy OP_QUERY find replies to find command replies. This upconversion process can create a document that is larger than the 16 MB maximum, causing the secondary to trip a fatal assertion. See comment below for more details.
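      For orientation, here is a rough mongo shell illustration (not the server's C++ code; the function name is made up) of what the upconversion produces: the documents from the legacy OP_REPLY batch are wrapped into a single getMore command reply object, so the combined wrapper is itself subject to the 16 MB BSON limit.

      // Rough sketch only: wrap a legacy OP_REPLY batch into a getMore-style
      // command reply. The wrapper is one BSON object, so it violates the 16 MB
      // limit whenever the batch holds roughly 16 MB or more of user data.
      function upconvertToGetMoreReply(cursorId, ns, batchDocs) {
          return {cursor: {id: cursorId, ns: ns, nextBatch: batchDocs}, ok: 1};
      }
      // Object.bsonsize(upconvertToGetMoreReply(...)) exceeds the limit for the
      // problematic batches described in the comments below.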

      This issue should only affect users whose data files contain documents 12MB or larger.
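      A minimal mongo shell sketch (not from this ticket; it scans every collection in the current database in full, so it may be slow) for checking whether any documents are 12 MB or larger:

      // Hypothetical check: scan each collection and report documents whose
      // BSON size is at least 12 MB.
      var threshold = 12 * 1024 * 1024;
      db.getCollectionNames().forEach(function(collName) {
          db.getCollection(collName).find().forEach(function(doc) {
              if (Object.bsonsize(doc) >= threshold) {
                  print(collName + ": _id " + tojson(doc._id) + " is " +
                        Object.bsonsize(doc) + " bytes");
              }
          });
      });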

      A backtrace such as the following should appear in the logs when this issue occurs:

      2016-09-19T13:07:36.142+0000 I -        [NetworkInterfaceASIO-BGSync-0] Assertion: 10334:BSONObj size: 17130437 (0x10563C5) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 99445122894
      2016-09-19T13:07:36.154+0000 I CONTROL  [NetworkInterfaceASIO-BGSync-0]
       0x130c4a2 0x12a7248 0x128efe8 0x128f09c 0x973949 0xda2484 0x10abf65 0x10b75fb 0x10b929c 0x10b98a8 0x10b7040 0x108a750 0x10994ac 0x1099968 0x1329e71 0x132a091 0x132e22f 0x10aeda5 0x1b431c0 0x7f881c697b50 0x7f881c3e1fbd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"F0C4A2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EA7248","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"E8EFE8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"E8F09C"},{"b":"400000","o":"573949","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"9A2484","s":"_ZN5mongo27appendGetMoreResponseObjectExNS_10StringDataENS_9BSONArrayEPNS_14BSONObjBuilderE"},{"b":"400000","o":"CABF65","s":"_ZN5mongo8executor30upconvertLegacyGetMoreResponseEjNS_10StringDataERKNS_7MessageE"},{"b":"400000","o":"CB75FB","s":"_ZN5mongo8executor20NetworkInterfaceASIO12AsyncCommand8responseENS_3rpc8ProtocolENS_6Date_tEPNS3_18EgressMetadataHookE"},{"b":"400000","o":"CB929C","s":"_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE"},{"b":"400000","o":"CB98A8"},{"b":"400000","o":"CB7040"},{"b":"400000","o":"C8A750","s":"_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"C994AC","s":"_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"C99968","s":"_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSF_m"},{"b":"400000","o":"F29E71","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"400000","o":"F2A091","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"400000","o":"F2E22F","s":"_ZN4asio10io_service3runEv"},{"b":"400000","o":"CAEDA5"},{"b":"400000","o":"17431C0","s":"execute_native_thread_routine"},{"b":"7F881C691000","o":"6B50"},{"b":"7F881C304000","o":"DDFBD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.9", "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.2.0-4-amd64", "version" : "#1 SMP Debian 3.2.73-2+deb7u2", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "9A35851B156CCAF69D719418740E962817B11EA7" }, { "b" : "7FFCBB8BF000", "elfType" : 3, "buildId" : "000E93FBF194E9B9370E69572D55070F81BD1815" }, { "b" : "7F881D54A000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "16DA62C831D8080E3BCB9A1E7738D399EA3A89C0" }, { "b" : "7F881D151000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "079D784F9F4A88B6D87FC3E6B0B07BFBE209540E" }, { "b" : "7F881CF49000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "07CC333722DC4237B5F88E47A38C214FDA0B9B59" }, { "b" : "7F881CD45000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "028546D6F23D0FD259928FED6EFE53E64782EF2D" }, { "b" : "7F881CAC3000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "2C5446FA41E784A1C6FDD528AB2BCC7BCA9EF06C" }, { "b" : "7F881C8AD000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : 
"F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7F881C691000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "3F3BFFEDAF77C955862785C6CBA7670F82A05563" }, { "b" : "7F881C304000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "50B9FD83D874F18E3B385E0349FBD5D1281A19E3" }, { "b" : "7F881D7A9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "37BC6FC6FF3FA04B0A3EC20DFF215AB55EA5F5CB" }, { "b" : "7F881C0ED000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "1EFEB71FD4999C2307570D673A724EA4E1D85267" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x130c4a2]
       mongod(_ZN5mongo10logContextEPKc+0x138) [0x12a7248]
       mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x128efe8]
       mongod(+0xE8F09C) [0x128f09c]
       mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x973949]
       mongod(_ZN5mongo27appendGetMoreResponseObjectExNS_10StringDataENS_9BSONArrayEPNS_14BSONObjBuilderE+0x514) [0xda2484]
       mongod(_ZN5mongo8executor30upconvertLegacyGetMoreResponseEjNS_10StringDataERKNS_7MessageE+0x145) [0x10abf65]
       mongod(_ZN5mongo8executor20NetworkInterfaceASIO12AsyncCommand8responseENS_3rpc8ProtocolENS_6Date_tEPNS3_18EgressMetadataHookE+0x18B) [0x10b75fb]
       mongod(_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE+0x5C) [0x10b929c]
       mongod(+0xCB98A8) [0x10b98a8]
       mongod(+0xCB7040) [0x10b7040]
       mongod(_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_+0x70) [0x108a750]
       mongod(_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_+0x89C) [0x10994ac]
       mongod(_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSF_m+0x228) [0x1099968]
       mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code+0x2F1) [0x1329e71]
       mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0xC1) [0x132a091]
       mongod(_ZN4asio10io_service3runEv+0x2F) [0x132e22f]
       mongod(+0xCAEDA5) [0x10aeda5]
       mongod(execute_native_thread_routine+0x20) [0x1b431c0]
       libpthread.so.0(+0x6B50) [0x7f881c697b50]
       libc.so.6(clone+0x6D) [0x7f881c3e1fbd]
      -----  END BACKTRACE  -----
      

      Original description

      We've recently started upgrading our Mongo replica sets to 3.2.
      We have one replica set for which the upgrade doesn't go as smoothly as the others, though.
      When we try it on that set's first secondary node, starting it up after the binaries were switched to 3.2 appears to first successfully mark it as part of its configured replica set and put it into the SECONDARY state, but immediately afterwards it changes its state to a STANDALONE instance while the enclosed errors are printed in the instance log:

      2016-09-19T13:07:35.977+0000 I NETWORK [initandlisten] connection accepted from 10.14.0.51:45381 #140 (136 connections now open)
      2016-09-19T13:07:36.142+0000 I - [NetworkInterfaceASIO-BGSync-0] Assertion: 10334:BSONObj size: 17130437 (0x10563C5) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 99445122894

      This appears to be related to a specific collection that contains at least one document that is bigger than 16MB; the thing is, we issued db.upgradeCheckAllDBs() prior to the upgrade for this member and it completed perfectly fine.

      How can we spot the root cause of this and eliminate it, please, so we can move on with the 3.2 upgrade for this replica set as well as all of our other replica sets?

      Many thanks,
      Avraham Kalvo, DBA

      Attachments

      1. mongodb_appbuilder.log_exceprt (306 kB, Avraham Kalvo)

        Activity

        avrahamk Avraham Kalvo added a comment - - edited

        We are discussing an upgrade to version 3.2.9 from what is probably version 3.0.9 (we didn't note the exact minor version before setting up the 3.2.9 binaries).
        We don't hit this issue when upgrading another member in the same replica set from 3.0.12.

        This happens on Debian GNU/Linux 7.

        Thanks,
        Avi K

        thomas.schubert Thomas Schubert added a comment -

        Hi Avraham Kalvo,

        Thanks for reporting this issue. We are investigating it and will update you when we know more.

        Kind regards,
        Thomas

        david.storch David Storch added a comment -

        I reviewed this with Thomas Schubert, and we came up with a hypothesis of what we think is happening. Thomas will develop a repro script to confirm.

        In 3.2, we introduced a new protocol for performing queries called the find command. The syntax for the find command is documented here: https://docs.mongodb.com/manual/reference/command/find/#dbcmd.find. This replaces an older protocol typically referred to as OP_QUERY legacy find. When a version 3.2 member of a replica set syncs from another 3.2 node, it uses the new find command protocol for fetching oplog entries from the sync source. If the sync source is 3.0, however, then it must use legacy OP_QUERY find for compatibility. In order to deal with this, the replication system downconverts find command requests to OP_QUERY find and then upconverts OP_QUERY find replies to find command replies. The upconversion happens here:

        https://github.com/mongodb/mongo/blob/r3.2.9/src/mongo/executor/downconvert_find_and_getmore_commands.cpp#L207-L229

        Not all OP_QUERY find replies, however, can be represented as find command replies! The find command reply format allows for at most 16MB of user data, whereas the OP_QUERY reply format can theoretically contain 20MB of user data. This rarely happens in practice, since the batching rules in 3.0 only stop adding documents once the size of the reply buffer exceeds 4MB. But imagine the following problematic scenario (a short sketch of the arithmetic follows the list):

        • The first result document is just under 4MB, which is added to the OP_QUERY reply buffer.
        • The query system proceeds to fetch another document, since the 4MB threshold has not yet been crossed.
        • The next document is just under 16MB, the maximum document size, and is added to the reply buffer.
        • Now the response has 20MB of user data, which cannot be represented as a find command reply!
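        A minimal JavaScript sketch of that batching rule (illustrative only, not the server code; the sizes below are placeholders):

        // Simplified model of the 3.0 legacy OP_QUERY batching rule: documents keep
        // being appended until the running total exceeds 4 MB, so the last document
        // appended can itself be close to 16 MB.
        function legacyReplySize(docSizes) {
            var threshold = 4 * 1024 * 1024;
            var total = 0;
            for (var i = 0; i < docSizes.length; i++) {
                if (total > threshold) {
                    break;  // the threshold is checked before appending the next document
                }
                total += docSizes[i];
            }
            return total;
        }

        // A document just under 4 MB leaves the total under the threshold, so a
        // document just under 16 MB is still appended: the reply then holds nearly
        // 20 MB of user data, which cannot fit in a single 16 MB find command reply.
        print(legacyReplySize([4 * 1024 * 1024 - 100, 16 * 1024 * 1024 - 100]));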

        The find command itself is not affected by a similar problem; when it retrieves a document from the query system that will put the batch size over 16MB, it queues the document for later and returns the batch without it. One possible fix, assuming that we have correctly identified the problem, would be to backport code to 3.0 that adds similar behavior to the legacy OP_QUERY find path. Another approach would be to put a fix into the 3.2 branch to make the replication upconversion avoid creating a too-large document.
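        And a hedged sketch (again illustrative, not the actual implementation) of the find command behavior described above, where a document that would push the batch past the limit is held back for the next batch:

        // Illustrative only: if appending the next document would push the batch past
        // the 16 MB reply limit, leave it pending and return the batch without it.
        function buildFindCommandBatch(pendingDocs) {
            var limit = 16 * 1024 * 1024;
            var batch = [];
            var total = 0;
            while (pendingDocs.length > 0) {
                var nextSize = Object.bsonsize(pendingDocs[0]);
                if (batch.length > 0 && total + nextSize > limit) {
                    break;  // pendingDocs[0] will be returned with the next getMore
                }
                total += nextSize;
                batch.push(pendingDocs.shift());
            }
            return batch;
        }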

        david.storch David Storch added a comment -

        I've reproduced the reported stack trace with the following script:

        (function() {
            // Create a replica set with one "3.0" node and one "3.2" node.
            var replSetName = "testset";
            var nodes = [
                {binVersion: "3.0"},
                {binVersion: "3.2"},
            ];
         
            var rst = ReplSetTest({name: replSetName, nodes: nodes});
            rst.startSet();
         
            // Rig the election so that the 3.0 node becomes primary.
            var replSetConfig = rst.getReplSetConfig();
            replSetConfig.members[1].priority = 0;
            rst.initiate(replSetConfig);
         
            var primaryDB = rst.getPrimary().getDB("test");
         
            primaryDB.c.drop();
         
            // Create a document between 3 and 4 megabytes and another document between 15 and 16 MB.
            var str1mb = ',';
            while (str1mb.length < (1024 * 1024)) {
                str1mb += str1mb;
            }
            var docSmall = {a: str1mb, b: str1mb, c: str1mb};
            assert.gte(Object.bsonsize(docSmall), 3 * 1024 * 1024);
            assert.lte(Object.bsonsize(docSmall), 4 * 1024 * 1024);
         
            var str15mb = '';
            for (var i = 0; i < 15; i++) {
                str15mb += str1mb;
            }
            var docLarge = {a: str15mb};
            assert.gte(Object.bsonsize(docLarge), 15 * 1024 * 1024);
            assert.lte(Object.bsonsize(docLarge), 16 * 1024 * 1024);
         
            // First document should exceed the 1MB limit of the initial find.
            assert.writeOK(primaryDB.c.insert(docSmall));
         
            // The first getMore batch should contain both the ~3 MB and the ~15 MB document, leading to
            // more than 16 MB of user data in the batch.
            assert.writeOK(primaryDB.c.insert(docSmall));
            assert.writeOK(primaryDB.c.insert(docLarge));
         
            rst.awaitReplication();
         
            rst.stopSet();
        })();
        

        Moving this ticket into "Needs Triage" so that it will be evaluated by the query team.

        avrahamk Avraham Kalvo added a comment -

        Thanks a lot David, Thomas, for that detailed response.

        I couldn't find where to add a comment on the ticket.
        I just wanted to ask: how come it upgrades with no issues when upgrading from 3.0.12?

        Thanks,
        Avi K


        david.storch David Storch added a comment -

        Avraham Kalvo, a mixed 3.0.12 / 3.2 cluster should be affected by this issue as well. Without further information, my best guess is that you simply happened not to hit the issue. This requires a particular sequence of variously sized BSON objects to be returned in a query response, which is why the issue does not affect many 3.0 => 3.2 rolling upgrades.

        avrahamk Avraham Kalvo added a comment -

        Many thanks,
        You may now close this ticket.

        xgen-internal-githook Githook User added a comment -

        Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

        Message: SERVER-26182 Avoid putting OP_REPLY batches >16MB into a single BSONObj
        Branch: v3.2
        https://github.com/mongodb/mongo/commit/524cc9989bcf743b34bcc9e9d05d9f72cae48620

        xgen-internal-githook Githook User added a comment -

        Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

        Message: SERVER-26182 Reserve error code in use on 3.2 branch
        Branch: master
        https://github.com/mongodb/mongo/commit/54b9623f698d00ebc89a031cd4a61e49d1041300


          People

          • Votes: 0
          • Watchers: 11
