Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-26182

3.2 node syncing from a 3.0 node can crash due to too-large BSON during upconversion to find command reply

    • Fully Compatible
    • ALL
    • Hide

      Upgrade member to 3.2

      Show
      Upgrade member to 3.2
    • Query 2016-10-10, Query 2016-11-21

      When a 3.2 replica set member is syncing from a 3.0 node, it downconverts find command requests to legacy OP_QUERY find requests and upconverts legacy OP_QUERY find replies to find command replies. This upconversion process can create a document that is larger than the 16 MB maximum, causing the secondary to trip a fatal assertion. See comment below for more details.

      This issue should only affect users whose data files contain documents 12MB or larger.

      A backtrace such as the following should appear in the logs when this issue occurs:

      2016-09-19T13:07:36.142+0000 I -        [NetworkInterfaceASIO-BGSync-0] Assertion: 10334:BSONObj size: 17130437 (0x10563C5) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 99445122894
      2016-09-19T13:07:36.154+0000 I CONTROL  [NetworkInterfaceASIO-BGSync-0]
       0x130c4a2 0x12a7248 0x128efe8 0x128f09c 0x973949 0xda2484 0x10abf65 0x10b75fb 0x10b929c 0x10b98a8 0x10b7040 0x108a750 0x10994ac 0x1099968 0x1329e71 0x132a091 0x132e22f 0x10aeda5 0x1b431c0 0x7f881c697b50 0x7f881c3e1fbd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"F0C4A2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EA7248","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"E8EFE8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"E8F09C"},{"b":"400000","o":"573949","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"9A2484","s":"_ZN5mongo27appendGetMoreResponseObjectExNS_10StringDataENS_9BSONArrayEPNS_14BSONObjBuilderE"},{"b":"400000","o":"CABF65","s":"_ZN5mongo8executor30upconvertLegacyGetMoreResponseEjNS_10StringDataERKNS_7MessageE"},{"b":"400000","o":"CB75FB","s":"_ZN5mongo8executor20NetworkInterfaceASIO12AsyncCommand8responseENS_3rpc8ProtocolENS_6Date_tEPNS3_18EgressMetadataHookE"},{"b":"400000","o":"CB929C","s":"_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE"},{"b":"400000","o":"CB98A8"},{"b":"400000","o":"CB7040"},{"b":"400000","o":"C8A750","s":"_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"C994AC","s":"_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"C99968","s":"_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSF_m"},{"b":"400000","o":"F29E71","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"400000","o":"F2A091","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"400000","o":"F2E22F","s":"_ZN4asio10io_service3runEv"},{"b":"400000","o":"CAEDA5"},{"b":"400000","o":"17431C0","s":"execute_native_thread_routine"},{"b":"7F881C691000","o":"6B50"},{"b":"7F881C304000","o":"DDFBD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.9", "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.2.0-4-amd64", "version" : "#1 SMP Debian 3.2.73-2+deb7u2", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "9A35851B156CCAF69D719418740E962817B11EA7" }, { "b" : "7FFCBB8BF000", "elfType" : 3, "buildId" : "000E93FBF194E9B9370E69572D55070F81BD1815" }, { "b" : "7F881D54A000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "16DA62C831D8080E3BCB9A1E7738D399EA3A89C0" }, { "b" : "7F881D151000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "079D784F9F4A88B6D87FC3E6B0B07BFBE209540E" }, { "b" : "7F881CF49000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "07CC333722DC4237B5F88E47A38C214FDA0B9B59" }, { "b" : "7F881CD45000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "028546D6F23D0FD259928FED6EFE53E64782EF2D" }, { "b" : "7F881CAC3000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "2C5446FA41E784A1C6FDD528AB2BCC7BCA9EF06C" }, { "b" : "7F881C8AD000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7F881C691000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "3F3BFFEDAF77C955862785C6CBA7670F82A05563" }, { "b" : "7F881C304000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "50B9FD83D874F18E3B385E0349FBD5D1281A19E3" }, { "b" : "7F881D7A9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "37BC6FC6FF3FA04B0A3EC20DFF215AB55EA5F5CB" }, { "b" : "7F881C0ED000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "1EFEB71FD4999C2307570D673A724EA4E1D85267" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x130c4a2]
       mongod(_ZN5mongo10logContextEPKc+0x138) [0x12a7248]
       mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x128efe8]
       mongod(+0xE8F09C) [0x128f09c]
       mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x973949]
       mongod(_ZN5mongo27appendGetMoreResponseObjectExNS_10StringDataENS_9BSONArrayEPNS_14BSONObjBuilderE+0x514) [0xda2484]
       mongod(_ZN5mongo8executor30upconvertLegacyGetMoreResponseEjNS_10StringDataERKNS_7MessageE+0x145) [0x10abf65]
       mongod(_ZN5mongo8executor20NetworkInterfaceASIO12AsyncCommand8responseENS_3rpc8ProtocolENS_6Date_tEPNS3_18EgressMetadataHookE+0x18B) [0x10b75fb]
       mongod(_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE+0x5C) [0x10b929c]
       mongod(+0xCB98A8) [0x10b98a8]
       mongod(+0xCB7040) [0x10b7040]
       mongod(_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_+0x70) [0x108a750]
       mongod(_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_+0x89C) [0x10994ac]
       mongod(_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSF_m+0x228) [0x1099968]
       mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code+0x2F1) [0x1329e71]
       mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0xC1) [0x132a091]
       mongod(_ZN4asio10io_service3runEv+0x2F) [0x132e22f]
       mongod(+0xCAEDA5) [0x10aeda5]
       mongod(execute_native_thread_routine+0x20) [0x1b431c0]
       libpthread.so.0(+0x6B50) [0x7f881c697b50]
       libc.so.6(clone+0x6D) [0x7f881c3e1fbd]
      -----  END BACKTRACE  -----
      
      Original description

      We've recently started upgrading our Mongo replica sets to 3.2,
      We have one replica set for which it doesn't apply as swiftly as the others though,
      When trying upon its first secondary node, starting it up after binaries were switched to 3.2 appears to first successfully mark it as part of its configured replica set and set it to a SECONDARY state, but immediately after change its state to a STANDALONE instance while the enclosed errors are printed on the instance log:

      2016-09-19T13:07:35.977+0000 I NETWORK [initandlisten] connection accepted from 10.14.0.51:45381 #140 (136 connections now open)
      2016-09-19T13:07:36.142+0000 I - [NetworkInterfaceASIO-BGSync-0] Assertion: 10334:BSONObj size: 17130437 (0x10563C5) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 99445122894

      This appears to have to do with a specific collection that at least one document that is bigger than 16MB, thing is we've issued db.upgradeCheckAllDBs() prior to the upgrade for this member and it completed perfectly fine.

      How can we spot the root cause for this and eliminate it please so we can move on with the 3.2 upgrade for this replica set as well as all of our other replica sets?

      Many thanks,
      Avraham Kalvo, DBA

            Assignee:
            charlie.swanson@mongodb.com Charlie Swanson
            Reporter:
            avrahamk Avraham Kalvo
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: