Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-24755

explain("executionStats") can attempt to access a collection after it has been dropped

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.3.8
    • Fix Version/s: 3.3.15
    • Component/s: Querying
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      TIG 16 (06/24/16), Query 2016-10-10

      Description

      Calling explain() while running with concurrent clients can cause a segmentation fault or aborting.

       
      [MongoDFixture:job0] 2016-06-22T21:28:35.447+0000 F -        [conn4181] Invalid access at address: 0x656c69
      [JSTest:1] 2016-06-22T21:28:35.452+0000 Top-level statement 402 completed in 30 ms
      [JSTest:6] 2016-06-22T21:28:35.453+0000 Top-level statement 406 completed in 8 ms
      [JSTest:6] 2016-06-22T21:28:35.453+0000 Top-level statement 407 completed in 0 ms
      [JSTest:6] 2016-06-22T21:28:35.454+0000 Top-level statement 408 completed in 1 ms
      [JSTest:6] 2016-06-22T21:28:35.454+0000 Top-level statement 409 completed in 0 ms
      [JSTest:1] 2016-06-22T21:28:35.459+0000 setting random seed: 2182297150
      [JSTest:1] 2016-06-22T21:28:35.459+0000 Top-level statement 403 completed in 7 ms
      [MongoDFixture:job0] 2016-06-22T21:28:35.460+0000 F -        [conn4181] Got signal: 11 (Segmentation fault).
      [MongoDFixture:job0] 
      [MongoDFixture:job0]  0x7faed6940b01 0x7faed693f899 0x7faed693ff1d 0x7faed6c98eb1 0x7faed4249330 0x7faed636268b 0x7faed635b70f 0x7faed635ca26 0x7faed631a9dc 0x7faed631dd17 0x7faed5fcd9df 0x7faed5fc6e26 0x7faed60780aa 0x7faed6079778 0x7faed651865c 0x7faed61a6a51 0x7faed5e02af2 0x7faed68ec38b 0x7faed4241184 0x7faed3f6e37d
      [MongoDFixture:job0] ----- BEGIN BACKTRACE -----
      [JSTest:7] 2016-06-22T21:28:35.462+0000 Top-level statement 375 completed in 16 ms
      [MongoDFixture:job0] {"backtrace":[{"b":"7FAED55DE000","o":"1362B01","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7FAED55DE000","o":"1361899"},{"b":"7FAED55DE000","o":"1361F1D"},{"b":"7FAED55DE000","o":"16BAEB1"},{"b":"7FAED4239000","o":"10330"},{"b":"7FAED55DE000","o":"D8468B","s":"_ZNK5mongo26PlanCacheIndexabilityState17getDiscriminatorsENS_10StringDataE"},{"b":"7FAED55DE000","o":"D7D70F","s":"_ZNK5mongo9PlanCache17encodeKeyForMatchEPKNS_15MatchExpressionEPNS_17StringBuilderImplINS_16TrivialAllocatorEEE"},{"b":"7FAED55DE000","o":"D7EA26","s":"_ZNK5mongo9PlanCache10computeKeyB5cxx11ERKNS_14CanonicalQueryE"},{"b":"7FAED55DE000","o":"D3C9DC","s":"_ZN5mongo7Explain19generatePlannerInfoEPNS_12PlanExecutorEPKNS_10CollectionEPNS_14PlanStageStatsERKSt6vectorISt10unique_ptrIS6_St14default_deleteIS6_EESaISC_EEPNS_14BSONObjBuilderE"},{"b":"7FAED55DE000","o":"D3FD17","s":"_ZN5mongo7Explain13explainStagesEPNS_12PlanExecutorEPKNS_10CollectionENS_13ExplainCommon9VerbosityEPNS_14BSONObjBuilderE"},{"b":"7FAED55DE000","o":"9EF9DF","s":"_ZNK5mongo7FindCmd7explainEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_7BSONObjENS_13ExplainCommon9VerbosityERKNS_3rpc23ServerSelectionMetadataEPNS_14BSONObjBuilderE"},{"b":"7FAED55DE000","o":"9E8E26"},{"b":"7FAED55DE000","o":"A9A0AA","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"7FAED55DE000","o":"A9B778","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"7FAED55DE000","o":"F3A65C","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"7FAED55DE000","o":"BC8A51","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"7FAED55DE000","o":"824AF2","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"7FAED55DE000","o":"130E38B","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7FAED4239000","o":"8184"},{"b":"7FAED3E74000","o":"FA37D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.3.8-192-gb022cc6", "gitVersion" : "b022cc63a3517cea5ba2e785ef9dfeb93b664c35", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-74-generic", "version" : "#118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "b" : "7FAED55DE000", "elfType" : 3, "buildId" : "3C44A06D8988B7E919B5D96852663C867656BC63" }, { "b" : "7FFDE21D0000", "elfType" : 3, "buildId" : "DC075B751E9FB361F14CD59BD81300A6BB5CB377" }, { "b" : "7FAED515A000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "D08DD65F97859C71BB2CBBF1043BD968EFE18AAD" }, { "b" : "7FAED4D7F000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F86FA9FB4ECEB4E06B40DBDF761A4172B70A4229" }, { "b" : "7FAED4B77000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "E2A6DD5048A0A051FD61043BDB69D8CC68192AB7" }, { "b" : "7FAED4973000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DA9B8C234D0FE9FD8CAAC8970A7EC1B6C8F6623F" }, { "b" : "7FAED466D000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "D144258E614900B255A31F3FD2283A878670D5BC" }, { "b" : "7FAED4457000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7FAED4239000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "31E9F21AE8C10396171F1E13DA15780986FA696C" }, { "b" : "7FAED3E74000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "CF699A15CAAE64F50311FC4655B86DC39A479789" }, { "b" : "7FAED53B9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "D0F537904076D73F29E4A37341F8A449E2EF6CD0" } ] }}
      [MongoDFixture:job0]  mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7faed6940b01]
      [JSTest:6] 2016-06-22T21:28:35.462+0000 Top-level statement 410 completed in 8 ms
      [JSTest:6] 2016-06-22T21:28:35.462+0000 Top-level statement 411 completed in 0 ms
      [JSTest:1] 2016-06-22T21:28:35.462+0000 setting random seed: 3062503873
      [JSTest:1] 2016-06-22T21:28:35.462+0000 Top-level statement 404 completed in 1 ms
      [JSTest:6] 2016-06-22T21:28:35.462+0000 Top-level statement 412 completed in 0 ms
      [MongoDFixture:job0]  mongod(+0x1361899) [0x7faed693f899]
      [MongoDFixture:job0]  mongod(+0x1361F1D) [0x7faed693ff1d]
      [MongoDFixture:job0]  mongod(+0x16BAEB1) [0x7faed6c98eb1]
      [MongoDFixture:job0]  libpthread.so.0(+0x10330) [0x7faed4249330]
      [MongoDFixture:job0]  mongod(_ZNK5mongo26PlanCacheIndexabilityState17getDiscriminatorsENS_10StringDataE+0xBB) [0x7faed636268b]
      [MongoDFixture:job0]  mongod(_ZNK5mongo9PlanCache17encodeKeyForMatchEPKNS_15MatchExpressionEPNS_17StringBuilderImplINS_16TrivialAllocatorEEE+0xCF) [0x7faed635b70f]
      [MongoDFixture:job0]  mongod(_ZNK5mongo9PlanCache10computeKeyB5cxx11ERKNS_14CanonicalQueryE+0x66) [0x7faed635ca26]
      [MongoDFixture:job0]  mongod(_ZN5mongo7Explain19generatePlannerInfoEPNS_12PlanExecutorEPKNS_10CollectionEPNS_14PlanStageStatsERKSt6vectorISt10unique_ptrIS6_St14default_deleteIS6_EESaISC_EEPNS_14BSONObjBuilderE+0x36C) [0x7faed631a9dc]
      [MongoDFixture:job0]  mongod(_ZN5mongo7Explain13explainStagesEPNS_12PlanExecutorEPKNS_10CollectionENS_13ExplainCommon9VerbosityEPNS_14BSONObjBuilderE+0x217) [0x7faed631dd17]
      [MongoDFixture:job0]  mongod(_ZNK5mongo7FindCmd7explainEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_7BSONObjENS_13ExplainCommon9VerbosityERKNS_3rpc23ServerSelectionMetadataEPNS_14BSONObjBuilderE+0xBAF) [0x7faed5fcd9df]
      [MongoDFixture:job0]  mongod(+0x9E8E26) [0x7faed5fc6e26]
      [MongoDFixture:job0]  mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0xB5A) [0x7faed60780aa]
      [MongoDFixture:job0]  mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x978) [0x7faed6079778]
      [MongoDFixture:job0]  mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x26C) [0x7faed651865c]
      [MongoDFixture:job0]  mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xC61) [0x7faed61a6a51]
      [MongoDFixture:job0]  mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0x102) [0x7faed5e02af2]
      [MongoDFixture:job0]  mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x33B) [0x7faed68ec38b]
      [MongoDFixture:job0]  libpthread.so.0(+0x8184) [0x7faed4241184]
      [MongoDFixture:job0]  libc.so.6(clone+0x6D) [0x7faed3f6e37d]
      [MongoDFixture:job0] -----  END BACKTRACE  -----
      

        Activity

        Hide
        david.storch David Storch added a comment -

        It looks like may be a data race inside the PlanCache introduced during 3.2 development for partial indexes. Nice find, Isabella Siu! Is it okay if I move this onto the query team's backlog, or are you actively investigating?

        Show
        david.storch David Storch added a comment - It looks like may be a data race inside the PlanCache introduced during 3.2 development for partial indexes . Nice find, Isabella Siu ! Is it okay if I move this onto the query team's backlog, or are you actively investigating?
        Hide
        isabella.siu Isabella Siu added a comment - - edited

        David Storch It's okay, I'm not actively investigating it.

        Show
        isabella.siu Isabella Siu added a comment - - edited David Storch It's okay, I'm not actively investigating it.
        Hide
        david.storch David Storch added a comment -

        I can reproduce this crash reliably by running the following commands in separate shells connected to the same server:

        // Run this in this one shell.
        while (true) { db.c.insert({_id: 1}); db.c.insert({_id: 2}); db.c.drop(); }
         
        // Run this at the same time from a separate shell.
        while (true) { db.c.find({$where: "sleep(200)"}).explain("executionStats"); }
        

        I initially suspected that this was due to a data race inside the PlanCache, but this turned out not to be the case. Instead, this is a bug in explain which can cause us to attempt to access the PlanCache of a collection that has already been dropped. If the explain verbosity is "executionStats" or higher, we execute the query in order to gather the requested statistics:

        https://github.com/mongodb/mongo/blob/r3.3.14/src/mongo/db/query/explain.cpp#L719-L723

        If execution of the query fails, explain should report this by including a value of false inside the executionSuccess field. The explain command itself, however, does not fail. When the error status is due to the collection being dropped during a yield, we still attempt to access the collection in order to obtain a value for the indexFilterSet field:

        https://github.com/mongodb/mongo/blob/r3.3.14/src/mongo/db/query/explain.cpp#L595-L606

        In order to fix, we should not attempt to gather a value for indexFilterSet when PlanExecutor::executePlan() returns with ErrorCodes::QueryPlanKilled.

        Show
        david.storch David Storch added a comment - I can reproduce this crash reliably by running the following commands in separate shells connected to the same server: // Run this in this one shell. while (true) { db.c.insert({_id: 1}); db.c.insert({_id: 2}); db.c.drop(); }   // Run this at the same time from a separate shell. while (true) { db.c.find({$where: "sleep(200)"}).explain("executionStats"); } I initially suspected that this was due to a data race inside the PlanCache, but this turned out not to be the case. Instead, this is a bug in explain which can cause us to attempt to access the PlanCache of a collection that has already been dropped. If the explain verbosity is "executionStats" or higher, we execute the query in order to gather the requested statistics: https://github.com/mongodb/mongo/blob/r3.3.14/src/mongo/db/query/explain.cpp#L719-L723 If execution of the query fails, explain should report this by including a value of false inside the executionSuccess field. The explain command itself, however, does not fail. When the error status is due to the collection being dropped during a yield, we still attempt to access the collection in order to obtain a value for the indexFilterSet field: https://github.com/mongodb/mongo/blob/r3.3.14/src/mongo/db/query/explain.cpp#L595-L606 In order to fix, we should not attempt to gather a value for indexFilterSet when PlanExecutor::executePlan() returns with ErrorCodes::QueryPlanKilled .
        Hide
        xgen-internal-githook Githook User added a comment -

        Author:

        {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

        Message: SERVER-24755 fix explain to handle collection drop during yield
        Branch: master
        https://github.com/mongodb/mongo/commit/1d6dce66f73719e8bf710d5e1e19169b27f91344

        Show
        xgen-internal-githook Githook User added a comment - Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'} Message: SERVER-24755 fix explain to handle collection drop during yield Branch: master https://github.com/mongodb/mongo/commit/1d6dce66f73719e8bf710d5e1e19169b27f91344

          People

          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

                Agile