Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-20057

Concurrent, sharded mapReduces can fail when temporary namespaces collide across mongos processes

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 3.1.6
    • Fix Version/s: Backlog
    • Component/s: MapReduce, Sharding
    • Labels:

      Description

      It's possible for concurrent, sharded mapReduces to fail with DEAD plan executors when there's a collision in temporary namespaces across multiple mongos processes.

      This bug is intermittently triggered by the concurrency suite.

      This seems to be the sequence of events:

      1 - A mongos process issues a drop command, on all shards, on a tmp.mrs namespace after finishing the mapReduce.shardedfinish command (in cluster_map_reduce_cmd.cpp).

      2 - At the same time, another mongos process tries to initialize a ParallelSortClusteredCursor on the very same tmp.mrs namespace as part of another mapReduce.shardedfinish command.

      3 - The drop invalidates cursors on the tmp.mrs namespace, which leads to a DEAD plan executor and a failed mapReduce command.


      Relevant log lines:

      I COMMAND  [conn26] CMD: drop db1.tmp.mrs.coll1_1440026655_43
      E QUERY    [conn30] Plan executor error during find: DEAD, stats: { stage: "FETCH", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, docsExamined: 0, alreadyHasObj: 0, inputStage: { stage: "IXSCAN", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, keyPattern: { _id: 1 }, indexName: "_id_", isMultiKey: false, isUnique: true, isSparse: false, isPartial: false, indexVersion: 1, direction: "forward", indexBounds: { _id: [ "[MinKey, MaxKey]" ] }, keysExamined: 0, dupsTested: 0, dupsDropped: 0, seenInvalidated: 0 } }
      I QUERY    [conn30] assertion 17144 Executor error: OperationFailed Operation aborted because: all indexes on collection dropped ns:db1.tmp.mrs.coll1_1440026655_43 query:{ query: {}, orderby: { _id: 1 } }
      

      Test output:

      Error: map reduce failed:{
        "ok" : 0,
        "errmsg" : "MR post processing failed: { ok: 0.0, errmsg: \"could not initialize cursor across all shards because : Executor error: OperationFailed Operation aborted because: all indexes on collection dropped @...\", code: 14827 }"
      }
      

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                1 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated: