[SERVER-20057] Concurrent, sharded mapReduces can fail when temporary namespaces collide across mongos processes Created: 20/Aug/15  Updated: 06/Dec/22  Resolved: 09/Mar/20

Status: Closed
Project: Core Server
Component/s: MapReduce, Sharding
Affects Version/s: 3.1.6
Fix Version/s: 4.4.0

Type: Bug
Priority: Major - P3
Reporter: Kamran K.
Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done
Votes: 1
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-34539 Re-enable sharded mapReduce concurren... Closed
Assigned Teams:
Sharding
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   

It's possible for concurrent, sharded mapReduces to fail with DEAD plan executors when there's a collision in temporary namespaces across multiple mongos processes.

This bug is intermittently triggered by the concurrency suite.

This seems to be the sequence of events:

1 - A mongos process issues a drop command, on all shards, on a tmp.mrs namespace after finishing the mapReduce.shardedfinish command (in cluster_map_reduce_cmd.cpp).

2 - At the same time, another mongos process tries to initialize a ParallelSortClusteredCursor on the very same tmp.mrs namespace as part of another mapReduce.shardedfinish command.

3 - The drop invalidates cursors on the tmp.mrs namespace, which leads to a DEAD plan executor and a failed mapReduce command.
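
The sequence above can be sketched as a small simulation. This is only an illustration, not the server's code: the MongosSim class, the per-process counter, and the name shape tmp.mrs.<coll>_<seconds>_<counter> are assumptions inferred from the namespace in the log lines in this ticket. Under those assumptions, two mongos processes that start a job on the same collection in the same second with equal counters produce the same temporary namespace, so a drop issued by one removes the collection the other is about to read.

```python
import itertools


class MongosSim:
    """Hypothetical stand-in for one mongos process (not real server code)."""

    def __init__(self):
        # Assumption: every process keeps its own job counter, so two
        # processes can hold the same counter value at the same time.
        self._job_counter = itertools.count(1)

    def temp_namespace(self, db, coll, now_secs):
        # Assumed name shape, inferred from the ticket's log line:
        # <db>.tmp.mrs.<coll>_<seconds>_<counter>
        return f"{db}.tmp.mrs.{coll}_{now_secs}_{next(self._job_counter)}"


def simulate_collision(now_secs=1440026655):
    mongos_a, mongos_b = MongosSim(), MongosSim()

    # Both processes start a mapReduce on the same collection in the same second.
    ns_a = mongos_a.temp_namespace("db1", "coll1", now_secs)
    ns_b = mongos_b.temp_namespace("db1", "coll1", now_secs)

    # The shard holds one collection per distinct name, so equal names are shared.
    shard_collections = {ns_a, ns_b}

    # Step 1: mongos A finishes and drops its temporary namespace.
    shard_collections.discard(ns_a)

    # Steps 2 and 3: mongos B tries to read its temporary namespace.
    cursor_ok = ns_b in shard_collections
    return ns_a, ns_b, cursor_ok


if __name__ == "__main__":
    ns_a, ns_b, cursor_ok = simulate_collision()
    print(ns_a == ns_b, cursor_ok)
```

A name that also included something unique to the process (for example a random suffix) would avoid the shared namespace in this toy model; whether that matches any fix considered for the server is not stated in this ticket.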


Relevant log lines:

I COMMAND  [conn26] CMD: drop db1.tmp.mrs.coll1_1440026655_43
E QUERY    [conn30] Plan executor error during find: DEAD, stats: { stage: "FETCH", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, docsExamined: 0, alreadyHasObj: 0, inputStage: { stage: "IXSCAN", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, keyPattern: { _id: 1 }, indexName: "_id_", isMultiKey: false, isUnique: true, isSparse: false, isPartial: false, indexVersion: 1, direction: "forward", indexBounds: { _id: [ "[MinKey, MaxKey]" ] }, keysExamined: 0, dupsTested: 0, dupsDropped: 0, seenInvalidated: 0 } }
I QUERY    [conn30] assertion 17144 Executor error: OperationFailed Operation aborted because: all indexes on collection dropped ns:db1.tmp.mrs.coll1_1440026655_43 query:{ query: {}, orderby: { _id: 1 } }
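
One way to check other logs for the same pattern is to pair each drop with a later assertion on the same namespace from a different connection. The sketch below is a hypothetical helper, not an existing tool; the regular expressions are assumptions based only on the two line formats shown above and may need adjusting for other server versions.

```python
import re

# Patterns assumed from the log lines quoted in this ticket.
DROP_RE = re.compile(r"\[(conn\d+)\] CMD: drop (\S+)")
ASSERT_RE = re.compile(r"\[(conn\d+)\] assertion \d+ .*? ns:(\S+)")


def find_drop_races(lines):
    """Return (namespace, dropping_conn, failing_conn) for each suspected race."""
    dropped_by = {}  # namespace -> connection that dropped it
    races = []
    for line in lines:
        drop = DROP_RE.search(line)
        if drop:
            dropped_by[drop.group(2)] = drop.group(1)
            continue
        failure = ASSERT_RE.search(line)
        if failure:
            conn, ns = failure.group(1), failure.group(2)
            # Only report failures on a namespace dropped by another connection.
            if ns in dropped_by and dropped_by[ns] != conn:
                races.append((ns, dropped_by[ns], conn))
    return races
```

Applied to the lines above, this would pair the drop on conn26 with the assertion on conn30 for db1.tmp.mrs.coll1_1440026655_43.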

Test output:

Error: map reduce failed:{
  "ok" : 0,
  "errmsg" : "MR post processing failed: { ok: 0.0, errmsg: \"could not initialize cursor across all shards because : Executor error: OperationFailed Operation aborted because: all indexes on collection dropped @...\", code: 14827 }"
}



 Comments   
Comment by Charlie Swanson [ 09/Mar/20 ]

This is no longer a problem after completing a recent project where we created a new implementation of mapReduce backed by the aggregation framework.

Comment by Githook User [ 21/Aug/15 ]

Author: Kamran Khan (kkmongo) <kamran.khan@mongodb.com>

Message: SERVER-20057 Disable mapReduce tests in the sharded concurrency suites

The tests intermittently fail because temporary namespaces can collide
across mongos processes. These collisions lead to plan executor errors
during the mapReduce commands.
Branch: master
https://github.com/mongodb/mongo/commit/081e8067f7cc39ed57efb6124590d223e49a4595
