Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-3322

sharded aggregations should be very tolerant of node failures

    • Query Execution
    • Fully Compatible

      In large clusters machines will fail frequently. Aggregation operations in sharded environments should be fairly tolerant of this – this is important for jobs that require many hours to run. Two things we want to handle:

      (1) if a node in one shard which was doing work fails, we want the job to still complete.
      (2) if that node's work is completely restarted, #1 is fixed but the time for the job to complete might double. that is suboptimal and should be addressed too.

      This applies to both map/reduce and the aggregation framework. When done in one reassign ticket to the other team?

      This is lower priority than performance optimization and having good concurrency in the frameworks (I'd say do this thereafter).

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            dwight@mongodb.com Dwight Merriman
            Votes:
            1 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated: