Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Aggregation Framework
Labels:
Assigned Teams: Query Execution
Backwards Compatibility: Fully Compatible
Case:
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In large clusters, machines will fail frequently. Aggregation operations in sharded environments should be fairly tolerant of this, which matters most for jobs that take many hours to run. There are two things we want to handle:

(1) If a node in one shard that was doing work fails, we want the job to still complete.
(2) If that node's work is restarted from scratch, #1 is fixed, but the time for the job to complete might double. That is suboptimal and should be addressed too.

This applies to both map/reduce and the aggregation framework. Once it is done in one, should the ticket be reassigned to the other team?

This is lower priority than performance optimization and having good concurrency in the frameworks (I'd say do this thereafter).
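
To make points (1) and (2) concrete, here is a minimal sketch of chunk-level retry with checkpointing. It is an illustration only, not how the server handles this: `NodeFailure`, `run_shard_job`, and `flaky_process` are hypothetical names, and a shard's work is assumed to be divisible into independent chunks. When a chunk fails, only that chunk is retried, and results already completed are kept, so the job finishes without redoing all of the earlier work.

```python
from typing import Callable, Dict, List, Tuple


class NodeFailure(Exception):
    """Hypothetical error raised when the node processing a chunk dies."""


def run_shard_job(
    chunks: List[int],
    process_chunk: Callable[[int, int], int],
    max_attempts: int = 3,
) -> Dict[int, int]:
    """Process every chunk, retrying a failed chunk without redoing finished ones."""
    completed: Dict[int, int] = {}  # checkpoint: chunk id -> partial result
    for chunk in chunks:
        for attempt in range(1, max_attempts + 1):
            try:
                completed[chunk] = process_chunk(chunk, attempt)
                break  # chunk done; move on to the next one
            except NodeFailure:
                if attempt == max_attempts:
                    raise  # give up only after exhausting retries for this chunk
    return completed


# Usage: chunk 2 fails on its first attempt and succeeds on the retry.
calls: List[Tuple[int, int]] = []


def flaky_process(chunk: int, attempt: int) -> int:
    calls.append((chunk, attempt))
    if chunk == 2 and attempt == 1:
        raise NodeFailure("simulated node failure")
    return chunk * 10


result = run_shard_job([1, 2, 3], flaky_process)
```

In this sketch the failure costs one extra chunk attempt rather than a full rerun, which is the difference between goal (1) alone and goals (1) and (2) together.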

is related to

SERVER-31782 allow aggregation to take an 'allowPartialResults' option (Backlog)
SERVER-17696 Terminate sharded queries immediately after a failure (Closed)

Assignee: [DO NOT USE] Backlog - Query Execution
Reporter: Dwight Merriman
Participants: [DO NOT USE] Backlog - Query Execution, Antoine Girbal, David Storch, Dwight Merriman, Kaloian Manassiev
Votes: 1
Watchers: 10

Created: Jun 23 2011 03:31:35 PM UTC
Updated: Dec 06 2022 05:42:50 AM UTC
