[SERVER-62940] DocumentSourceGroup returns different results for in-memory group and spilled group when using $sum Created: 24/Jan/22 Updated: 29/Oct/23 Resolved: 10/Mar/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.2.0, 4.2.17, 5.0.5, 5.1.1, 4.4.12 |
| Fix Version/s: | 6.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Yoon Soo Kim | Assignee: | Yoon Soo Kim |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | query-director-triage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||
| Steps To Reproduce: |
|
||||||||||||||||||||||||||||||
| Sprint: | QE 2022-02-07, QE 2022-02-21, QE 2022-03-07, QE 2022-03-21 | ||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||
| Description |
|
The reason why the document source group may return different results for in-memory group and spilled group is because when the document source group spills, it spills the accumulator state in the final state, not in intermediate state. For example, $sum accumulator stores decimal total and non-decimal total separately but when the document source group spills data, it calls AccumulatorState::getValue() which returns the final state == decimalTotal.add(nonDecimalTotal.getDecimal()). We may have potentially similar issues for accumulators that need intermediate state in sharded environment and it needs further investigation. I haven’t yet gone deeper on this aspect. But I’m not sure what’s MongoDB’s guarantee about aggregation results in sharded environment. |
| Comments |
| Comment by Githook User [ 10/Mar/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Yoonsoo Kim', 'email': 'yoonsoo.kim@mongodb.com', 'username': 'yun-soo'}Message: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 04/Feb/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here's a repro for sharded $sum/$avg issue. This repro should run on an optimized build because the classic engine deliberately spills data on a debug build for testing purpose and always produces a wrong result.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 04/Feb/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
mihai.andrei When we discontinue 5.2 (or 5.0?), we can remove the old code that sends old over-the-wire format. It sounds like that we should add upgrade/downgrade scenario tests. What would possibly happen to multiversion agg fuzzer tests? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 04/Feb/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
mihai.andrei, Thanks for posting the more detailed proposed fix. Actually, I think we missed one thing. We need to know what the type for non-decimal partial result is to merge partial results properly. For example, most data is int32 and just one datum happens to be a double and the datum happens to be at a shard which should send the partial result type == double because it adds int32s and a double. The merging-side should produce 'double' result in the end. But if shards do not send the partial result type for non-decimal sum, the merging-side neither can decide in which type non-decimal partial result can be added to the global result non-decimal part nor decide in which result type it should produce. So, I think we should always send 4 elements over the wire.
Actually, the SBE $sum algorithm maintains these 4 elements. I think this issue has been a long-standing one which we haven't noticed so far. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 31/Jan/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
steve.la This is specific only to the classic engine's DocumentSourceGroup which returns different $sum result for in-memory group and spilled group. Removed SBE-related part from the repro steps to clarify this issue is related only to the classic engine. The repro step only shows the classic engine's bug. I think we need to investigate sharded accumulators ($sum / $avg / $stdDevPop / $stdDevSamp) behavior for both the classic engine and the SBE. We reuse the merge-side DocumentSourceGroup to implement the SBE sharded accumulators. I haven't had a chance to investigate sharded accumulators behavior. |