[SERVER-37293] Refactor Sorter so that DocumentSourceGroup can use it instead of SortedFileWriter implementation Created: 24/Sep/18 Updated: 21/Jun/19 Resolved: 21/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Daniel Solnik (Inactive) |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Execution Team 2019-06-17, Execution Team 2019-07-01 | ||||||||
| Participants: | |||||||||
| Description |
|
document_source_group.h/cpp skips the Sorter interface and re-implements some of the logic in order to use the SortedFileWriter class directly. This appear to have been done to avoid the sorting phase of Sorter because DocumentSourceGroup has already sorted data. Therefore, we should expose an option on the Sorter to handle pre-sorted data, and remove the duplicate code from document_source_group.h/cpp. Additionally, document_source_group.h/cpp forces sorter.h to make internal structs public that would otherwise be private to sorter.cpp because it needs to use them directly. FileDeleter and SorterFileInfo (added by |
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 21/Jun/19 ] |
|
This change would require significant query code changes, which also could potentially affect performance, so we are not going to do it. DocumentSourceGroup maintains a a private GroupMap (_groups) that DocumentSourceGroup::initialize() fills from a pSource and periodically spills to disk when it reaches a certain size. DocumentSourceGroup::spill() performs its own stable_sort on the contents of _group (that looks like it is sorted by an id) and then translates the _group data structure entries into a format for disk. Then DocumentSourceGroup::spill() uses a SortedFileWriter and SortedFileWriter::addAlreadySorted() to spill to disk. It would be difficult to move this functionality into the Sorter interface, and may hurt performance, as Dan described in more detail above. |
| Comment by Daniel Solnik (Inactive) [ 19/Jun/19 ] |
|
It seems like the DocumentSourceGroup has two separate codepaths for depending on whether or not the groups were spilled to disk. One way to use the sorter interface in DocumentSourceGroup is by always serializing the accumulators and then adding the (id, accumulators) and using the sorter to sort by id and then deserializing the results from the sorter. However, this change would result in a possible perf hit when a spill to disk is not necessary (when the groups can be kept in memory) as currently this serialization is not done if the groups can be kept in memory. |