[SERVER-78103] SBE can crash for $_resumeAfter for clustered collections with null value Created: 14/Jun/23 Updated: 29/Oct/23 Resolved: 06/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 7.1.0-rc0 |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gil Alon | Assignee: | Kevin Cherkauer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Query Execution |
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | QE 2023-07-10 | ||||||||
| Participants: | |||||||||
| Description |
|
Classic and SBE behave differently when running $_resumeAfter against clustered collections, and SBE can crash the server (steps to reproduce are below). This affects all versions that use SBE for clustered collections (currently only 7.1). I think this is partly because of how SBE and classic handle null input to $_resumeAfter. The two engines return different postBatchResumeToken values once all documents in the collection have been returned. When no documents are left, classic returns {$recordId: null}, and after that it returns the recordId of the first document. SBE never returns a null value; it just returns the recordId of the last document again and again. Either both engines should return the same output, or at least SBE should not crash when run with {'$recordId': null}. This was discovered in another bug investigation ( |
| Comments |
| Comment by Githook User [ 06/Jul/23 ] |
|
Author: {'name': 'Kevin Cherkauer', 'email': 'kevin.cherkauer@mongodb.com', 'username': 'kevin-cherkauer'} Message: |
| Comment by Kevin Cherkauer [ 30/Jun/23 ] |
|
The problem is that in SBE, $_resumeAfter: {$recordId: null} results in a null record ID value coming in through the _seekRecordId slot. This crashes seekExact(), which expects the value to be a string. The fix is to add a check for this case in ScanStage::getNext() in scan.cpp. |
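
The shape of such a check can be illustrated with a minimal, self-contained sketch. This is not the actual patch: `SlotValue`, `SeekStatus`, `SeekResult`, `guardedSeek`, and the toy `seekExact` below are hypothetical stand-ins, not the real SBE or storage-engine types. The ticket also does not say what the real fix does once it detects the null value, so the sketch only reports a distinct status instead of seeking. It assumes C++17.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-in for a slot value: either null or a string record ID.
using SlotValue = std::variant<std::monostate, std::string>;

// Hypothetical stand-in for a seek that requires a string key. The assert
// models the precondition that a null record ID would violate.
std::optional<std::string> seekExact(const SlotValue& recordId) {
    assert(std::holds_alternative<std::string>(recordId));
    const std::string& key = std::get<std::string>(recordId);
    // Toy "collection" holding a single clustered key.
    if (key == "a") {
        return std::string("doc-a");
    }
    return std::nullopt;
}

enum class SeekStatus { kFound, kNotFound, kNullResumeToken };

struct SeekResult {
    SeekStatus status;
    std::string doc;
};

// Checks for a null record ID before seeking, so seekExact() is only ever
// called with a string.
SeekResult guardedSeek(const SlotValue& seekRecordId) {
    if (std::holds_alternative<std::monostate>(seekRecordId)) {
        return {SeekStatus::kNullResumeToken, ""};
    }
    std::optional<std::string> doc = seekExact(seekRecordId);
    if (!doc) {
        return {SeekStatus::kNotFound, ""};
    }
    return {SeekStatus::kFound, *doc};
}
```

The point of the sketch is only the ordering: the type of the incoming value is tested first, and the string-only code path is entered only when that test passes.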