[SERVER-78103] SBE can crash for $_resumeAfter for clustered collections with null value Created: 14/Jun/23  Updated: 29/Oct/23  Resolved: 06/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.1.0-rc0
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Gil Alon Assignee: Kevin Cherkauer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-77386 Make '$_resumeAfter' parameter work w... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QE 2023-07-10

 Description   

There is a difference between the classic engine and SBE when running $_resumeAfter on clustered collections: SBE can crash the server (steps to reproduce are below). This affects all versions that use SBE for clustered collections (currently only 7.1).

I think this is partly due to how SBE and classic handle null input to $_resumeAfter. Classic and SBE return different postBatchResumeToken values once all documents in the collection have been returned. When no documents are left, classic returns {$recordId: null} and, after that, the recordId of the first document. SBE just returns the recordId of the last document again and again; it never returns a null value.

Either both engines should return the same output, or at least SBE should not crash when run with {'$recordId': null}. This was discovered in another bug investigation (SERVER-77386).



 Comments   
Comment by Githook User [ 06/Jul/23 ]

Author:

Kevin Cherkauer <kevin.cherkauer@mongodb.com> (username: kevin-cherkauer)

Message: SERVER-78103 Fix clustered coll $_resumeAfter: {$recordId: null} in SBE
Branch: master
https://github.com/mongodb/mongo/commit/c53b81d0d800f7338a7f3f81b9526343fef5f5a1

Comment by Kevin Cherkauer [ 30/Jun/23 ]

The problem is that in SBE, $_resumeAfter: {$recordId: null} causes a null record ID value to arrive through the _seekRecordId slot, which crashes seekExact() because it expects that value to be a string.

The fix is to add a check for this case in scan.cpp ScanStage::getNext().
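The failure mode and the shape of the fix can be illustrated with a small, self-contained C++ model. This is a hypothetical sketch, not the actual SBE code: `SlotValue`, `Collection`, `seekExactModel()` and `resumeScanModel()` are invented stand-ins for the `_seekRecordId` slot, the clustered collection, `seekExact()` and `ScanStage::getNext()`. The unguarded seek reads the slot as a string, so a null value throws, standing in for the crash. The guarded scan checks for null first. Treating a null resume ID as "start from the first record" is an assumption, chosen to mirror the classic behavior described in the ticket.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical model of the value read from the _seekRecordId slot:
// null (from $_resumeAfter: {$recordId: null}) or a string record ID.
using SlotValue = std::variant<std::monostate, std::string>;

// Hypothetical clustered collection: record ID -> document.
using Collection = std::map<std::string, std::string>;

// Unguarded seek, modeled on seekExact(): it assumes the slot holds a string.
// std::get throws std::bad_variant_access on a null value, standing in for the crash.
bool seekExactModel(const Collection& coll, const SlotValue& rid) {
    const std::string& key = std::get<std::string>(rid);
    return coll.count(key) > 0;
}

// Guarded resume scan, modeled on the check added before the seek.
// Returns the documents that come after the resume point.
std::vector<std::string> resumeScanModel(const Collection& coll, const SlotValue& resumeId) {
    auto it = coll.begin();
    if (!std::holds_alternative<std::monostate>(resumeId)) {
        // Non-null: seek to the resume record, then continue after it.
        if (!seekExactModel(coll, resumeId)) {
            throw std::runtime_error("resume record not found");
        }
        it = std::next(coll.find(std::get<std::string>(resumeId)));
    }
    // Null (assumption): never call the seek; scan from the first record.
    std::vector<std::string> out;
    for (; it != coll.end(); ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

The key design point is that the null check happens before any code that interprets the slot value as a record ID, so a null token never reaches the seek.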

Generated at Thu Feb 08 06:37:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.