Query users of the external sorter ($group, $bucketAuto, etc.) generate unique file names by incrementing a counter that starts at 0 on startup. When an unfinished index build is being resumed during startup, the temporary files already in the "_tmp" directory are not cleared, so a query that spills can generate a file name that already exists and re-use that file.
If this happens, the following crash will occur:
ChecksumMismatch: Data read from disk does not match what was written to disk. Possible corruption of data.
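The name collision described above can be illustrated with a small standalone sketch. This is not the server's sorter code: the class `SpillFileNamer`, the file-name prefix `extsort-sketch.`, and the helper `restartReusesFile` are hypothetical names chosen for illustration. It only shows that a counter restarting at 0 yields a name that may already exist on disk if the directory was not cleared.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical stand-in for a counter-based file-name generator. The counter
// lives in memory only, so it starts at 0 again after every restart.
class SpillFileNamer {
public:
    explicit SpillFileNamer(fs::path tmpDir) : _tmpDir(std::move(tmpDir)) {}

    // Returns the next "unique" name; unique only within one process lifetime.
    fs::path next() {
        return _tmpDir / ("extsort-sketch." + std::to_string(_counter++));
    }

private:
    fs::path _tmpDir;
    std::uint64_t _counter = 0;
};

// Simulates two process lifetimes that share one temp directory which is not
// cleared in between. Returns true if the first name generated after the
// "restart" refers to a file that already exists.
inline bool restartReusesFile(const fs::path& tmpDir) {
    fs::create_directories(tmpDir);

    fs::path first;
    {
        SpillFileNamer beforeRestart(tmpDir);
        first = beforeRestart.next();
        std::ofstream out(first, std::ios::binary);
        out << "data kept for a resumable index build";
    }

    SpillFileNamer afterRestart(tmpDir);  // counter is back at 0
    const fs::path reused = afterRestart.next();
    return reused == first && fs::exists(reused);
}
```

In this sketch the second namer hands out the same path as the first, which is the precondition for a query spill landing on a file it did not create.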
We normally delete the "_tmp" directory at startup. However, this does not happen if there are any index builds that need to be resumed after a clean shutdown:
    // If we did not find any index builds to resume or we are starting up after an unclean
    // shutdown, nothing in the temp directory will be used. Thus, we can clear it.
    if (reconcileResult.indexBuildsToResume.empty() ||
        lastShutdownState == StorageEngine::LastShutdownState::kUnclean) {
        LOGV2(5071100, "Clearing temp directory");
        boost::system::error_code ec;
        boost::filesystem::remove_all(storageGlobalParams.dbpath + "/_tmp/", ec);
There are a few possible solutions:
- On startup, clear everything in the _tmp directory except for the sorter files needed for resumable index builds
- Provide an option to the external sorter to truncate new files before writing
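Both options can be sketched roughly as follows. This is a minimal illustration using `std::filesystem` rather than the server's own code. The function names `clearTempExcept` and `openSpillFile`, the `keep` set of file names, and the `truncate` flag are all assumptions made for the example. How the set of files needed by resumable index builds would actually be obtained is not covered here.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Option 1 (sketch): remove every entry in tmpDir whose file name is not in
// `keep`. Returns the number of filesystem entries removed.
inline std::uintmax_t clearTempExcept(const fs::path& tmpDir,
                                      const std::set<std::string>& keep) {
    std::error_code ec;
    if (!fs::is_directory(tmpDir, ec)) {
        return 0;
    }

    // Collect candidates first so the directory is not modified mid-iteration.
    std::vector<fs::path> doomed;
    for (const auto& entry : fs::directory_iterator(tmpDir, ec)) {
        if (keep.count(entry.path().filename().string()) == 0) {
            doomed.push_back(entry.path());
        }
    }

    std::uintmax_t removed = 0;
    for (const auto& path : doomed) {
        const auto n = fs::remove_all(path, ec);
        if (!ec) {
            removed += n;
        }
    }
    return removed;
}

// Option 2 (sketch): let the caller ask for truncation, so any stale content
// under a re-used name is discarded before the first write.
inline std::ofstream openSpillFile(const fs::path& file, bool truncate) {
    const auto mode = std::ios::binary | std::ios::out |
        (truncate ? std::ios::trunc : std::ios::app);
    return std::ofstream(file, mode);
}
```

The first option keeps the cleanup at startup and leaves the sorter untouched. The second leaves the directory alone and makes each new spill file safe on its own, so it would have to stay off for the files a resumed index build needs to read back.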