[SERVER-45037] CollectionBulkLoader::insertDocuments() should be called with the collection cloner mutex held to guarantee memory and thread safety properties. Created: 10/Dec/19 Updated: 29/Oct/23 Resolved: 02/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Matthew Russotto |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Sprint: | Repl 2019-12-16, Repl 2019-12-30, Repl 2020-01-13 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 58 | ||||||||||||||||
| Description |
|
After this commit, CollectionBulkLoader::insertDocuments() (storage document/ external sorter insertion) is not called with collection cloner mutex held. As a result, we can hit data inconsistency and buffer overflow issues. Data inconsistency can occur because this CollectionBulkLoader::insertDocuments() is not thread-safe function. Note: I was able to see this buffer overflow, lead to server crash, when I tried to disable readOnce cursor on initial sync logkeeper workload. |
| Comments |
| Comment by Githook User [ 02/Jan/20 ] |
|
Author: {'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}Message: |
| Comment by Matthew Russotto [ 16/Dec/19 ] |
|
insertDocumentsCallback() is thread safe. The crash is a result of there being no throttling, so the receive thread (handleNextBatch) is allowed to get far ahead of the insert thread, but this is not a race. By inserting the documents while holding the cloner lock, we effectively force the receive thread to run in lock-step with insertion. There may be a performance gain by using explicit throttling and not running in lock-step, but I think we just need to fix the bug for now. If the dbwork thread is being completely starved, we can still crash, but that case probably indicates some sort of system problem. |
| Comment by Suganthi Mani [ 10/Dec/19 ] |
|
Thinking more about it, I feel, even after we revert to the old behavior (i.e) collectionBulkLoader::insertDocuments() should be called with the collection cloner mutex held, there is a slim chance that we can hit with the buffer overflow issue when your task runner/thread pool is slow in scheduling the task. I feel we should have some limit to that CollectionCloner::_documentsToInsert buffer so that we cover the case of fast producer and slow consumer. |
| Comment by Suganthi Mani [ 10/Dec/19 ] |
|
As a part of this ticket, I would like to add few other minor items 1) correct the warning message- s/_insertDocumentsCallback/insertDocumentsCallback |