[SERVER-84362] Investigate ObjectIsBusy error in the replica set endpoint suite when the ValidateCollections hook is enabled Created: 21/Dec/23 Updated: 16/Jan/24 Resolved: 16/Jan/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.3.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Cheahuychou Mao | Assignee: | Gregory Noma |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Storage Execution
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Execution Team 2024-01-22 | ||||||||
| Participants: | |||||||||
| Description |
|
The hook has been disabled in this suite because it was causing an ObjectIsBusy error with the message "Executor error during find command :: caused by :: 16: Device or resource busy". |
| Comments |
| Comment by Githook User [ 16/Jan/24 ] |
|
Author: {'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'} Message: GitOrigin-RevId: 6565d8465eb38425a14534855a5b6741da61a8ed |
| Comment by Gregory Noma [ 12/Jan/24 ] |
|
I think I figured out why there are two threads both running validate on the same node. The hook runs validate against each node in parallel with one another. But since the command goes through the router path with read preference "secondaryPreferred", both of these end up getting run on the same node (in the case of a two-node replica set). So, I think the solution here is to add validate to this set of commands which should not go through the router path. This way validate will correctly be run on each node that it is executed on. |
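The routing effect described in this comment can be illustrated with a small model. This is only a sketch, not resmoke or server code: `Node`, `route_secondary_preferred`, `execute`, `run_hook`, and the `bypass_router` set are hypothetical names, and the router is assumed to always pick the first available secondary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a replica set member."""
    name: str
    is_primary: bool


def route_secondary_preferred(nodes):
    """Assumed router behavior: target a secondary if one exists, else the primary."""
    secondaries = [n for n in nodes if not n.is_primary]
    if secondaries:
        return secondaries[0]
    return next(n for n in nodes if n.is_primary)


def execute(command, connected_node, nodes, bypass_router):
    """Return the node that actually executes the command."""
    if command in bypass_router:
        # The command skips the router path and runs where it was sent.
        return connected_node
    # Otherwise the router path re-targets it using the read preference.
    return route_secondary_preferred(nodes)


def run_hook(nodes, bypass_router):
    """Send validate to every node; map connected node -> executing node."""
    return {
        n.name: execute("validate", n, nodes, bypass_router).name
        for n in nodes
    }


# Two-node replica set, as in the comment above.
nodes = [Node("node0", is_primary=True), Node("node1", is_primary=False)]

via_router = run_hook(nodes, bypass_router=set())
direct = run_hook(nodes, bypass_router={"validate"})

print(via_router)  # both validates land on the secondary
print(direct)      # each validate runs on the node it was sent to
```

In this model, both validate commands end up on `node1` when they go through the router path, which matches the two concurrent validations observed on one node. Once validate is in the bypass set, each node validates itself.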
| Comment by Gregory Noma [ 11/Jan/24 ] |
|
I noticed that there are multiple threads performing collection validations at the same time right before one of them fails with ObjectIsBusy. So, here's my current theory. One thread is performing full validation on a collection in the config database. Specifically, it is performing a verify on one of the collection's tables. Since validate gets run through the router path in this test suite, another thread running validate may end up needing to run a find on the config database here. Then this find may fail with ObjectIsBusy due to the aforementioned verify operation. Thus the validate will also fail with ObjectIsBusy. The thing I'm unsure about at this point is why validate is being run on multiple threads concurrently, since the hook appears to do this serially. |
| Comment by Gregory Noma [ 10/Jan/24 ] |
|
cheahuychou.mao@mongodb.com the ValidateCollections hook runs full validation. Full validation includes WT's verify which requires exclusive access to a table. When verify is running, any other operations that attempt to access that table will receive an EBUSY error which gets converted into ObjectIsBusy. Usually the ValidateCollections hook runs after a test when both the test and the test fixture are not trying to concurrently perform any operations against the database. Given that we're seeing ObjectIsBusy errors, is this test suite still performing other operations even when the ValidateCollections hook runs? |
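The failure mode described in this comment can be sketched as a toy model. It is not WiredTiger or server code: `Table`, `begin_verify`, `end_verify`, `find`, `run_find`, and the `ObjectIsBusy` class are hypothetical stand-ins, and the model only covers the one case discussed here, an operation arriving while verify is in progress.

```python
import errno
import threading


class ObjectIsBusy(Exception):
    """Hypothetical stand-in for the server's ObjectIsBusy error."""

    def __init__(self, code):
        super().__init__(f"ObjectIsBusy (caused by errno {code})")
        self.code = code


class Table:
    """Toy table: while verify is in progress, other access gets EBUSY."""

    def __init__(self):
        self._guard = threading.Lock()
        self._verifying = False

    def begin_verify(self):
        with self._guard:
            self._verifying = True

    def end_verify(self):
        with self._guard:
            self._verifying = False

    def find(self):
        with self._guard:
            if self._verifying:
                # Storage-level failure: the table is held exclusively.
                raise OSError(errno.EBUSY, "table is being verified")
        return "ok"


def run_find(table):
    """Run find and convert a storage-level EBUSY into ObjectIsBusy."""
    try:
        return table.find()
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            raise ObjectIsBusy(exc.errno) from exc
        raise


table = Table()
before = run_find(table)

table.begin_verify()
try:
    during = run_find(table)
except ObjectIsBusy as exc:
    during = exc
table.end_verify()

after = run_find(table)
print(before, repr(during), after)
```

The find succeeds before and after the verify, and fails with the converted error only while the verify holds the table. This is why the hook is normally safe when nothing else touches the database during validation.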