[SERVER-39361] Synchronise collecting storage engine stats with shutdown Created: 03/Feb/19 Updated: 29/Oct/23 Resolved: 25/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.9, 4.1.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vamsi Boyapati | Assignee: | Sulabh Mahajan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Backport Requested: |
v4.0
|
||||||||||||||||||||||||
| Sprint: | Storage Engines 2019-02-11, Storage Engines 2019-02-25 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Linked BF Score: | 25 | ||||||||||||||||||||||||
| Story Points: | 5 | ||||||||||||||||||||||||
| Description |
|
We are erroneously calling into the storage engine as part of CurOp::completeAndLogOperation() without holding any locks. This means there is no synchronization between server shutdown (or rollbackToStable) and this function call to access the statistics on the session. |
| Comments |
| Comment by Githook User [ 03/Apr/19 ] |
|
Author: {'email': 'sulabh.mahajan@mongodb.com', 'name': 'Sulabh Mahajan', 'username': 'sulabhM'}Message: (cherry picked from commit ba3894493a94ed3c18458f391ff181d57475f010)
(cherry picked from commit 86b6aca9fa1940e85bba87261d1494ef2c208a4a)
(cherry picked from commit 82161eec79fc74652dc07b1c83fe500dc4f95e79)
(cherry picked from commit 3b4c6a689a3fdaa923d427ae112ea599513ef8ce)
(cherry picked from commit 6a9a5855048df1f4796a4032276d01318c398691)
The find command will always have the storage stats because the (cherry picked from commit 6125b5fb078a316854f0299b96b7d16eacb944de)
(cherry picked from commit bacb6b67706a2c057fcd0f76a38f416b225aa69a)
(cherry picked from commit 371197e4bab715a83272a4472e118ee5c5cbbf7c)
(cherry picked from commit 23ca771f76f85638f23bca2a4a6ac196a81fdc21) |
| Comment by Sulabh Mahajan [ 25/Feb/19 ] |
|
This needs to go into 4.0 along with other tickets in PM-1071 |
| Comment by Githook User [ 25/Feb/19 ] |
|
Author: {'name': 'Sulabh Mahajan', 'email': 'sulabh.mahajan@mongodb.com', 'username': 'sulabhM'}Message: |
| Comment by Sulabh Mahajan [ 22/Feb/19 ] |
|
I think what worked here is letting lock acquisition be interrupted and being content with not logging storage statistics in such a case. I have put the change in review. |
| Comment by Eric Milkie [ 19/Feb/19 ] |
|
Regarding the hanging problem, I would take a look at how the profiler logic works, as we had to solve some of the same locking issues there, and the profiler is called in the same place that we're obtaining and writing these WT statistics. |
| Comment by Sulabh Mahajan [ 19/Feb/19 ] |
|
Update: For the reasons given above, I decided to follow the approach 2, ie to take a MODE_IS lock before calling into the storage engine. I hit the following hurdles:
|
| Comment by Sulabh Mahajan [ 14/Feb/19 ] |
|
Update: I have been working on the following approaches: 1. Initially thought of updating stats into a variable in the storage engines implementation of the operation context. Each time the WUOW gets destroyed we could update these stats and CurOp::completeAndLogOperation() could then extract these stats as needed instead of calling into the storage engine. On implementing a prototype found the following issues:
2. Another approach that milkie suggested and I am still working on is to take a global MODE_IS lock in CurOp::completeAndLogOperation() before calling into the storage engine. Shut down is supposed to acquire a MODE_X global lock and will either wait for CurOp::completeAndLogOperation() to finish or stop storage engine being called and shut down instead. I have been facing issues in jstests with this approach. It appears like the last operation in jstests are not getting their result as expected, shutdown kills the process before the results are visible to the test. Still working on resolving this issue. |
| Comment by Vamsi Boyapati [ 03/Feb/19 ] |
|
Please discuss with milkie for changes needed to handle this design fix. |