[SERVER-35849] Remove dependency of the write commands on `sharding_runtime_d` Created: 27/Jun/18 Updated: 08/Jan/24 Resolved: 19/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.2, 4.1.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Cheahuychou Mao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Minor Change | ||||||||||||||||
| Backport Requested: | v4.0 |
||||||||||||||||
| Sprint: | Sharding 2018-07-16, Sharding 2018-07-30 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
The write commands' execution logic handles StaleConfig exceptions differently from regular commands, because the exception must be reported as part of the array of write results and not as part of the command result itself. Specifically, it does not let the exception bubble up to the endpoint code and instead invokes onShardVersionMismatch or onCannotImplicitlyCreateCollection directly. This not only causes a link dependency of the write commands library on sharding_runtime_d, but also introduces more places where onShardVersionMismatch can be called, which could be a source of bugs. This ticket is to figure out an implementation that consolidates the onShardVersionMismatch actions in one place (preferably the entry point) and removes the link dependency on sharding_runtime_d. |
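For illustration only, here is a minimal sketch of why write commands cannot simply let StaleConfig propagate: the failure has to land in the per-write results array. `StaleConfigError`, `WriteResult` and `performWritesSketch` are hypothetical stand-ins, not the server's actual types, and stopping the batch at the first stale-version failure is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the server's StaleConfig exception.
struct StaleConfigError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical per-write outcome, one entry per attempted write.
struct WriteResult {
    bool ok;
    std::string error;
};

// Runs each write in order. A StaleConfigError is recorded in the results
// array instead of propagating to the caller. Assumption: the batch stops
// at the first such failure so the router can retry the remainder.
std::vector<WriteResult> performWritesSketch(
    const std::vector<std::function<void()>>& writes) {
    std::vector<WriteResult> results;
    for (const auto& write : writes) {
        try {
            write();
            results.push_back({true, ""});
        } catch (const StaleConfigError& ex) {
            results.push_back({false, ex.what()});
            break;
        }
    }
    return results;
}
```

In this shape the catch block is where the current code would call onShardVersionMismatch directly, which is the call site the ticket wants to eliminate.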
| Comments |
| Comment by Andrew Morrow (Inactive) [ 09/Aug/18 ] |
|
No, apparently I just can't read. I saw the declines but not the completed. |
| Comment by Githook User [ 07/Aug/18 ] |
|
Author: {'username': 'cheahuychou', 'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com'}Message: (cherry picked from commit a8e4cedfc7d7f48ac59fc4860ca6d8519421fdf5) |
| Comment by Githook User [ 19/Jul/18 ] |
|
Author: {'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}Message: |
| Comment by Andy Schwerin [ 29/Jun/18 ] |
|
OK. Please CC me on the code review. |
| Comment by Kaloian Manassiev [ 28/Jun/18 ] |
|
It will break it because write commands will no longer be calling into onShardVersionMismatch directly. That way we can throw out the dependency on sharding_runtime_d. |
| Comment by Randolph Tan [ 28/Jun/18 ] |
|
Sounds good to me, but how would this break the dependency? |
| Comment by Kaloian Manassiev [ 28/Jun/18 ] |
|
The refresh from the config server itself happens on a separate thread. The work done in the destructor will be no different from what happens today in the exception handler: 1) wait for the critical section to complete and 2) when the CS completes, wait for the refresh thread to complete and install the refreshed metadata on the CSS (under collection X lock). If a timeout or interruption occurs during either 1 or 2 today, the thread just ignores it and returns to mongos, which will retry. In other words, exceptions will never propagate out of the destructor, just like they never propagate out of the exception handler today. |
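The two-step wait described above could be sketched roughly as follows. This is not the server's code: `waitAndInstallNoThrow`, `RefreshOutcome`, the use of `std::shared_future` to model the critical section and the refresh thread, and the `install` callback are all assumptions made for the example. The point is only that a timeout or interruption in either step is swallowed and the caller returns normally.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <stdexcept>

enum class RefreshOutcome { Installed, GaveUp };

// Hypothetical helper: step 1 waits for the critical section, step 2 waits
// for the refresh and installs the metadata. Any timeout or stored exception
// (modelling an interruption) results in GaveUp; nothing is ever thrown.
RefreshOutcome waitAndInstallNoThrow(std::shared_future<void> criticalSectionDone,
                                     std::shared_future<void> refreshDone,
                                     std::chrono::milliseconds deadline,
                                     const std::function<void()>& install) noexcept {
    try {
        if (criticalSectionDone.wait_for(deadline) != std::future_status::ready)
            return RefreshOutcome::GaveUp;
        criticalSectionDone.get();  // rethrows a stored interruption, if any

        if (refreshDone.wait_for(deadline) != std::future_status::ready)
            return RefreshOutcome::GaveUp;
        refreshDone.get();

        install();  // in the server this would happen under the collection X lock
        return RefreshOutcome::Installed;
    } catch (...) {
        // Ignore the failure; the router is expected to retry.
        return RefreshOutcome::GaveUp;
    }
}
```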
| Comment by Andy Schwerin [ 28/Jun/18 ] |
|
I'm nervous about the refresh in the destructor. Why can't it time out or get interrupted? |
| Comment by Kaloian Manassiev [ 27/Jun/18 ] |
|
Since all StaleConfig exceptions originate in either CollectionShardingState's or DatabaseShardingState's version checking logic, I propose that we use the OperationShardingState as a way to convey that a certain operation needs to perform a shard version refresh and block on its completion. The OperationShardingState will contain a boost::optional<Status> _shardingOperationFailedStatus, which will be set to one of the exceptions which require post-command execution actions (for now these are StaleConfig, StaleDbVersion, CannotImplicitlyCreateCollection). There will be a RAII class ScopedOperationCompletionShardingActions, which in its destructor will inspect the exception (if any) stored in the OperationShardingState and will perform the necessary refresh. These operations currently never need to throw, so this destructor will also never throw. The ScopedOperationCompletionShardingActions will be instantiated at the beginning of ServiceEntryPointCommon::handleRequest in order to ensure that it covers all requests. The destructor ~ScopedOperationCompletionShardingActions will serve as the only place where "sharding refresh" actions will be executed. schwerin, mira.carey@mongodb.com, renctan - how does this sound? |
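A rough, self-contained sketch of the proposed shape follows. It is an illustration under stated assumptions rather than the eventual implementation: `Status` and `ErrorCode` are simplified stand-ins, `std::optional` replaces `boost::optional`, the refresh is injected as a callback (`RefreshFn`), the setter keeps only the first failure, and `handleRequestSketch` is a hypothetical substitute for ServiceEntryPointCommon::handleRequest.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the server's error codes and Status type.
enum class ErrorCode { StaleConfig, StaleDbVersion, CannotImplicitlyCreateCollection };

struct Status {
    ErrorCode code;
    std::string reason;
};

// Per-operation state in which the version checks record a failure.
class OperationShardingState {
public:
    // Assumption of this sketch: only the first failure is kept.
    void setShardingOperationFailedStatus(Status status) {
        if (!_shardingOperationFailedStatus)
            _shardingOperationFailedStatus = std::move(status);
    }

    // Hands the stored failure (if any) to the caller and clears it.
    std::optional<Status> resetShardingOperationFailedStatus() {
        std::optional<Status> status = std::move(_shardingOperationFailedStatus);
        _shardingOperationFailedStatus.reset();
        return status;
    }

private:
    std::optional<Status> _shardingOperationFailedStatus;
};

// RAII guard: the destructor is the single place where refresh actions run.
class ScopedOperationCompletionShardingActions {
public:
    using RefreshFn = std::function<void(const Status&)>;  // hypothetical hook

    ScopedOperationCompletionShardingActions(OperationShardingState& oss, RefreshFn refresh)
        : _oss(oss), _refresh(std::move(refresh)) {}

    ScopedOperationCompletionShardingActions(const ScopedOperationCompletionShardingActions&) = delete;
    ScopedOperationCompletionShardingActions& operator=(const ScopedOperationCompletionShardingActions&) = delete;

    ~ScopedOperationCompletionShardingActions() noexcept {
        auto status = _oss.resetShardingOperationFailedStatus();
        if (!status)
            return;
        try {
            _refresh(*status);
        } catch (...) {
            // Never let a refresh failure escape the destructor.
        }
    }

private:
    OperationShardingState& _oss;
    RefreshFn _refresh;
};

// Hypothetical entry point: the guard is created first so it covers the
// whole request; the command body only records the failure.
std::vector<ErrorCode> handleRequestSketch(bool versionCheckFails) {
    std::vector<ErrorCode> refreshed;
    OperationShardingState oss;
    {
        ScopedOperationCompletionShardingActions actions(
            oss, [&](const Status& s) { refreshed.push_back(s.code); });
        if (versionCheckFails)
            oss.setShardingOperationFailedStatus({ErrorCode::StaleConfig, "shard version mismatch"});
    }
    return refreshed;
}
```

With this arrangement the command code only records the failure and never calls the refresh routine itself, so in principle it would no longer need to link against the library that implements the refresh.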