[SERVER-35849] Remove dependency of the write commands on `sharding_runtime_d` Created: 27/Jun/18  Updated: 08/Jan/24  Resolved: 19/Jul/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.0.2, 4.1.1

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Cheahuychou Mao
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-29908 Libraries db/s/sharding and db/query/... Closed
is depended on by SERVER-27725 Use batch insert when migrating chunks Closed
Backwards Compatibility: Minor Change
Backport Requested:
v4.0
Sprint: Sharding 2018-07-16, Sharding 2018-07-30
Participants:

 Description   

The write command execution logic handles StaleConfig exceptions differently from regular commands, because the exception must be reported as part of the array of write results rather than as part of the command result itself. Specifically, it does not let the exception bubble up to the endpoint code and instead invokes onShardVersionMismatch or onCannotImplicitlyCreateCollection directly.
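
To make the behavior described above concrete, here is a minimal sketch of per-write error containment. It is not the server's code: `StaleConfigError`, `WriteResult`, `performWrites`, and `onShardVersionMismatchStub` are hypothetical stand-ins, and stopping the batch at the first stale error is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a StaleConfig exception.
struct StaleConfigError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// One entry in the array of write results.
struct WriteResult {
    bool ok;
    std::string error;  // empty on success
};

// Counts calls to the stand-in for onShardVersionMismatch.
int& mismatchHandlerCalls() {
    static int calls = 0;
    return calls;
}

// Hypothetical stub; the real function lives in the sharding runtime.
void onShardVersionMismatchStub() {
    ++mismatchHandlerCalls();
}

// Runs each write. A stale-version error is recorded in the results array
// instead of propagating to the caller, and the refresh handler is invoked
// directly from here. That direct call is the dependency the ticket removes.
std::vector<WriteResult> performWrites(const std::vector<std::function<void()>>& writes) {
    std::vector<WriteResult> results;
    for (const auto& write : writes) {
        try {
            write();
            results.push_back({true, ""});
        } catch (const StaleConfigError& ex) {
            results.push_back({false, ex.what()});
            onShardVersionMismatchStub();
            break;  // assumption: the batch stops at the first stale error
        }
    }
    return results;
}
```

Because the handler is called inside the write path, the write commands library in this sketch has to link against whatever provides the handler.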

This creates a link dependency of the write commands library on sharding_runtime_d. It also adds more places where onShardVersionMismatch can be called, which could be a source of bugs.

This ticket is to figure out an implementation that consolidates the onShardVersionMismatch actions in one place (preferably the entry point) and removes the link dependency on sharding_runtime_d.



 Comments   
Comment by Andrew Morrow (Inactive) [ 09/Aug/18 ]

No, apparently I just can't read. I saw the declines but not the completed.

Comment by Githook User [ 07/Aug/18 ]

Author: Cheahuychou Mao (cheahuychou) <cheahuychou.mao@mongodb.com>

Message: SERVER-35849 Remove dependency of the write commands on sharding_runtime_d

(cherry picked from commit a8e4cedfc7d7f48ac59fc4860ca6d8519421fdf5)
Branch: v4.0
https://github.com/mongodb/mongo/commit/8531b53c6a61bbaecb0ec1440a4103a020e645d8

Comment by Githook User [ 19/Jul/18 ]

Author: Cheahuychou Mao (cheahuychou) <cheahuychou.mao@mongodb.com>

Message: SERVER-35849 Remove dependency of the write commands on sharding_runtime_d
Branch: master
https://github.com/mongodb/mongo/commit/a8e4cedfc7d7f48ac59fc4860ca6d8519421fdf5

Comment by Andy Schwerin [ 29/Jun/18 ]

OK. Please CC me on the code review.

Comment by Kaloian Manassiev [ 28/Jun/18 ]

It will break it because write commands will no longer be calling into onShardVersionMismatch directly. That way we can throw out the dependency on sharding_runtime_d.

Comment by Randolph Tan [ 28/Jun/18 ]

Sounds good to me, but how would this break the dependency?

Comment by Kaloian Manassiev [ 28/Jun/18 ]

The refresh from the config server itself is happening on a separate thread. The work which will be done in the destructor is no different than what is happening today in the exception handler - 1) wait for the critical section to complete and 2) when the CS completes, wait for the refresh thread to complete and install the refreshed metadata on the CSS (under collection X lock).

If a timeout or interruption occurs during either 1 or 2 today, the thread just ignores it and returns to mongos, which will retry. In other words, exceptions will never propagate out of the destructor, just like they never propagate out of the exception handler today.

Comment by Andy Schwerin [ 28/Jun/18 ]

I'm nervous about the refresh in the destructor. Why can't it time out or get interrupted?

Comment by Kaloian Manassiev [ 27/Jun/18 ]

Since all StaleConfig exceptions originate in either CollectionShardingState's or DatabaseShardingState's version checking logic, I propose that we use the OperationShardingState as a way to convey that a certain operation needs to perform shard version refresh and blockage on completion.

The OperationShardingState will contain a boost::optional<Status> _shardingOperationFailedStatus, which will be set to one of the exceptions which require post-command execution actions (for now these are StaleConfig, StaleDbVersion, CannotImplicitlyCreateCollection).

There will be a RAII class ScopedOperationCompletionShardingActions, which in its destructor will inspect the exception (if any) stored in the OperationShardingState and will perform the necessary refresh. These operations currently never need to throw, so this destructor will also never throw.

The ScopedOperationCompletionShardingActions will be instantiated at the beginning of ServiceEntryPointCommon::handleRequest in order to ensure that it covers all requests.

The destructor ~ScopedOperationCompletionShardingActions will serve as the only place where "sharding refresh" actions will be executed.
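
A minimal sketch of the proposed shape, under stated assumptions: the `Status` and `ErrorCode` types, the free function `handleRequest`, and `refreshLog` are hypothetical stand-ins, `std::optional` replaces `boost::optional` to keep the example dependency-free, and the refresh actions are reduced to log entries. The committed implementation may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the real server types.
enum class ErrorCode { StaleConfig, StaleDbVersion, CannotImplicitlyCreateCollection };

struct Status {
    ErrorCode code;
    std::string reason;
};

// Per-operation state that carries the failure needing post-command actions.
class OperationShardingState {
public:
    void setShardingOperationFailedStatus(Status status) {
        _shardingOperationFailedStatus = std::move(status);
    }
    const std::optional<Status>& getShardingOperationFailedStatus() const {
        return _shardingOperationFailedStatus;
    }

private:
    std::optional<Status> _shardingOperationFailedStatus;
};

// Records which action ran, so the sketch has an observable effect.
std::vector<std::string>& refreshLog() {
    static std::vector<std::string> log;
    return log;
}

// RAII guard: the destructor is the single place where refresh actions run.
class ScopedOperationCompletionShardingActions {
public:
    explicit ScopedOperationCompletionShardingActions(OperationShardingState& oss) : _oss(oss) {}
    ScopedOperationCompletionShardingActions(const ScopedOperationCompletionShardingActions&) = delete;
    ScopedOperationCompletionShardingActions& operator=(const ScopedOperationCompletionShardingActions&) = delete;

    ~ScopedOperationCompletionShardingActions() noexcept {
        const auto& status = _oss.getShardingOperationFailedStatus();
        if (!status) {
            return;  // nothing failed, nothing to refresh
        }
        try {
            switch (status->code) {
                case ErrorCode::StaleConfig:
                    refreshLog().push_back("onShardVersionMismatch");
                    break;
                case ErrorCode::StaleDbVersion:
                    refreshLog().push_back("refresh database version (hypothetical)");
                    break;
                case ErrorCode::CannotImplicitlyCreateCollection:
                    refreshLog().push_back("onCannotImplicitlyCreateCollection");
                    break;
            }
        } catch (...) {
            // Swallow everything: the caller retries, so nothing may escape.
        }
    }

private:
    OperationShardingState& _oss;
};

// Hypothetical entry point: the guard is created first so it covers the whole request.
void handleRequest(OperationShardingState& oss, bool hitStaleConfig) {
    ScopedOperationCompletionShardingActions completionActions(oss);
    if (hitStaleConfig) {
        // Version-checking code records the failure instead of refreshing inline.
        oss.setShardingOperationFailedStatus({ErrorCode::StaleConfig, "shard version mismatch"});
    }
}
```

In this arrangement the code that detects the stale version only writes a status into the per-operation state, so it does not need to link against the code that performs the refresh.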

schwerin, mira.carey@mongodb.com, renctan - how does this sound?

Generated at Thu Feb 08 04:41:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.