[SERVER-68932] Update resharding critical section metrics on writes Created: 18/Aug/22  Updated: 29/Oct/23  Resolved: 16/Sep/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.1.1, 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Brett Nawrocki
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-69110 Consolidate stale collection/database... Open
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.1
Sprint: Sharding 2022-09-05, Sharding 2022-09-19
Participants:
Story Points: 3

 Description   

The handling of StaleConfigInfo is performed both on the "read/command" path and on the write path, but handleReshardingCriticalSectionMetrics is only called in the former.



 Comments   
Comment by Githook User [ 10/Oct/22 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-68932 Fix resharding critical section metrics

(cherry picked from commit 2514cb0721a0df59601f3ff264a9a03d5015db71)
Branch: v6.1
https://github.com/mongodb/mongo/commit/1027a62e3008b6432f381acdcac9f7b48fae43a5

Comment by Githook User [ 16/Sep/22 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-68932 Fix resharding critical section metrics
Branch: master
https://github.com/mongodb/mongo/commit/2514cb0721a0df59601f3ff264a9a03d5015db71

Comment by Max Hirschhorn [ 24/Aug/22 ]

The insert, update, and delete commands flow through write_ops_exec.cpp in mongod, and this code path is the only caller of OperationShardingState::setShardingOperationFailedStatus(). These commands behave differently because they report stale shard versions within their writeErrors array but still respond with ok:1. Other commands report stale shard versions as a command-level error with an ok:0 response.
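To make the two response shapes concrete, here is a minimal Python sketch that classifies a reply by where the stale shard version shows up. It is only an illustration: the helper name stale_config_location and the sample replies are hypothetical, and the numeric code 13388 for StaleConfig is an assumption that should be checked against the server's error code list.

```python
# Assumed numeric error code for StaleConfig; verify against the server source.
STALE_CONFIG_CODE = 13388


def stale_config_location(response):
    """Return where a stale shard version is reported in a command reply.

    'command'      -> command-level error (ok:0), as most commands report it
    'write_errors' -> per-statement error inside writeErrors with ok:1,
                      as insert/update/delete report it
    None           -> no stale shard version reported
    """
    if response.get("ok") != 1:
        return "command" if response.get("code") == STALE_CONFIG_CODE else None
    for err in response.get("writeErrors", []):
        if err.get("code") == STALE_CONFIG_CODE:
            return "write_errors"
    return None


# Hypothetical replies, shaped after the description above.
write_reply = {
    "ok": 1,
    "n": 0,
    "writeErrors": [{"index": 0, "code": STALE_CONFIG_CODE, "errmsg": "stale"}],
}
command_reply = {"ok": 0, "code": STALE_CONFIG_CODE, "errmsg": "stale"}
```

Code that only inspects the top-level ok field would miss the write case entirely, which is the same kind of gap this ticket describes for the metrics.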

Acceptance criteria:

  • Add a call to resharding_metrics::onCriticalSectionError() to ~ScopedOperationCompletionShardingActions().
  • Add a JavaScript test which runs an insert/update/delete command in a parallel shell while the critical section that blocks writes is active. Verify the counter for "countWritesDuringCriticalSection" is appropriately incremented.
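
The intent of the two criteria can be modelled in a few lines of Python. This is a toy sketch, not the server code: ScopedCompletion stands in for ScopedOperationCompletionShardingActions, its exit hook for the destructor, and Metrics, OperationState, and run_write are hypothetical names invented for the illustration. The real resharding_metrics::onCriticalSectionError() is assumed to do more checking than the single comparison shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    # Stand-in for the "countWritesDuringCriticalSection" counter.
    count_writes_during_critical_section: int = 0


@dataclass
class OperationState:
    # Models the status recorded via setShardingOperationFailedStatus().
    failed_status: Optional[str] = None


class ScopedCompletion:
    """Runs completion actions when the write operation's scope ends."""

    def __init__(self, op: OperationState, metrics: Metrics):
        self.op = op
        self.metrics = metrics

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # The fix: update the metric on the write path as well.
        if self.op.failed_status == "critical_section":
            self.metrics.count_writes_during_critical_section += 1
        return False  # never swallow exceptions


def run_write(op: OperationState, metrics: Metrics, critical_section_active: bool):
    """Simulate a write; it replies ok:1 even when the statement fails."""
    with ScopedCompletion(op, metrics):
        if critical_section_active:
            op.failed_status = "critical_section"
            return {"ok": 1, "n": 0, "writeErrors": [{"index": 0}]}
        return {"ok": 1, "n": 1}
```

The JavaScript test in the second criterion would assert the same thing against a real cluster: one blocked write, one increment of the counter.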

Tommaso also filed SERVER-69110 to consolidate the two separate handlings for StaleConfigInfo to avoid bugs like this in the future.
