[SERVER-48096] PeriodicShardedIndexConsistencyChecker thread on jstests can cause unintended shard refreshes Created: 11/May/20  Updated: 29/Oct/23  Resolved: 29/Jun/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.2.9, 4.4.1, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Blake Oler Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
  is duplicated by SERVER-48055 Disable periodic index consistency ch... Closed
Related
  is related to SERVER-50914 cursor_valid_after_shard_stepdown.js ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4, v4.2
Sprint: Sharding 2020-06-29
Linked BF Score: 44

 Description   

Problem

The PeriodicShardedIndexConsistencyChecker thread causes unintended refreshes on shards in all config server stepdown suites. JS tests that rely on a shard's metadata being stale can fail sporadically because this thread runs on step-up.

Possible Solutions

  • Auditing all jstests that rely on stale shard metadata and disabling the periodic thread in each of them (this has been done before).
  • Disabling the thread altogether on config server stepdown suites, and enabling it on targeted tests that test the behavior of the periodic thread.
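
As a rough illustration of the first option, a test that depends on stale shard metadata could switch the checker off on its config servers when it builds its cluster. The sketch below assumes the checker is controlled by the config server parameter `enableShardedIndexConsistencyCheck` and that `ShardingTest` forwards `other.configOptions.setParameter` to the config servers; neither detail is stated in this ticket. The helper name `withIndexConsistencyCheckerDisabled` is hypothetical.

```javascript
// Hypothetical helper: returns a copy of the given ShardingTest options with the
// periodic index consistency checker disabled on the config servers.
// Assumption: the checker is gated by `enableShardedIndexConsistencyCheck`.
function withIndexConsistencyCheckerDisabled(options) {
    // Copy each nested level so the caller's options object is not mutated.
    const other = Object.assign({}, options.other);
    const configOptions = Object.assign({}, other.configOptions);
    configOptions.setParameter = Object.assign(
        {}, configOptions.setParameter, {enableShardedIndexConsistencyCheck: false});
    other.configOptions = configOptions;
    return Object.assign({}, options, {other: other});
}

const stOptions = withIndexConsistencyCheckerDisabled({shards: 2});

// ShardingTest only exists in the mongo shell test environment, so guard its use.
if (typeof ShardingTest !== "undefined") {
    const st = new ShardingTest(stOptions);
    // ... test body that relies on a shard's metadata being stale ...
    st.stop();
}
```

Keeping the override in one helper would make the audited tests easy to grep for, and a test that exercises the checker itself could simply skip the helper.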

Proven Affected Tests



 Comments   
Comment by Githook User [ 18/Aug/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (toto-dev)

Message: SERVER-48096 PeriodicShardedIndexConsistencyChecker thread on jstests can cause unintended shard refreshes

(cherry picked from commit e755577b7d01a1442f14a26c995632c3cf6f6b14)
Branch: v4.4
https://github.com/mongodb/mongo/commit/5b36d69aa602d9c45a67c7e0c766823ea66c9711

Comment by Githook User [ 21/Jul/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (toto-dev)

Message: SERVER-48096 PeriodicShardedIndexConsistencyChecker thread on jstests can cause unintended shard refreshes

(cherry picked from commit e755577b7d01a1442f14a26c995632c3cf6f6b14)
Branch: v4.2
https://github.com/mongodb/mongo/commit/e8b1b9719d675882758105f116dda2e51c9c7d77

Comment by Githook User [ 29/Jun/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (toto-dev)

Message: SERVER-48096 PeriodicShardedIndexConsistencyChecker thread on jstests can cause unintended shard refreshes
Branch: master
https://github.com/mongodb/mongo/commit/e755577b7d01a1442f14a26c995632c3cf6f6b14

Comment by Tommaso Tocci [ 29/Jun/20 ]

The cleanup_orphaned_cmd_prereload.js test was removed in SERVER-47992.

Comment by Jack Mulrow [ 28/May/20 ]

max.hirschhorn, I'd slightly prefer not to disable the checker in the config stepdown suites since it shouldn't affect the correctness of tests other than those that assert on the staleness of shards, which is really an implementation detail, and it'd be nice to keep as much coverage as possible for that assumption. I don't expect many tests would need to disable the thread and the ones that do probably use fail points we can grep for, so auditing shouldn't be that bad.

That said, we do have coverage for the checker with config server stepdowns from the concurrency stepdown suites, and I'd be surprised if any concurrency workload relies on shard staleness, so I'd also be fine disabling the checker in just the sharding_csrs_continuous_config_stepdown suite.

Comment by Max Hirschhorn [ 28/May/20 ]

jack.mulrow, do you have thoughts for how we should handle the PeriodicShardedIndexConsistencyChecker? I saw you recently filed SERVER-48055 which looks to be another test that's impacted.

Generated at Thu Feb 08 05:16:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.