[SERVER-61461] update_shard_key_doc_moves_shards.js fails due to spurious refreshes from secondaries Created: 12/Nov/21  Updated: 29/Oct/23  Resolved: 02/Dec/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.6, 4.4.11

Type: Bug Priority: Major - P3
Reporter: Luis Osta (Inactive) Assignee: Luis Osta (Inactive)
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-63493 update-shard-key tests failing due to... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0, v4.4
Sprint: Sharding 2021-11-29, Sharding 2021-12-13
Participants:
Linked BF Score: 56
Story Points: 1

 Description   

Background
Mirrored reads were introduced in 4.4. The primary "mirrors" reads it receives to a secondary as well. If the shard version attached to a mirrored read does not match what the secondary expects, the secondary sends a command to the primary asking it to refresh.

Problem

The test calls refreshCatalogCacheForNs to make sure that the nodes are consistently refreshed. This refreshes the primary but not the secondary. A mirrored read that reaches the stale secondary can therefore trigger spurious and unexpected refreshes on the primary while the test is running.

There are a few possible solutions:

  • Make the test not run with mirrored reads (not a good idea IMO)
  • Make the test more resilient to background refreshes (even something as basic as retrying)
  • Make the test refresh the secondaries alongside the primaries
  • Update the refresh logic so that the secondary sends the shardVersion which made the secondary trigger a refresh. This way the primary can check whether or not it needs to refresh
  • Automatically retry the operation using the existing auto-retry logic, specifically withTxnAndAutoRetryOnMongos
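
As a rough illustration of the retry-based options in the list above, a minimal sketch of retrying on transient transaction errors follows. It is not the implementation of withTxnAndAutoRetryOnMongos. The names retryOnTransientTxnError and makeFlakyTxn are hypothetical and exist only for this example; the only MongoDB detail relied on is that transient transaction failures carry the "TransientTransactionError" error label.

```javascript
// Hypothetical helper (not part of the jstests libraries): re-runs `fn` while
// the thrown error carries the TransientTransactionError label.
function retryOnTransientTxnError(fn, maxAttempts = 3) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return fn(attempt);
        } catch (err) {
            const labels = (err && err.errorLabels) || [];
            if (!labels.includes("TransientTransactionError")) {
                throw err;  // Not transient: surface the failure immediately.
            }
            lastError = err;  // Transient: remember it and try again.
        }
    }
    throw lastError;  // Every attempt failed with a transient error.
}

// Hypothetical stand-in for a transaction body that fails `failures` times
// with a transient error (for example a LockTimeout) before succeeding.
function makeFlakyTxn(failures) {
    let remaining = failures;
    return function() {
        if (remaining-- > 0) {
            const err = new Error("LockTimeout");
            err.errorLabels = ["TransientTransactionError"];
            throw err;
        }
        return "committed";
    };
}
```

In a real jstest the function passed in would start the transaction, run the shard key update, and commit; the stand-in above only imitates the failure pattern.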


 Comments   
Comment by Githook User [ 02/Dec/21 ]

Author: Luis Osta <luis.osta@mongodb.com> (LuisOsta)

Message: SERVER-61461 Increase 'maxTransactionLockRequestTimeoutMillis' for tests

(cherry picked from commit c3402c98def4ce8b25609429ccb9e24fb4fe7cd0)
Branch: v5.0
https://github.com/mongodb/mongo/commit/b93b508db2e180fe8e2ae6284ce9e8ef5b091cec

Comment by Githook User [ 02/Dec/21 ]

Author: Luis Osta <luis.osta@mongodb.com> (LuisOsta)

Message: SERVER-61461 Increase 'maxTransactionLockRequestTimeoutMillis' for tests

(cherry picked from commit c3402c98def4ce8b25609429ccb9e24fb4fe7cd0)
Branch: v4.4
https://github.com/mongodb/mongo/commit/d4b800c330d8c713d26b0828fe39e046bac5ba03

Comment by Githook User [ 02/Dec/21 ]

Author: Luis Osta <luis.osta@mongodb.com> (LuisOsta)

Message: SERVER-61461 Increase 'maxTransactionLockRequestTimeoutMillis' for tests

(cherry picked from commit c3402c98def4ce8b25609429ccb9e24fb4fe7cd0)
Branch: v5.1
https://github.com/mongodb/mongo/commit/cf88e3f42ac9fd463fc8332961dca61ab9b0c102

Comment by Githook User [ 02/Dec/21 ]

Author: Luis Osta <luis.osta@mongodb.com> (LuisOsta)

Message: SERVER-61461 Increase 'maxTransactionLockRequestTimeoutMillis' for tests
Branch: master
https://github.com/mongodb/mongo/commit/c3402c98def4ce8b25609429ccb9e24fb4fe7cd0

Comment by Max Hirschhorn [ 29/Nov/21 ]

After discussing this ticket in our story-pointing meeting, it seems like we could instead raise the maxTransactionLockRequestTimeoutMillis server parameter on the shards (analogous to what SERVER-48651 did on the config server). This would prevent the spurious shard version refresh triggered by the secondaries from causing the transaction to fail with a LockTimeout error, so the test client can continue to not retry on transient transaction errors. We would need to make such a change to all of

  • jstests/sharding/update_shard_key_doc_moves_shards.js
  • jstests/sharding/update_shard_key_doc_on_same_shard.js
  • jstests/sharding/update_shard_key_pipeline_update.js

which use the jstests/sharding/libs/update_shard_key_helpers.js library and have been observed to fail. (The jstests/sharding/update_compound_shard_key.js test hasn't been observed to fail.)
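
For reference, a sketch of what raising the parameter on the shards might look like in one of these tests. This is an assumption about the shape of the change, not a copy of the committed fix: the 10-minute value is illustrative, and passing setParameter through other.rsOptions is assumed to apply it to every node of every shard replica set (worth confirming against shardingtest.js).

```javascript
// Assumption: illustrative value only, not the one used in the actual fix.
const kLockRequestTimeoutMillis = 10 * 60 * 1000;

// Options in the shape ShardingTest accepts. `other.rsOptions` is assumed to
// be forwarded to each shard's replica set nodes.
const shardingTestOptions = {
    mongos: 1,
    shards: 2,
    other: {
        rsOptions: {
            setParameter: {
                // Let transaction lock requests wait out a spurious refresh
                // instead of failing quickly with LockTimeout.
                maxTransactionLockRequestTimeoutMillis: kLockRequestTimeoutMillis,
            },
        },
    },
};

// In a jstest this object would be passed as:
//   const st = new ShardingTest(shardingTestOptions);
```

With a longer lock request timeout, a transaction that collides with a background refresh waits for the lock rather than aborting, which keeps the tests' no-retry behavior intact.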
