[SERVER-68495] Resharding a collection with a very large number of zones configured may stall on config server primary indefinitely
Created: 02/Aug/22  Updated: 29/Oct/23  Resolved: 08/Aug/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.4, 6.0.0
Fix Version/s: 5.0.11, 6.0.2, 6.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-61035 Increase zones in 'resharding_large_n... Closed
Related
is related to SERVER-58433 ReshardingCoordinatorService Transact... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.0, v5.0
Sprint: Sharding 2022-08-08
Participants:
Story Points: 2

 Description   

The changes from b15ce76 as part of SERVER-58433 didn't remove the multi-update to the config.tags collection from the replica set transaction that the config server primary runs when transitioning to CoordinatorStateEnum::kCommitting. When the multi-update affects a very large number of documents, WiredTiger cache eviction can abort the storage transaction, so the replica set transaction may repeatedly fail with a WriteConflict.
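
The failure mode and the direction of the fix ("Update resharded zones after kCommitting transition") can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: CACHE_BUDGET, NUM_ZONES, run_transaction and retry are hypothetical names, the abort is modeled as a fixed threshold rather than real cache pressure, the retry loop is bounded so the example terminates, and each config.tags write outside the transaction is assumed to commit as its own small unit.

```python
class WriteConflict(Exception):
    """Toy stand-in for the WriteConflict error seen in this ticket's logs."""


CACHE_BUDGET = 100_000  # hypothetical: writes a single transaction can hold
NUM_ZONES = 200_000     # hypothetical: number of config.tags documents


def run_transaction(ops, budget=CACHE_BUDGET):
    """Apply (label, doc_count) ops as one unit; abort if they exceed the budget."""
    staged = 0
    for _label, doc_count in ops:
        staged += doc_count
        if staged > budget:
            raise WriteConflict("rolled back for eviction")
    return staged


def retry(ops, max_attempts=5):
    """Return the attempt number that committed, or None if every attempt aborted."""
    for attempt in range(1, max_attempts + 1):
        try:
            run_transaction(ops)
            return attempt
        except WriteConflict:
            continue
    return None


# Before: the state transition and the multi-update share one transaction,
# so every retry hits the same oversized write set.
before = retry([("transition to kCommitting", 1),
                ("multi-update config.tags", NUM_ZONES)])

# After: the transaction carries only the state transition; the config.tags
# documents are updated afterwards, one small unit per document.
after = retry([("transition to kCommitting", 1)])
zones_updated = sum(
    run_transaction([("update one config.tags document", 1)])
    for _ in range(NUM_ZONES)
)

print(before, after, zones_updated)
```

In this model the combined transaction never commits no matter how often it is retried, while the split version commits the state transition on the first attempt and then finishes the zone updates separately.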

[js_test:resharding_large_number_of_initial_chunks] c20042| 2022-08-01T16:55:28.987+00:00 I  WRITE    51803   [ReshardingCoordinatorService-2] "Slow query","attr":{"type":"update","ns":"config.tags","command":{"q":{"ns":"db.system.resharding.efb50f47-9623-4443-9f82-6ef1d7fe4c58"},"u":{"$set":{"ns":"db.foo"}},"hint":{"ns":1,"min":1},"multi":true,"upsert":false},"planSummary":"IXSCAN { ns: 1, min: 1 }","keysInserted":268146,"keysDeleted":268146,"numYields":0,"queryHash":"8B227D4B","ok":0,"errMsg":"-31800: oldest pinned transaction ID rolled back for eviction :: caused by :: WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.","errName":"WriteConflict","errCode":112,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"w":4}},"ReplicationStateTransition":{"acquireCount":{"w":6}},"Global":{"acquireCount":{"w":4}},"Database":{"acquireCount":{"w":3}},"Collection":{"acquireCount":{"w":5}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":3},"storage":{"data":{"bytesRead":15401422,"timeReadingMicros":7467},"timeWaitingMicros":{"cache":846972}},"durationMillis":17347}
...
[js_test:resharding_large_number_of_initial_chunks] c20042| 2022-08-01T17:04:57.892+00:00 I  WRITE    51803   [ReshardingCoordinatorService-2] "Slow query","attr":{"type":"update","ns":"config.tags","command":{"q":{"ns":"db.system.resharding.efb50f47-9623-4443-9f82-6ef1d7fe4c58"},"u":{"$set":{"ns":"db.foo"}},"hint":{"ns":1,"min":1},"multi":true,"upsert":false},"planSummary":"IXSCAN { ns: 1, min: 1 }","keysInserted":262668,"keysDeleted":262668,"numYields":0,"queryHash":"8B227D4B","ok":0,"errMsg":"-31800: oldest pinned transaction ID rolled back for eviction :: caused by :: WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.","errName":"WriteConflict","errCode":112,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"w":9}},"ReplicationStateTransition":{"acquireCount":{"w":21}},"Global":{"acquireCount":{"w":9}},"Database":{"acquireCount":{"w":8}},"Collection":{"acquireCount":{"w":20}}},"flowControl":{"acquireCount":6,"timeAcquiringMicros":22},"storage":{"data":{"bytesRead":15692190,"timeReadingMicros":14707},"timeWaitingMicros":{"cache":514930}},"durationMillis":114614}

https://evergreen.mongodb.com/lobster/build/4053c7c842f00af9f14444235c8bf1ab/test/62e80502c2ab681a664a7e36#bookmarks=0%2C4928&f~=100~oldest%20pinned%20transaction%20ID%20rolled%20back%20for%20eviction&l=1
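
To compare repeated failures like the two above, the JSON object that follows "attr": in each "Slow query" line can be pulled out and summarized. The following is a minimal sketch using only the Python standard library; parse_slow_query_attr and summarize are hypothetical helper names, and the sample line is an abbreviated copy of the first log line above, not a complete log record.

```python
import json


def parse_slow_query_attr(line: str) -> dict:
    """Return the JSON object that follows '"attr":' in a log line."""
    marker = '"attr":'
    start = line.index(marker) + len(marker)
    # raw_decode parses one JSON value and tolerates trailing text after it.
    attr, _end = json.JSONDecoder().raw_decode(line[start:])
    return attr


def summarize(attr: dict) -> dict:
    """Pick out the fields that show how much work each failed attempt did."""
    return {
        "ns": attr.get("ns"),
        "errName": attr.get("errName"),
        "keysInserted": attr.get("keysInserted"),
        "durationMillis": attr.get("durationMillis"),
        "cacheWaitMicros": attr.get("storage", {})
                               .get("timeWaitingMicros", {})
                               .get("cache"),
    }


# Abbreviated version of the first log line in this ticket.
sample = (
    '[js_test:resharding_large_number_of_initial_chunks] c20042| '
    '2022-08-01T16:55:28.987+00:00 I  WRITE    51803   '
    '[ReshardingCoordinatorService-2] "Slow query","attr":'
    '{"type":"update","ns":"config.tags","keysInserted":268146,'
    '"errName":"WriteConflict","errCode":112,'
    '"storage":{"timeWaitingMicros":{"cache":846972}},'
    '"durationMillis":17347}'
)

summary = summarize(parse_slow_query_attr(sample))
print(summary)
```

Running the same helpers over every matching line in a test log would show whether each retry fails with the same error after a comparable amount of work.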



 Comments   
Comment by Max Hirschhorn [ 10/Aug/22 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-68495 Update resharded zones after kCommitting transition.

Attempting to update the namespace of the config.tags documents may
otherwise repeatedly fail with a WriteConflict due to WT cache eviction
aborting the transaction.

(cherry picked from commit 41c45d0d0725f3e9745469755fe8beacb24b9c5e)
Branch: v6.0
https://github.com/mongodb/mongo/commit/71c2798d8749dca09a6d4e6d7838b6cdbe218f2d

Comment by Githook User [ 09/Aug/22 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-68495 Update resharded zones after kCommitting transition.

Attempting to update the namespace of the config.tags documents may
otherwise repeatedly fail with a WriteConflict due to WT cache eviction
aborting the transaction.

(cherry picked from commit 41c45d0d0725f3e9745469755fe8beacb24b9c5e)
Branch: v5.0
https://github.com/mongodb/mongo/commit/183c7ba5e53452fb18408f120b1f00ae15d94630

Comment by Githook User [ 04/Aug/22 ]

Author: Max Hirschhorn <max.hirschhorn@mongodb.com> (visemet)

Message: SERVER-68495 Update resharded zones after kCommitting transition.

Attempting to update the namespace of the config.tags documents may
otherwise repeatedly fail with a WriteConflict due to WT cache eviction
aborting the transaction.
Branch: master
https://github.com/mongodb/mongo/commit/41c45d0d0725f3e9745469755fe8beacb24b9c5e
