[SERVER-73848] Hashed shard keys with zones can cause issues with resharding Created: 09/Feb/23  Updated: 29/Oct/23  Resolved: 08/May/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0, 6.0.0, 6.3.0-rc0
Fix Version/s: 7.1.0-rc0, 6.0.7, 5.0.19, 7.0.0-rc3

Type: Bug Priority: Major - P3
Reporter: Kshitij Gupta Assignee: Kruti Shah
Resolution: Fixed Votes: 0
Labels: neweng, sharding-nyc-subteam3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-53432 Ensure that comparing resharding requ... Closed
is related to SERVER-73763 Resharding does not extend zone range... Closed
is related to SERVER-76988 Abort the reshardCollection operation... Closed
is related to SERVER-58433 ReshardingCoordinatorService Transact... Closed
Assigned Teams:
Sharding NYC
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v7.0, v6.0, v5.0
Sprint: Sharding NYC 2023-05-15
Participants:
Story Points: 3

 Description   

1. The new zone range can conflict with the old zone range. If a user reshards from a hashed shard key to the same field as an unhashed (range-based) shard key, extraneous, conflicting documents can be left behind in the config.tags collection, which can cause issues with the balancer.

To reproduce:

// Assumes a standard jstest setup, e.g.:
//   const st = new ShardingTest({shards: 2});
//   const ns = 'test.foo';
//   assert.commandWorked(st.s.adminCommand({enableSharding: 'test'}));
//   assert.commandWorked(st.s.adminCommand({shardCollection: ns, key: {oldKey: 'hashed'}}));

const existingZoneName = 'x1';
assert.commandWorked(
    st.s.adminCommand({addShardToZone: st.shard1.shardName, zone: existingZoneName}));

// Zone range defined on the hashed shard key values.
assert.commandWorked(st.s.adminCommand({
    updateZoneKeyRange: ns,
    min: {oldKey: NumberLong("4470791281878691347")},
    max: {oldKey: NumberLong("7766103514953448109")},
    zone: existingZoneName
}));

// Reshard to the unhashed key, passing a slightly different zone range for the same zone.
assert.commandWorked(st.s.adminCommand({
    reshardCollection: ns,
    key: {oldKey: 1},
    unique: false,
    collation: {locale: 'simple'},
    zones: [{
        zone: existingZoneName,
        min: {oldKey: NumberLong("4470791281878691346")},
        max: {oldKey: NumberLong("7766103514953448108")}
    }],
    numInitialChunks: 2,
}));

This leaves both the old zone range and the new zone range in config.tags after resharding. The previous zone entries are not deleted because the hashed and unhashed shard keys have the same shard key shape (the same field name).
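For illustration, querying config.tags for the namespace after the resharding above shows both ranges still registered for the same zone. This is a hedged sketch of what the leftover documents look like; the exact document shape (for example the _id) can vary by version:

st.s.getDB('config').tags.find({ns: ns, tag: existingZoneName}).forEach(printjson);
// Roughly (other fields omitted):
//   {ns: <ns>, tag: "x1",
//    min: {oldKey: NumberLong("4470791281878691347")},
//    max: {oldKey: NumberLong("7766103514953448109")}}   // old range, from updateZoneKeyRange
//   {ns: <ns>, tag: "x1",
//    min: {oldKey: NumberLong("4470791281878691346")},
//    max: {oldKey: NumberLong("7766103514953448108")}}   // new range, from reshardCollection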

 

2. If the zone ranges for the new and old shard keys are identical, resharding hits a duplicate key error, as in SERVER-73763.
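A hedged sketch of this second case, reusing the setup from the repro above and passing the existing zone range verbatim to reshardCollection:

// Sketch (assumes the same st / ns / existingZoneName setup as above): the zone
// range matches the pre-existing one exactly. Per this ticket, the commit path
// then hits a duplicate key error on config.tags (see SERVER-73763); how the
// failure surfaces (command error vs. internal failure) may differ by version.
const res = st.s.adminCommand({
    reshardCollection: ns,
    key: {oldKey: 1},
    unique: false,
    collation: {locale: 'simple'},
    zones: [{
        zone: existingZoneName,
        min: {oldKey: NumberLong("4470791281878691347")},
        max: {oldKey: NumberLong("7766103514953448109")}
    }],
    numInitialChunks: 2,
});
printjson(res);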



 Comments   
Comment by Githook User [ 06/Jun/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 Update config.tags inside writeDecisionPersistedState

(cherry picked from commit 74383d5529c531973e6a53dbe248b0b67823da14)
Branch: v5.0
https://github.com/mongodb/mongo/commit/0338f8e2aafc13bd7100e8d0b706b8402acc190b

Comment by Githook User [ 06/Jun/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 update config.tags inside writeDecisionPersistedState

(cherry picked from commit 5ddce0f5ecb5a199fc3e9ba3b94c6b630a5b7b6d)
Branch: v5.0
https://github.com/mongodb/mongo/commit/87b0531deba80a2730b9f4aaec5e863f3ca952b6

Comment by Githook User [ 31/May/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 Update config.tags inside writeDecisionPersistedState

(cherry picked from commit 74383d5529c531973e6a53dbe248b0b67823da14)
Branch: v6.0
https://github.com/mongodb/mongo/commit/bf65fe9e843462f44d764cef04303d547761465c

Comment by Githook User [ 31/May/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 update config.tags inside writeDecisionPersistedState

(cherry picked from commit 5ddce0f5ecb5a199fc3e9ba3b94c6b630a5b7b6d)
Branch: v6.0
https://github.com/mongodb/mongo/commit/84055336dc6ab149a3e9275f2dee9677e5cdf9fe

Comment by Githook User [ 30/May/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 Update config.tags inside writeDecisionPersistedState

(cherry picked from commit 74383d5529c531973e6a53dbe248b0b67823da14)
Branch: v7.0
https://github.com/mongodb/mongo/commit/64fd142fa7374d3296c6c204ba2b7892ddeaacae

Comment by Githook User [ 30/May/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 update config.tags inside writeDecisionPersistedState

(cherry picked from commit 5ddce0f5ecb5a199fc3e9ba3b94c6b630a5b7b6d)
Branch: v7.0
https://github.com/mongodb/mongo/commit/7e27188843967b0c5666dc544d741f1339cc61e0

Comment by Githook User [ 08/May/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 Update config.tags inside writeDecisionPersistedState
Branch: master
https://github.com/mongodb/mongo/commit/74383d5529c531973e6a53dbe248b0b67823da14

Comment by Githook User [ 14/Apr/23 ]

Author:

{'name': 'Kruti Shah', 'email': 'kruti139@gmail.com', 'username': 'krutishah139'}

Message: SERVER-73848 update config.tags inside writeDecisionPersistedState
Branch: master
https://github.com/mongodb/mongo/commit/5ddce0f5ecb5a199fc3e9ba3b94c6b630a5b7b6d

Comment by Max Hirschhorn [ 29/Mar/23 ]

The core idea of SERVER-73848 will be to change the ReshardingCoordinatorService to atomically do the following writes in its writeDecisionPersistedState() function:

  1. Update the config.reshardingOperations state document to CoordinatorStateEnum::kCommitting.
  2. Delete the config.collections entry for the temporary resharding namespace.
  3. Update the config.collections entry for the user collection namespace to refer to the collection UUID, new shard key pattern, etc.
  4. (Changed) Delete all of the config.tags entries for the user collection namespace.
  5. (Changed) Update all of the config.tags entries for the temporary resharding namespace to refer to the user collection namespace.

(There are technically some other writes to config.csrs.indexes and config.placementHistory which also happen in this same replica set transaction but I'm going to gloss over those.)
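To make steps 4 and 5 concrete, here is a hedged mongo-shell sketch of the equivalent config.tags writes (not the actual C++ in ReshardingCoordinatorService; tempNs is a placeholder for the temporary resharding namespace, and in the server these writes run inside the same replica set transaction as the others):

const configDB = st.s.getDB('config');
const tempNs = 'test.system.resharding.<uuid>';  // placeholder for the temporary namespace

// Step 4: remove the stale zone ranges that still refer to the original namespace.
configDB.tags.deleteMany({ns: ns});

// Step 5: repoint the temporary namespace's zone ranges at the user collection.
// (The _id of a config.tags document also encodes the namespace, so the real
// update rewrites the documents rather than just flipping the ns field.)
configDB.tags.updateMany({ns: tempNs}, {$set: {ns: ns}});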

The cleanupSourceConfigCollections() function, which runs after the resharding operation has committed, would be changed to delete only the config.chunks entries for the original collection: the config.tags entries for the original collection would already have been deleted by #4, and the config.tags entries for the new collection would be in their place.

The updateTagsDocsForTempNss() function would be changed to run as part of the replica set transaction as #5.

Now I'll summarize some additional context on why it is expected to be safe to make the changes described above, given the work which originally transpired in SERVER-58433. The jstests/sharding/resharding_large_number_of_initial_chunks.js test uses a large number of zones to have shardCollection create a large number of chunks. In practice, even with Atlas Global Clusters, the total data size of the zone information for the original collection is well under 16MB. Notably, the reshardCollection command accepts a zones parameter to support resharding while obeying zones, and the command request is limited to <16MB, which bounds the zone information for the new collection.

  • The large number of zones present when resharding runs in resharding_large_number_of_initial_chunks.js is incidental; the goal is really to test resharding when the routing table is large. We would still exercise resharding in the presence of a large number of config.chunks entries for the original and new collections even if the config.tags entries for the original collection were deleted after shardCollection ran and before reshardCollection ran in the test.
  • While the 16MB command request size constrains the maximum data size of the config.tags entries for the new collection, there is technically nothing which constrains the data size of the config.tags entries for the original collection. My recommendation would be to have _calculateParticipantsAndChunksThenWriteToDisk() tally the data size (via BSONObj::objsize() directly or an equivalent aggregation with $bsonSize) of the config.tags entries for the original collection namespace, so the operation can be rejected upfront if that size is too large, as sketched below. This would vastly reduce the risk of resharding hitting a case where it is unable to commit the operation because the changes in the storage transaction grow too large for WiredTiger.
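For reference, a minimal sketch of that tally as an aggregation over config.tags using $bsonSize (available since 4.4); the namespace comes from the earlier repro, and the actual check would live inside _calculateParticipantsAndChunksThenWriteToDisk():

// Hedged sketch: sum the BSON size of the config.tags entries for the original
// collection namespace. The coordinator could compare this total against a
// limit before proceeding; no specific threshold is implied here.
const sizeRes = st.s.getDB('config').tags.aggregate([
    {$match: {ns: ns}},
    {$group: {_id: null, totalSize: {$sum: {$bsonSize: '$$ROOT'}}}}
]).toArray();
const totalTagsSize = sizeRes.length ? sizeRes[0].totalSize : 0;
print('config.tags size for ' + ns + ': ' + totalTagsSize + ' bytes');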