[SERVER-34760] Retries of _configsvrShardCollection may not send setShardVersion to primary shard
Created: 30/Apr/18  Updated: 29/Oct/23  Resolved: 13/Sep/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.15

Type: Bug
Priority: Major - P3
Reporter: Jack Mulrow
Assignee: Blake Oler
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate: is duplicated by SERVER-34708 Possible for shard to not learn about... (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2018-08-13, Sharding 2019-07-29, Sharding 2019-08-12, Sharding 2019-08-26, Sharding 2019-09-09, Sharding 2019-09-23
Linked BF Score: 31

Description

_configsvrShardCollection updates the config.collections collection with majority write concern to mark a collection as sharded, then sends setShardVersion to the primary shard for the collection's database so that the shard knows to refresh. If the write is interrupted by a retryable error (such as a stepdown), the command can fail even though the update may already have been committed locally and/or partially replicated to the new primary. When mongos then retries the command, the config server can see that the write has been committed locally and return early, skipping the setShardVersion call. The command then completes successfully without the primary shard knowing that the collection has been sharded.
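The sequence described above can be illustrated with a toy simulation. This is a minimal sketch, not MongoDB code: the classes `Shard` and `ConfigServer`, the `send_ssv_on_retry` flag, and the `fail_next_write` switch are all hypothetical names invented for illustration, and the stepdown is modeled as a simple exception raised after the write has been applied locally.

```python
class RetryableError(Exception):
    """Stands in for a retryable interruption such as a stepdown."""


class Shard:
    """Toy primary shard: tracks which namespaces it knows are sharded."""

    def __init__(self):
        self.known_sharded = set()

    def set_shard_version(self, ns):
        # Stands in for the refresh that setShardVersion triggers.
        self.known_sharded.add(ns)


class ConfigServer:
    """Toy config server with a config.collections-like set (hypothetical)."""

    def __init__(self, shard, send_ssv_on_retry):
        self.collections = set()
        self.shard = shard
        self.send_ssv_on_retry = send_ssv_on_retry
        self.fail_next_write = False

    def shard_collection(self, ns):
        if ns in self.collections:
            # Retry path: the write is already committed, so return early.
            # The buggy variant skips setShardVersion here.
            if self.send_ssv_on_retry:
                self.shard.set_shard_version(ns)
            return "ok"
        self.collections.add(ns)  # the write is committed locally
        if self.fail_next_write:
            self.fail_next_write = False
            raise RetryableError("interrupted before the command finished")
        self.shard.set_shard_version(ns)
        return "ok"


def mongos_shard_collection(config, ns, max_attempts=3):
    """Toy mongos retry loop; max_attempts is an arbitrary illustrative value."""
    for _ in range(max_attempts):
        try:
            return config.shard_collection(ns)
        except RetryableError:
            continue
    raise RetryableError("retries exhausted")


def run(send_ssv_on_retry):
    """Interrupt the first attempt, retry, and report what the shard knows."""
    shard = Shard()
    config = ConfigServer(shard, send_ssv_on_retry)
    config.fail_next_write = True
    result = mongos_shard_collection(config, "test.coll")
    return result, "test.coll" in shard.known_sharded
```

In this model, `run(False)` reproduces the reported behavior (the command succeeds while the shard stays unaware), and `run(True)` models sending setShardVersion on the retry path as well.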



Comments
Comment by Blake Oler [ 13/Sep/19 ]

After talking with esha.maharishi, we've decided that the build failure load related to dropCollection on 3.6 isn't significant enough to backport SERVER-33973 or do further work under this ticket. We're closing this with only the shardCollection work.

Comment by Githook User [ 12/Sep/19 ]

Author: Blake Oler (BlakeIsBlake) <blake.oler@mongodb.com>

Message: SERVER-34760 Send setShardVersion on retries of shardCollection
Branch: v3.6
https://github.com/mongodb/mongo/commit/06d398fb3a9ade0bfd9e4a580015cc4f491089ce

Comment by Blake Oler [ 20/Aug/19 ]

alyson.cabral, this just means that the shard will not know that the collection is sharded after a successful return. AFAIK the shard will find out once it receives a versioned command.

kaloian.manassiev, that is correct: this issue only exists on 3.6 and earlier, because the config server drives the shardCollection command there. We will only be increasing the time on retries in versions 3.6 and earlier.
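As a rough illustration of the point about versioned commands, the sketch below models a shard that refreshes its cached metadata when the version attached to an incoming command differs from what it has cached. Everything here is a simplifying assumption: the integer versions, the `ConfigCatalog` and `Shard` classes, the `StaleConfig` exception, and the single retry in `router_send` are hypothetical stand-ins, not the server's actual data structures or protocol.

```python
UNSHARDED = 0  # hypothetical marker for "no sharding metadata"


class StaleConfig(Exception):
    """Stands in for a stale-metadata error returned to the router."""


class ConfigCatalog:
    """Toy authoritative catalog mapping namespace -> integer version."""

    def __init__(self):
        self.versions = {}

    def version_of(self, ns):
        return self.versions.get(ns, UNSHARDED)


class Shard:
    """Toy shard with a metadata cache that may lag behind the catalog."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cached = {}

    def refresh(self, ns):
        self.cached[ns] = self.catalog.version_of(ns)

    def run_versioned(self, ns, received_version):
        # A version mismatch makes the shard refresh and reject the command.
        if self.cached.get(ns, UNSHARDED) != received_version:
            self.refresh(ns)
            raise StaleConfig(ns)
        return "ok"


def router_send(shard, catalog, ns):
    """Toy router: attach the catalog's version and retry once if stale."""
    version = catalog.version_of(ns)
    try:
        return shard.run_versioned(ns, version)
    except StaleConfig:
        return shard.run_versioned(ns, version)
```

In this model, a shard that never received setShardVersion still converges on the first versioned command, at the cost of one rejected attempt.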

Comment by Kaloian Manassiev [ 14/Aug/19 ]

blake.oler:

Would we be open to increasing time for shardCollection as well, by always sending setShardVersion?

During the normal execution of the command we always send setShardVersion to the primary shard, so the only situation where the time would be increased is if there was a failure and a retry, right? If this is the case, then I don't see a problem with doing it. However, is this still the case now that the main shardCollection logic has moved to the primary shard? The primary shard effectively does exactly what SSV does (syncing from the config server). Is this only for 3.6 and earlier?

Comment by Alyson Cabral (Inactive) [ 10/Aug/19 ]

Is the result of this behavior today that we can return success for the shardCollection command even though the collection isn't actually sharded? Is there any other weird behavior?

Comment by Blake Oler [ 17/Jun/19 ]

alyson.cabral, it seems that we have solved this issue for dropCollection with SERVER-33973. Would we be open to increasing time for shardCollection as well, by always sending setShardVersion?

Comment by Jack Mulrow [ 13/Aug/18 ]

The most frequent BFs linked to this ticket are from the continuous config stepdown suite, and because most of the logic for shardCollection was moved to the primary shard by SERVER-35722, those failures should have gone away. The bug for shardCollection does still exist if the primary shard steps down at the wrong time, but we don't have as much test coverage of that so we won't see as many test failures.

kaloian.manassiev, should this ticket still be a priority now that SERVER-35722 has been finished? That ticket has a 4.0 backport request, but the BFs linked to this ticket also happen on 3.6, so should we consider doing something only on the 3.6 branch instead?

Comment by Kaloian Manassiev [ 11/May/18 ]

"We'd have to blacklist every test that calls shardCollection or dropCollection and expects the shard to have become aware."

That's why I filed SERVER-32558. At this point the config server stepdown suite has become a nuisance. Our metadata operations have known deficiencies where they are not idempotent and we know of cases where manual intervention is necessary for cleanup and yet we continue to test these operations' idempotency.

Comment by Esha Maharishi (Inactive) [ 11/May/18 ]

We'd have to blacklist every test that calls shardCollection or dropCollection and expects the shard to have become aware... this seems like a lot :/

Another option is to override shardCollection and dropCollection in the shell to send _flushRoutingTableCacheUpdates directly to the shard afterwards, which simulates that the setShardVersion was sent.
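The override idea can be sketched as a thin wrapper. The comment proposes doing this in the shell; the version below uses Python only for illustration, and the function name `shard_collection_with_flush` and the two injected callables are hypothetical. The shape of the `_flushRoutingTableCacheUpdates` command document (namespace string as the command value) is an assumption to verify against the server version in use.

```python
def shard_collection_with_flush(run_on_mongos, run_on_primary_shard, ns, key):
    """Run shardCollection, then force the primary shard to refresh.

    run_on_mongos and run_on_primary_shard are hypothetical callables that
    each take a command document (dict) and return the server's reply.
    """
    result = run_on_mongos({"shardCollection": ns, "key": key})
    # Assumed command shape: simulates the refresh that setShardVersion
    # would have triggered on the primary shard.
    run_on_primary_shard({"_flushRoutingTableCacheUpdates": ns})
    return result
```

With a driver such as PyMongo, the two callables might wrap `client.admin.command` on a mongos connection and on a direct connection to the primary shard, respectively. Injecting them keeps the wrapper testable without a running cluster.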

Comment by Kaloian Manassiev [ 11/May/18 ]

This looks like a general non-retryability problem, just like _configsvrMovePrimary. I think we should just blacklist these tests.

Comment by Esha Maharishi (Inactive) [ 30/Apr/18 ]

This is also true for _configsvrDropCollection.

I don't know if the correct way to address it is to make these two commands send setShardVersion on retries.

It will certainly reduce the likelihood that the setShardVersion is not sent, but the distributed catalog can still end up in a stale state where the collection entry exists on the config server, but setShardVersion was never sent to the shard (e.g., mongos exhausts its retries in the face of repeated config stepdowns).
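The remaining gap can be shown with a small toy model. This sketch assumes setShardVersion is sent on every attempt that is not interrupted, including retries, and that each interrupted attempt still leaves the collection entry on the config server. The `state` dictionary, the function names, and the retry limit of 3 are hypothetical choices for illustration only.

```python
class Stepdown(Exception):
    """Stands in for a config server stepdown interrupting the command."""


def config_shard_collection(state, ns, interrupted):
    """Toy config server command; setShardVersion is sent unless interrupted."""
    state["config_collections"].add(ns)  # the collection entry persists
    if interrupted:
        raise Stepdown()
    state["shard_known"].add(ns)  # models setShardVersion reaching the shard


def mongos_retry(state, ns, interruptions, max_attempts=3):
    """Toy mongos: the first `interruptions` attempts are interrupted."""
    for attempt in range(max_attempts):
        try:
            config_shard_collection(state, ns, interrupted=attempt < interruptions)
            return True
        except Stepdown:
            continue
    return False  # retries exhausted


def new_state():
    return {"config_collections": set(), "shard_known": set()}
```

When every attempt is interrupted, `mongos_retry` returns `False` with the namespace present in `config_collections` but absent from `shard_known`, which is the stale state described above.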
