[SERVER-48600] RefineCollectionShardKey does not check for transaction write concern errors
Created: 04/Jun/20  Updated: 29/Oct/23  Resolved: 24/Aug/20

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.2

Type: Bug
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Jack Mulrow
Resolution: Fixed
Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-48527 Aborting in-progress transactions on ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-09-07

Description

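RefineCollectionShardKey checks the response from committing its internal transaction for top-level errors here, but never checks it for write concern errors. As a result, the command can report success and its writes can later be rolled back.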

Here is an example of checking for both types of errors.
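The linked example is not reproduced here. As a rough illustration only, the Python sketch below models a command reply as a dict; `check_command_response`, `CommandError`, and `WriteConcernError` are hypothetical names, not server code. It assumes the usual reply shape, where a top-level failure sets `ok` to 0, while a write concern failure is reported in a `writeConcernError` sub-document even when `ok` is 1. The sample reply in the usage section is illustrative, not taken from this ticket.

```python
class CommandError(Exception):
    """Top-level failure: the command itself did not succeed (ok: 0)."""


class WriteConcernError(Exception):
    """The command succeeded locally, but its write concern was not satisfied."""


def check_command_response(response: dict) -> None:
    """Raise if a command reply reports either kind of error (hypothetical helper)."""
    # 1. Top-level error: ok is 0 (or missing).
    if not response.get("ok"):
        raise CommandError(
            f"{response.get('codeName', 'UnknownError')}: {response.get('errmsg', '')}"
        )
    # 2. Write concern error: ok is 1, but the requested write concern
    #    (e.g. w: "majority") was not confirmed, so the write may roll back.
    wce = response.get("writeConcernError")
    if wce is not None:
        raise WriteConcernError(
            f"{wce.get('codeName', 'UnknownError')}: {wce.get('errmsg', '')}"
        )


if __name__ == "__main__":
    # Illustrative commit reply: the commit applied locally, majority not confirmed.
    commit_reply = {
        "ok": 1,
        "writeConcernError": {
            "code": 64,
            "codeName": "WriteConcernFailed",
            "errmsg": "waiting for replication timed out",
        },
    }
    try:
        check_command_response(commit_reply)
    except WriteConcernError as e:
        print("commit not durable:", e)
```

Checking only `ok` would accept `commit_reply`, which is the gap this ticket describes.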



Comments
Comment by Githook User [ 17/Sep/20 ]

Author: Jack Mulrow (jack.mulrow@mongodb.com, username: jsmulrow)

Message: SERVER-48600 refineCollectionShardKey internal transaction should check for write concern errors

(cherry picked from commit 574976df29afa984cae7e28c5772e71bb0906ec9)
Branch: v4.4
https://github.com/mongodb/mongo/commit/1e14a0f43e32090b5c22b6805ef6ed42d0f9dc20

Comment by Githook User [ 21/Aug/20 ]

Author: Jack Mulrow (jack.mulrow@mongodb.com, username: jsmulrow)

Message: SERVER-48600 refineCollectionShardKey internal transaction should check for write concern errors
Branch: master
https://github.com/mongodb/mongo/commit/574976df29afa984cae7e28c5772e71bb0906ec9

Comment by Jack Mulrow [ 04/Jun/20 ]

Actually, the internal transaction runs on a different operation context, from a different client, than the one servicing the _configsvrRefineCollectionShardKey command. After the command executes, waitForWriteConcern() decides whether to wait for write concern by checking whether the last op of the command's client increased, so the transaction's writes aren't taken into account.

This isn't currently a problem, because after the transaction completes, the command's original operation context is used to write to the config.changelog collection. That write advances the client's last op and triggers waiting for majority write concern on the logging write, whose op time should be greater than the transaction's. This is pretty fragile, though, so in addition to checking for write concern errors where the ticket description linked, we may want to set the last op on the command's client to the last op of the client used for the internal transaction after it commits.
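To make the last-op reasoning concrete, here is a toy Python model, not server code. `Client`, `OplogClock`, `op_time_to_wait_for`, and `run_refine` are hypothetical names, op times are plain increasing integers, and "waiting" is reduced to returning the op time the command would wait on. It assumes, as the comment describes, that the command only waits if its own client's last op advanced, and that waiting for majority on a later op time also covers every earlier one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Client:
    """Hypothetical stand-in for a server client that remembers its last write."""
    name: str
    last_op: int = 0


class OplogClock:
    """Assigns strictly increasing op times to writes (simplified model)."""

    def __init__(self) -> None:
        self._now = 0

    def write(self, client: Client) -> int:
        self._now += 1
        client.last_op = self._now
        return self._now


def op_time_to_wait_for(client: Client, last_op_before: int) -> Optional[int]:
    """Op time the command waits on, or None if its own client wrote nothing."""
    return client.last_op if client.last_op > last_op_before else None


def run_refine(clock: OplogClock, write_changelog: bool,
               propagate_last_op: bool) -> Tuple[int, Optional[int]]:
    """Return (transaction op time, op time the command waits on)."""
    cmd_client = Client("command")
    txn_client = Client("internal-txn")
    before = cmd_client.last_op

    # The internal transaction commits through a different client.
    txn_op = clock.write(txn_client)

    if propagate_last_op:
        # Suggested hardening: carry the transaction's op time over to the
        # command's client so the post-command wait always covers it.
        cmd_client.last_op = max(cmd_client.last_op, txn_client.last_op)

    if write_changelog:
        # config.changelog write on the command's original operation context.
        clock.write(cmd_client)

    return txn_op, op_time_to_wait_for(cmd_client, before)
```

With a fresh `OplogClock()` and neither the changelog write nor last-op propagation, `run_refine` returns `(1, None)`: the transaction's write is never waited on. The changelog write alone gives `(1, 2)`, which covers op time 1 only incidentally, while propagation alone gives `(1, 1)` without depending on the logging write.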

Comment by Jack Mulrow [ 04/Jun/20 ]

Mongos sends _configsvrRefineCollectionShardKey to the config server with majority write concern. So even though that command doesn't check for a write concern error when committing the refine shard key transaction, I don't think a refine can succeed if there was one, because waiting for write concern after the command finishes executing should fail.

It's probably still worth fixing this, though, since after committing the internal transaction the config server triggers routing table refreshes on each shard with a chunk for the refined namespace, which is wasted work if the transaction does roll back.

Generated at Thu Feb 08 05:17:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.