[SERVER-29833] Balancer does not store or send a writeConcern for a recovered migration
Created: 23/Jun/17  Updated: 06/Dec/22  Resolved: 10/Jul/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.9
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix
Votes: 0
Labels: None

Issue Links:
Related
Assigned Teams: Sharding
Operating System: ALL
Linked BF Score: 0

 Description   

The moveChunk command currently accepts a writeConcern on mongos, on the config server, and on shards. This writeConcern only matters for the range deletion on the donor shard: when waitForDelete is true, it controls how many nodes the deletes must propagate to before the request returns.

However, the document persisted for each ongoing migration in config.migrations does not contain the writeConcern option.

As a result, if the config server primary fails over, the new config primary recovers and continues every migration listed in config.migrations, but it does not pass a writeConcern to the donor shard, even if the original request included one.

This is usually not a problem: during a config failover, mongos retries the moveChunk request against the new config primary, which then sends a separate moveChunk request to the donor shard (in addition to the one sent while recovering the migration). The retried moveChunk from mongos still carries the original writeConcern.
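The gap described above can be illustrated with a toy model. This is a hypothetical sketch, not MongoDB's implementation: `MoveChunkRequest`, `MigrationDoc`, `persist`, and `recover` are invented names that only mirror the behavior stated in this ticket, namely that the config.migrations document has no writeConcern field, so the request rebuilt during recovery carries none, while the mongos retry reuses the client's original request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoveChunkRequest:
    # Hypothetical model of a moveChunk request sent to the donor shard.
    ns: str
    chunk_min: int
    to_shard: str
    write_concern: Optional[dict] = None


@dataclass
class MigrationDoc:
    # Hypothetical model of a config.migrations entry. It deliberately has
    # no write_concern field: that omission is the gap this ticket reports.
    ns: str
    chunk_min: int
    to_shard: str


def persist(request: MoveChunkRequest) -> MigrationDoc:
    # The config primary persists the migration; writeConcern is dropped.
    return MigrationDoc(request.ns, request.chunk_min, request.to_shard)


def recover(doc: MigrationDoc) -> MoveChunkRequest:
    # A new config primary rebuilds the request from the persisted document,
    # so it has no writeConcern to forward to the donor shard.
    return MoveChunkRequest(doc.ns, doc.chunk_min, doc.to_shard)


original = MoveChunkRequest("test.coll", 0, "shard0001", {"w": "majority"})
doc = persist(original)

recovered = recover(doc)  # sent by the new primary while recovering
retried = original        # mongos retries the client's original request

print(recovered.write_concern)  # None
print(retried.write_concern)    # {'w': 'majority'}
```

The namespace, shard name, and `{"w": "majority"}` value are placeholders chosen for illustration.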



 Comments   
Comment by Esha Maharishi (Inactive) [ 10/Jul/17 ]

Yes, I'll close this as Won't Fix since it doesn't cause any major issues. It's not actually blocking the BF; it's just where I noticed this.

Comment by Kaloian Manassiev [ 10/Jul/17 ]

The only real use I can think of for supporting this writeConcern is in tests, and perhaps for working around the fact that secondary reads do not filter out orphaned documents. Now that chunk-aware secondaries are in place, can we just patch up the test to use that?

Comment by Esha Maharishi (Inactive) [ 10/Jul/17 ]

I mostly agree with dianna.hohensee - the worst that can happen is:

  • mongos sends moveChunk to config server
  • config primary persists moveChunk request in config.migrations
  • config primary steps down
  • mongos gets a network error, exhausts its retries, and reports failure to the client
  • a new config primary is elected
  • new config primary resumes the migration
  • migration proceeds without the original writeConcern, but no client is waiting for it anyway
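The two failover outcomes discussed in this ticket (mongos retry succeeds, or mongos exhausts its retries) can be enumerated with another small hypothetical model. `DonorRequest`, `after_config_failover`, and `client_sees_missing_write_concern` are illustrative names, not MongoDB code; the model assumes, as the comments above do, that the recovery request is internal and has no client waiting on it.

```python
from typing import List, NamedTuple, Optional


class DonorRequest(NamedTuple):
    # Hypothetical record of one moveChunk request reaching the donor shard.
    source: str                    # "recovery" or "mongos-retry"
    write_concern: Optional[dict]  # writeConcern forwarded, if any
    client_waiting: bool           # True if a client awaits this response


def after_config_failover(
    original_wc: dict, mongos_retry_succeeds: bool
) -> List[DonorRequest]:
    # The new config primary always resumes the migration from
    # config.migrations, which holds no writeConcern to forward,
    # and no client is waiting on this internal request.
    requests = [DonorRequest("recovery", None, False)]
    if mongos_retry_succeeds:
        # mongos reached the new primary: the retried request still carries
        # the client's writeConcern, and the client waits for its response.
        requests.append(DonorRequest("mongos-retry", original_wc, True))
    # Otherwise mongos exhausted its retries and already reported failure,
    # so nobody is waiting on the migration's outcome.
    return requests


def client_sees_missing_write_concern(requests: List[DonorRequest]) -> bool:
    # The gap only matters if a waiting client receives a response that was
    # produced without its writeConcern.
    return any(r.client_waiting and r.write_concern is None for r in requests)


wc = {"w": "majority"}
for retry_ok in (True, False):
    reqs = after_config_failover(wc, retry_ok)
    print(retry_ok, [r.source for r in reqs], client_sees_missing_write_concern(reqs))
```

In both branches `client_sees_missing_write_concern` is False, which matches the reasoning for closing the ticket as Won't Fix: the recovered migration runs without the writeConcern only when no client is waiting for it.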
Comment by Dianna Hohensee (Inactive) [ 23/Jun/17 ]

I don't believe there's a scenario where we need the balancer to send a writeConcern for migration recovery, and that's why we don't do it.

Generated at Thu Feb 08 04:21:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.