[SERVER-29961] Close change notification cursors when a chunk migrates to a new shard Created: 03/Jul/17  Updated: 30/Oct/23  Resolved: 13/Sep/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.6.0-rc0

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-29143 Support killing active $changeNotific... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2017-08-21, Repl 2017-09-11, Repl 2017-10-02
Participants:

 Description   

When a new shard is added to the cluster, any existing change notification cursors won't have corresponding cursors on that shard, so if chunks migrate to that shard, future updates to that data could be missed by existing change notification cursors.

To protect against this, any time a shard donates a chunk to another shard, it will first check its copy of the routing table to determine whether the destination shard already has any chunks for that collection. If not, it will close all of its own change notification cursors on that collection. This will cause the mongos to close its corresponding cluster change notification cursor, which will in turn cause the driver to retry the change notification request. At that point the new change notification cursor will target all existing shards, including the newly added shard.
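
The donor-side check described above can be sketched roughly as follows. This is only an illustrative model in Python, not the server implementation: the `Shard` class, its `routing_table` and `open_cursors` fields, and the `donate_chunk` method are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical model of a donor shard's local state (not server code)."""
    name: str
    # Donor's copy of the routing table: namespace -> {shard name: chunk count}.
    routing_table: dict = field(default_factory=dict)
    # Open change notification cursors: namespace -> list of cursor ids.
    open_cursors: dict = field(default_factory=dict)

    def destination_owns_chunks(self, ns, dest):
        """True if the donor believes `dest` already has chunks for `ns`."""
        return self.routing_table.get(ns, {}).get(dest, 0) > 0

    def donate_chunk(self, ns, dest):
        """Donate one chunk of `ns` to `dest`; return the cursor ids closed first."""
        closed = []
        if not self.destination_owns_chunks(ns, dest):
            # Destination is new for this collection: close every local
            # change notification cursor on it so clients re-establish them.
            closed = self.open_cursors.pop(ns, [])
        chunks = self.routing_table.setdefault(ns, {})
        chunks[self.name] = chunks.get(self.name, 0) - 1
        chunks[dest] = chunks.get(dest, 0) + 1
        return closed
```

In this model, only the first donation to a shard that holds no chunks of the collection closes cursors; later donations to the same shard leave them open.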



 Comments   
Comment by Ramon Fernandez Marina [ 13/Sep/17 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-29961 Close change notification cursors when a chunk migrates to a new shard
Branch: master
https://github.com/mongodb/mongo/commit/c1c269079f892c2b554491997eb6d9114d890838

Comment by Githook User [ 24/Aug/17 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-29961 Add OR BSON helper and use it for change streams.
Branch: master
https://github.com/mongodb/mongo/commit/7faaefa4975c2e3ec43e28e38028d9b5215e5978

Comment by Siyuan Zhou [ 07/Aug/17 ]

Making the behavior work on secondaries is challenging because only the primary of the donor coordinates chunk migrations. Another subtle problem is when to close the cursor. If we don't close the cursor until the donor updates the routing table on the config server, there might be new updates on the recipient that will be missed when resuming with the resume token returned by the donor.

As discussed with spencer and schwerin, we believe this issue will be fixed by two-phase commit since chunk migration will be a transaction involving the donor, the recipient and the config server. We will be able to close the cursor and return the resume token of the commit time.

However, without two-phase commit in 3.6, we need to log a no-op message after entering the critical section and before updating the config server. Thus the cluster time of this no-op will be after the cluster time of adding a new shard and before the commit time of chunk migration on the config server due to causal consistency.

The no-op message is only logged when the recipient did not previously have any data for the collection. The proposed no-op message has the following format:

{
	"ts" : Timestamp(1495059458, 1),
	"t" : NumberLong(1),
	"h" : NumberLong("5443134971991864482"),
	"v" : 2,
	"op" : "n",
	"ns" : "",
	"o" : {
		"msg" : "migrating chunks to a new shard of the collection",
		"ns" : "test.foo"
	}
}

For backward compatibility, we cannot use the top-level "ns" field because it's expected to be empty for no-ops, so the collection namespace is carried in "o.ns" instead.
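
As a rough illustration of how a reader of the oplog might recognize this no-op, here is a minimal Python sketch. The function names and the constant are hypothetical, and the entry is modeled as a plain dict with only the fields shown in the format above; this is not how the server itself is implemented.

```python
# Message text taken from the proposed no-op format above.
MIGRATE_TO_NEW_SHARD_MSG = "migrating chunks to a new shard of the collection"


def new_shard_migration_namespace(entry):
    """Return the collection namespace if `entry` is the proposed no-op, else None."""
    if entry.get("op") != "n":
        return None
    payload = entry.get("o") or {}
    if payload.get("msg") != MIGRATE_TO_NEW_SHARD_MSG:
        return None
    # The namespace lives in "o.ns"; the top-level "ns" stays empty for no-ops.
    return payload.get("ns")


def should_close_cursor(entry, watched_ns):
    """True if a cursor watching `watched_ns` should be closed at this entry."""
    return new_shard_migration_namespace(entry) == watched_ns
```

A cursor on "test.foo" would be closed on seeing the example entry above, while a cursor on any other collection would ignore it.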

Comment by Spencer Brody (Inactive) [ 24/Jul/17 ]

The problem outlined in my last comment can be fixed by double-checking the set of all shards (by querying the config servers) after establishing cursors on all previously known shards, and retrying if any shards have been added.
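
A minimal sketch of that double-check, in Python and under stated assumptions: `list_shards`, `open_cursor`, and `close_cursor` are hypothetical callbacks standing in for the config server query and the per-shard cursor operations, and the `max_attempts` bound is an addition for the sketch, not part of the proposal.

```python
def establish_cursors(list_shards, open_cursor, close_cursor, max_attempts=5):
    """Open a cursor on every shard, retrying if the shard set grew meanwhile."""
    for _ in range(max_attempts):
        known = set(list_shards())
        cursors = {shard: open_cursor(shard) for shard in known}
        # Double-check: re-read the shard list after the cursors are open.
        if set(list_shards()) <= known:
            return cursors
        # A shard was added in between; discard these cursors and retry.
        for cursor in cursors.values():
            close_cursor(cursor)
    raise RuntimeError("shard set kept changing while establishing cursors")
```

This sketch restarts from scratch on each retry, which is the simplest reading of "retrying"; whether already-open cursors could be kept is not addressed in the comment.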

Comment by Spencer Brody (Inactive) [ 03/Jul/17 ]

On second thought, this approach won't work. There is a race when establishing a new change notification cursor on a sharded collection: a mongos first gets the list of all shards. After it gets that list, a shard could be added and a chunk migrated to that shard. If it’s only after that point that the mongos goes on to establish cursors on the shards in its list, you can wind up with a cursor on every shard but the new one. The new shard already has a chunk, so the system will never notice that it is missing notifications from that shard.
