[SERVER-41005] Sharding initialization should not occur before replication recovery Created: 03/May/19 Updated: 06/Dec/22 Resolved: 21/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Participants: | |
| Linked BF Score: | 16 |
| Description |
|
Consider the startup sequence of a mongod that is both a replica set member and a shard server (i.e. a member of a replica set, where the replica set is a shard in a sharded cluster). When a node starts up with the --shardsvr flag, it must initialize parts of the sharding subsystem. In particular, it must read the document with _id: "shardIdentity" from the admin.system.version collection in order to establish things like its shard id and the config server connection string.

The node must also perform initialization for the replication subsystem, in particular replication recovery. This involves replaying the oplog to ensure that collections and indexes are consistent with all committed writes in the oplog before the node services queries.

The problem observed in this ticket is that sharding initialization takes place before replication recovery. The sharding subsystem may therefore attempt reads, at least against admin.system.version, before replication recovery has occurred, and can fail to see committed data. For example, it could fail to see the shard identity document even though the write of the shard identity document was committed. |
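The ordering bug described above can be illustrated with a minimal, self-contained sketch. All names here (replay_oplog, read_shard_identity, the in-memory dicts) are illustrative stand-ins, not MongoDB internals; the point is only that a read performed before recovery replays the oplog cannot see a committed-but-unapplied write.

```python
def replay_oplog(collections, oplog):
    """Replication recovery: apply committed oplog entries to collection data."""
    for ns, doc in oplog:
        collections.setdefault(ns, {})[doc["_id"]] = doc

def read_shard_identity(collections):
    """Sharding initialization: look up the shardIdentity document."""
    return collections.get("admin.system.version", {}).get("shardIdentity")

# A committed write of the shard identity document that, at startup, exists
# only in the oplog and has not yet been applied to the collection data.
oplog = [("admin.system.version",
          {"_id": "shardIdentity", "shardName": "shard0",
           "configsvrConnectionString": "cfg/cfg1:27019"})]

collections = {}

# Buggy order: sharding init reads before replication recovery has run,
# so the committed shard identity document is not visible.
assert read_shard_identity(collections) is None

# Correct order: run replication recovery first, then initialize sharding.
replay_oplog(collections, oplog)
assert read_shard_identity(collections) is not None
```

Under this toy model, swapping the two steps is exactly the difference between sharding initialization seeing and not seeing the committed shard identity document.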
| Comments |
| Comment by Githook User [ 26/Jun/19 ] |
|
Author: Jason Carey &lt;jcarey@argv.me&gt; (hanumantmk)

Message: We currently spin up sharding in advance of replication (see
It's going to be quite difficult to untangle all the dependencies
So let's fix up that test by waiting to flush the oplog before shutting

(cherry picked from commit 303adb5e50eb02d077b734aa27ae8d02a781d7a2) |
| Comment by Githook User [ 19/Jun/19 ] |
|
Author: Jason Carey &lt;jcarey@argv.me&gt; (hanumantmk)

Message: We currently spin up sharding in advance of replication (see
It's going to be quite difficult to untangle all the dependencies
So let's fix up that test by waiting to flush the oplog before shutting |