[SERVER-41799] Update shard_aware_init to flush its shard identity before shutdown
Created: 17/Jun/19
Updated: 29/Oct/23
Resolved: 19/Jun/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.0-rc3, 4.3.1

Type: Bug
Priority: Major - P3
Reporter: Mira Carey
Assignee: Mira Carey
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-41005 Sharding initialization should not oc... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Service Arch 2019-07-01
Participants:
Linked BF Score: 16

 Description   

We currently spin up sharding in advance of replication (see SERVER-41005). Because of that, it is possible for sharding to miss out on certain writes on startup (writes to admin.system.version that are still in the oplog and haven't yet been recovered).

It's going to be quite difficult to untangle all the dependencies between sharding and replication, and in the meantime shard_aware_init fails more often than we'd like (see BF-12759). That test specifically checks that corrupting our shard identity document (via a manual update to admin.system.version) causes mongod to crash on startup. The problem is that because we start sharding before replication, and also do a complicated dance of restarting in standalone mode to corrupt the document, the update can run when the document we want to modify isn't present (it's still in the oplog and we're in standalone mode). The update then modifies nothing, and mongod fails to crash on startup.
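The race can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not MongoDB code: `ToyNode`, `corrupt_identity`, the `"shardIdentity"` key, and the `valid` field are all hypothetical names made up for the illustration, and the flag models whether pending oplog entries were applied before the node was shut down.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for one mongod's data; not MongoDB code."""
    oplog: list = field(default_factory=list)      # logged, not yet applied
    documents: dict = field(default_factory=dict)  # visible to standalone reads

    def replicated_insert(self, doc_id, doc):
        # Replica set mode: the write lands in the oplog first.
        self.oplog.append((doc_id, dict(doc)))

    def flush(self):
        # Apply every pending oplog entry to the visible documents.
        for doc_id, doc in self.oplog:
            self.documents[doc_id] = doc
        self.oplog.clear()

    def standalone_update(self, doc_id, changes):
        # Standalone mode never replays the oplog. Returns documents modified.
        if doc_id not in self.documents:
            return 0
        self.documents[doc_id].update(changes)
        return 1


def corrupt_identity(flush_before_shutdown):
    node = ToyNode()
    node.replicated_insert("shardIdentity", {"valid": True})
    if flush_before_shutdown:
        node.flush()
    # Restart as a standalone and try to corrupt the document.
    modified = node.standalone_update("shardIdentity", {"valid": False})
    # Restart in replica set mode: recovery applies what is still in the oplog.
    node.flush()
    return modified, node.documents["shardIdentity"]["valid"]
```

In this model, `corrupt_identity(False)` modifies zero documents and leaves a valid document behind (the "fail to crash" case), while `corrupt_identity(True)` modifies one document and leaves it corrupted.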

So let's fix up that test by waiting for the oplog to be flushed before shutting down the node (when in replica set mode).
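The shape of that fix, "poll until flushed, then shut down", might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual test change: `wait_until`, `ToyNode`, and `shutdown_after_flush` are hypothetical names, and the toy node simply pretends that one pending entry is flushed each time it is polled.

```python
import time


def wait_until(predicate, timeout_s=5.0, interval_s=0.01):
    """Poll predicate until it returns True; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(interval_s)


class ToyNode:
    """Hypothetical stand-in for a replica set member; not a MongoDB API."""

    def __init__(self, pending_writes):
        self.pending = pending_writes  # oplog entries not yet flushed
        self.running = True

    def is_flushed(self):
        # Assumption for the demo: one pending entry is flushed per poll.
        if self.pending > 0:
            self.pending -= 1
        return self.pending == 0

    def shutdown(self):
        self.running = False


def shutdown_after_flush(node, in_replica_set_mode):
    # Only replica set mode has an oplog to wait for.
    if in_replica_set_mode:
        wait_until(node.is_flushed)
    node.shutdown()
```

The wait is gated on replica set mode because, in this sketch as in the description, a standalone node has no pending oplog entries to wait for.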



 Comments   
Comment by Githook User [ 26/Jun/19 ]

Author: Jason Carey <jcarey@argv.me> (hanumantmk)

Message: SERVER-41799 await stable TS in shard_aware_init

(cherry picked from commit 303adb5e50eb02d077b734aa27ae8d02a781d7a2)
Branch: v4.2
https://github.com/mongodb/mongo/commit/22c05b4a4fcc7b0213041067bd9539db9d4da8f5

Comment by Githook User [ 19/Jun/19 ]

Author: Jason Carey <jcarey@argv.me> (hanumantmk)

Message: SERVER-41799 await stable TS in shard_aware_init

Branch: master
https://github.com/mongodb/mongo/commit/303adb5e50eb02d077b734aa27ae8d02a781d7a2
