[SERVER-45352] enforceLegacyWriteConcern send all the requests to the primaries Created: 03/Jan/20  Updated: 11/Mar/20  Resolved: 11/Mar/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.2, 4.3.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Randolph Tan
Resolution: Won't Fix Votes: 0
Labels: sharding-4.4-stabilization, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-44732 Remove requires_fcv_44 tag on views_v... Closed
is related to SERVER-43719 Make RWCDefaults initialise from pers... Closed
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-03-23
Participants:
Linked BF Score: 16

 Description   

During ShardingTest initialization, mongos tries to enforce LegacyWriteConcern by sending a getLastError (GLE) command to all the shards. The list of targets to send the GLE to is constructed through getPrevHostOpTimes(). In this case the list contains two replica set nodes of the same shard, each recorded with a different electionID. As a result, two GLE requests are generated, one per node, each carrying a different electionID.

[js_test:gle_sharded_write] 2019-12-17T20:06:14.913+0000 s20775| 2019-12-17T20:06:14.911+0000 D3 SHARDING [conn8] enforcing write concern { getLastError: "settings", w: "majority", wtimeout: 30000.0, readConcern: {} } on ip-10-122-3-53:20772 at opTime Dec 17 20:06:14:8 with electionID 7fffffff0000000000000001
[js_test:gle_sharded_write] 2019-12-17T20:06:14.913+0000 s20775| 2019-12-17T20:06:14.911+0000 D3 SHARDING [conn8] enforcing write concern { getLastError: "settings", w: "majority", wtimeout: 30000.0, readConcern: {} } on ip-10-122-3-53:20774 at opTime Jan  1 00:00:00:0 with electionID 000000000000000000000000

As the log lines show, the two entries carry different electionIDs: the entry for port 20774 has a zero opTime and an all-zero electionID.

The generated requests are then sent through the MultiStatementTransactionRequestsSender. Because ReadPreference::PrimaryOnly is specified, both requests are sent to the primary of the shard, but each carries a different electionID.

[js_test:gle_sharded_write] 2019-12-17T20:06:14.913+0000 s20775| 2019-12-17T20:06:14.911+0000 D3 ASIO     [conn8] startCommand: RemoteCommand 23 -- target:[ip-10-122-3-53:20772] db:config cmd:{ getLastError: "settings", w: "majority", wtimeout: 30000.0, readConcern: {}, wOpTime: { ts: Timestamp(1576613174, 8), t: 1 }, wElectionId: ObjectId('7fffffff0000000000000001') }
 
[js_test:gle_sharded_write] 2019-12-17T20:06:14.913+0000 s20775| 2019-12-17T20:06:14.911+0000 D3 ASIO     [conn8] startCommand: RemoteCommand 24 -- target:[ip-10-122-3-53:20772] db:config cmd:{ getLastError: "settings", w: "majority", wtimeout: 30000.0, readConcern: {}, wOpTime: { ts: Timestamp(0, 0), t: -1 }, wElectionId: ObjectId('000000000000000000000000') }

When the primary receives the requests, it finds that at least one of them carries an electionID that does not match its own, and that request fails.
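
The sequence above can be modelled with a minimal Python sketch. It is only an illustration of the reported behavior, not the server's implementation: `HostOpTime`, `build_gle_requests`, and `primary_accepts` are hypothetical names, the electionID comparison is a simplification of whatever check the primary actually performs, and the primary's own electionID is assumed to equal the one recorded for port 20772. The host names, opTimes, and electionIDs are copied from the log lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostOpTime:
    """Hypothetical stand-in for one entry returned by getPrevHostOpTimes()."""
    host: str
    op_time: tuple      # (seconds, increment), as in Timestamp(secs, inc)
    election_id: str    # hex string of the ObjectId


def build_gle_requests(entries):
    """Build one GLE request per recorded host, as the description states."""
    return [
        {"recorded_host": e.host, "wOpTime": e.op_time, "wElectionId": e.election_id}
        for e in entries
    ]


def primary_accepts(request, primary_election_id):
    """Assumed check: reject a request whose wElectionId differs from the primary's."""
    return request["wElectionId"] == primary_election_id


# Values copied from the log lines in this ticket.
PRIMARY = "ip-10-122-3-53:20772"
entries = [
    HostOpTime("ip-10-122-3-53:20772", (1576613174, 8), "7fffffff0000000000000001"),
    HostOpTime("ip-10-122-3-53:20774", (0, 0), "000000000000000000000000"),
]

# Assumption: the primary's current electionID is the one recorded for port 20772.
primary_election_id = "7fffffff0000000000000001"

requests = build_gle_requests(entries)
# PrimaryOnly routing: every request goes to the primary, whichever host was recorded.
targets = [PRIMARY for _ in requests]
results = [primary_accepts(r, primary_election_id) for r in requests]
```

Under these assumptions both requests target port 20772, and the request built from the 20774 entry is the one that is rejected.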



 Comments   
Comment by Randolph Tan [ 11/Mar/20 ]

I think the current implementation of enforceLegacyWriteConcern is correct. For some reason, the secondary host was registered while trying to call getDefaultRWConcern. This is incorrect behavior, since only hosts that perform writes should be added to the host list. However, I think this became harder to reproduce after SERVER-43719 was introduced (you can see that the BFG stopped after it was pushed) and went away completely with SERVER-44978 (since it no longer uses the request's client to fetch the default RWConcern).
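
The expected behavior of the host list can be sketched in Python as follows. This is a hypothetical model for illustration only: `HostTracker` and its methods are invented names, not mongos classes, and the `is_write` flag stands in for however the server distinguishes writes from reads.

```python
class HostTracker:
    """Hypothetical tracker of hosts contacted while serving one client request."""

    def __init__(self):
        self._write_hosts = {}

    def record(self, host, op_time, election_id, is_write):
        # Only hosts that performed a write are remembered; reads are ignored.
        if is_write:
            self._write_hosts[host] = (op_time, election_id)

    def prev_host_op_times(self):
        # Hosts that a later GLE would have to target.
        return dict(self._write_hosts)


tracker = HostTracker()
# The write itself, served by the primary (values taken from the ticket's logs).
tracker.record("ip-10-122-3-53:20772", (1576613174, 8),
               "7fffffff0000000000000001", is_write=True)
# A read-only call against another node must not be registered.
tracker.record("ip-10-122-3-53:20774", (0, 0),
               "000000000000000000000000", is_write=False)
gle_targets = tracker.prev_host_op_times()
```

With this rule only the host on port 20772 ends up in the list, so a single GLE request with a single electionID would be generated.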

Comment by Randolph Tan [ 11/Mar/20 ]

Note: in the test failure example, both hosts are replica set members of the config server. The update request touched these two hosts while trying to update config.settings: the actual write on the primary, and calls to getDefaultRWConcern and queries to config.collections on the primary.

Generated at Thu Feb 08 05:08:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.