Core Server / SERVER-29153

Make sure replica set nodes agree on which node is primary before doing writes in ShardingTest initialization


Details

    • Fully Compatible
    • v4.4, v4.2, v4.0, v3.6
    • Sharding 2020-04-20
    • 12

    Description

      This can occur during Replica Set primary election:

      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.376+0000 c20512| 2017-03-29T17:21:06.375+0000 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.381+0000 c20512| 2017-03-29T17:21:06.381+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 1
      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.381+0000 c20512| 2017-03-29T17:21:06.381+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.381+0000 c20512| 2017-03-29T17:21:06.381+0000 D REPL_HB  [ReplicationExecutor] Cancelling all heartbeats.
      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.382+0000 c20512| 2017-03-29T17:21:06.381+0000 I REPL     [ReplicationExecutor] Could not access any nodes within timeout when checking for additional ops to apply before finishing transition to primary. Will move forward with becoming primary anyway.
      

      Followed by

      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:18.428+0000 c20512| 2017-03-29T17:21:18.428+0000 I REPL     [ReplicationExecutor] can't see a majority of the set, relinquishing primary
      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:18.428+0000 c20512| 2017-03-29T17:21:18.428+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
      [js_test:mapReduce_nonSharded] 2017-03-29T17:21:18.428+0000 c20512| 2017-03-29T17:21:18.428+0000 I REPL     [replExecDBWorker-0] transition to SECONDARY
      

      roughly 10 seconds later.

      ShardingTest initialization performs writes here and here that can fail if the config server or shard primary steps down, respectively.

      awaitNodesAgreeOnPrimary() seems appropriate to call before doing the necessary config server writes.

      The shard writes (in the link above) do not appear to actually be necessary. It looks like the write followed by awaitReplication() was just a hack to wait for the replica set to finish setting up (https://github.com/mongodb/mongo/commit/69207258a19b68fbbbc1377f61b00a9124d78c90#diff-bcff736df8cd6507ce081e800d9d1dd0R168). In that case, we can just remove the write, which is outdated and no longer useful, and replace awaitSecondaryNodes() with awaitNodesAgreeOnPrimary(), which makes the intent much clearer and has the same effect.
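      To illustrate the idea, here is a minimal sketch of the kind of polling that "agree on primary" implies: repeatedly ask every node which member it believes is primary, and only proceed once all answers name the same node. The node objects, hostnames, and isMaster-style accessor below are mocks for illustration, not the real test-harness implementation.

      ```javascript
      // Sketch of awaitNodesAgreeOnPrimary-style logic against mock nodes.
      // Polls each node's view of the primary until every view names the
      // same, non-null member, or the timeout expires.
      function awaitNodesAgreeOnPrimary(nodes, timeoutMs) {
          const deadline = Date.now() + timeoutMs;
          while (Date.now() < deadline) {
              // Each node reports which member it currently believes is primary.
              const views = nodes.map((n) => n.isMaster().primary);
              // Agreement: every node names the same, non-null primary.
              if (views[0] !== null && views.every((p) => p === views[0])) {
                  return views[0];
              }
          }
          throw new Error("nodes did not agree on a primary within " + timeoutMs + "ms");
      }

      // Mock replica set: the third node lags for a few polls before it
      // learns about the new primary, as a secondary might after an election.
      let polls = 0;
      const nodes = [
          {isMaster: () => ({primary: "host:20512"})},
          {isMaster: () => ({primary: "host:20512"})},
          {isMaster: () => ({primary: ++polls < 3 ? null : "host:20512"})},
      ];

      const primary = awaitNodesAgreeOnPrimary(nodes, 5000);
      console.log(primary);  // "host:20512"
      ```

      Writes issued only after such a barrier cannot race an in-progress election the way the log excerpt above shows, because no write is attempted until every member acknowledges the same primary.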


          People

            jack.mulrow@mongodb.com Jack Mulrow
            dianna.hohensee@mongodb.com Dianna Hohensee
            Votes: 0
            Watchers: 3
