Core Server / SERVER-40336

ReplicationCoordinatorImpl::_random isn't robust to replica set members being started at the same time

    • Fully Compatible
    • ALL
    • v4.0, v3.6, v3.4
    • Repl 2019-04-08, Repl 2019-04-22
    • 0

We've observed multiple cases in the sys-perf and sys-perf-4.0 Evergreen projects where a 2-shard cluster whose shards are 2-node replica sets is restarted and one of the shards fails to elect a primary after 11 attempts spanning ~2 minutes. The two nodes of that shard repeatedly ran for election at the same time, and each time each node had already voted for itself in that term, so neither could collect the two votes needed for a majority.

Random jitter is added to the election timeout, but it comes from a PseudoRandom that is seeded with the current time at startup. Because the performance infrastructure spawns mongod processes concurrently, both nodes appear to end up with the same startup time, and therefore the same seed for ReplicationCoordinatorImpl::_random. They then draw identical jitter sequences and keep timing out, and standing for election, in lockstep.
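The following is a minimal sketch of the failure mode under stated assumptions, not MongoDB code and not the fix that was applied for this ticket. `std::mt19937_64` stands in for mongo's PseudoRandom, and the names `seedFromTimeOnly`, `seedWithEntropy`, `electionTimeouts`, `kBaseTimeoutMs`, and `kMaxJitterMs` are hypothetical, with illustrative timeout values. It shows that two engines seeded only with the same startup millisecond produce the same jittered timeouts on every election attempt, while mixing per-process entropy into the seed breaks that tie.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical, illustrative values: a fixed base election timeout plus up to
// kMaxJitterMs of random jitter per attempt.
constexpr std::int64_t kBaseTimeoutMs = 10000;
constexpr std::int64_t kMaxJitterMs = 1500;

// Time-only seeding (stand-in for seeding PseudoRandom with the startup time):
// two nodes that start in the same millisecond get identical engines.
std::mt19937_64 seedFromTimeOnly(std::uint64_t startupMillis) {
    return std::mt19937_64(startupMillis);
}

// Hypothetical mitigation: mix per-process entropy into the seed so that equal
// startup times no longer imply equal engine states.
std::mt19937_64 seedWithEntropy(std::uint64_t startupMillis, std::uint32_t entropy) {
    std::seed_seq seq{static_cast<std::uint32_t>(startupMillis),
                      static_cast<std::uint32_t>(startupMillis >> 32), entropy};
    return std::mt19937_64(seq);
}

// Returns the jittered election timeout (in ms) for each of `rounds`
// consecutive election attempts; the engine is taken by value, so the caller's
// copy is not advanced.
std::vector<std::int64_t> electionTimeouts(std::mt19937_64 rng, int rounds) {
    std::uniform_int_distribution<std::int64_t> jitterMs(0, kMaxJitterMs);
    std::vector<std::int64_t> timeouts;
    for (int i = 0; i < rounds; ++i) {
        timeouts.push_back(kBaseTimeoutMs + jitterMs(rng));
    }
    return timeouts;
}
```

For example, `electionTimeouts(seedFromTimeOnly(t), 11)` yields the same 11 values on both nodes for any shared startup time `t`, which matches the lockstep behavior described above. In a real process the `entropy` argument would come from a source such as `std::random_device{}()`. Here it is a plain parameter so the behavior is reproducible.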

            siyuan.zhou@mongodb.com Siyuan Zhou
            max.hirschhorn@mongodb.com Max Hirschhorn