Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17845

Running the dropDatabase command can cause an election to occur

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.0.0
    • Component/s: Replication
    • None
    • Fully Compatible
    • Windows
    • Security 1 04/03/15

      We are encountering unexpected elections when running the .NET driver unit tests against a replica set (with auth and ssl if that makes a difference).

      Apparently running the dropDatabase command can cause an unexpected election to occur.

      From the attached logs:

      On member 27017:

      2015-04-01T16:22:18.013+0000 D REPL     [ReplicationExecutor] localhost:27018 thinks that we are down because they cannot send us heartbeats.
      2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] Member localhost:27018 is now in state PRIMARY
      2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] replset: another primary seen with election time 551c1b39:1 my election time is 551c1ab4:1
      2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] stepping down; another primary was elected more recently
      2015-04-01T16:22:18.013+0000 D REPL     [ReplicationExecutor] Scheduling heartbeat to localhost:27018 at 2015-04-01T16:22:20.013Z
      2015-04-01T16:22:18.013+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
      

      And on member 27018:

      2015-04-01T16:22:16.553+0000 I NETWORK  [ReplExecNetThread-0] Socket recv() timeout  127.0.0.1:27017
      2015-04-01T16:22:16.553+0000 I NETWORK  [ReplExecNetThread-0] SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:27017] 
      2015-04-01T16:22:16.553+0000 I NETWORK  [ReplExecNetThread-0] DBClientCursor::init call() failed
      2015-04-01T16:22:16.553+0000 D -        [ReplExecNetThread-0] User Assertion: 10276:DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) }
      2015-04-01T16:22:16.740+0000 D REPL     [ReplExecNetThread-0] Network status of sending replSetHeartbeat to localhost:27017 was Location10276 DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) }
      2015-04-01T16:22:16.740+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:27017; Location10276 DBClientBase::findN: transport error: localhost:27017 ns: local.$cmd query: { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: BinData(0, 6E2C2C6E3D5F5F73797374656D2C723D534D4E376A716762326245447A315847506C4D2B50695A6E2B6E627338315834) }
      2015-04-01T16:22:16.740+0000 D REPL     [ReplicationExecutor] Scheduling heartbeat to localhost:27017 at 2015-04-01T16:22:18.740Z
      2015-04-01T16:22:16.740+0000 I REPL     [ReplicationExecutor] Standing for election
      ...
      2015-04-01T16:22:17.209+0000 I REPL     [ReplicationExecutor] replSet election succeeded, assuming primary role
      2015-04-01T16:22:17.209+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
      

      So 27018 has taken the role of primary away from 27017.

      Here's what seems to be bogging member 27017 down so much that 27018 concluded it was down:

      2015-04-01T16:21:59.390+0000 D COMMAND  [conn27] run command Tests04011621.$cmd { dropDatabase: 1 }
      2015-04-01T16:21:59.390+0000 I COMMAND  [conn27] dropDatabase Tests04011621 starting
      2015-04-01T16:21:59.390+0000 D STORAGE  [conn27] dropDatabase Tests04011621
      ...
      2015-04-01T16:22:18.777+0000 I JOURNAL  [conn27] journalCleanup...
      2015-04-01T16:22:18.778+0000 I JOURNAL  [conn27] removeJournalFiles
      2015-04-01T16:22:18.778+0000 I STORAGE  [DataFileSync] flushing mmaps took 17177ms  for 10 files
      2015-04-01T16:22:18.814+0000 D JOURNAL  [conn27] removeJournalFiles end
      2015-04-01T16:22:19.019+0000 I JOURNAL  [conn27] journalCleanup...
      2015-04-01T16:22:19.019+0000 I JOURNAL  [conn27] removeJournalFiles
      2015-04-01T16:22:19.019+0000 D JOURNAL  [conn27] removeJournalFiles end
      2015-04-01T16:22:19.047+0000 D STORAGE  [conn27] remove file D:\jenkins\workspace\mongo-csharp-driver-test-windows\label\windows64\mc\replica_set\mo\auth\ms\30-release\mssl\ssl\artifacts\data\db27017\Tests04011621.ns
      2015-04-01T16:22:19.156+0000 I COMMAND  [conn27] dropDatabase Tests04011621 finished
      2015-04-01T16:22:19.174+0000 I COMMAND  [conn27] command Tests04011621.$cmd command: dropDatabase { dropDatabase: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:64 locks:{} 19784ms
      

        1. 27019.log
          145 kB
        2. db27017.log
          23.76 MB
        3. db27018.log
          3.49 MB

            Assignee:
            amalia.hawkins@10gen.com Amalia Hawkins
            Reporter:
            robert@mongodb.com Robert Stam
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: