Core Server / SERVER-25255

After upgrading to 3.2.8 (from 3.0.9) startup appears to hang with lots of disk reads to the local collection.

    • Type: Question
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 3.2.8
    • Component/s: Replication, WiredTiger

      After upgrading from 3.0.9 to 3.2.8, mongod startup appears to hang while issuing a large number of disk reads against a collection file in the local database.

      fatrace shows lots of:

      mongod(87260): R /srv/mongodb/local/collection-2--6617514398526579318.wt

      The file is ~230 GB and most likely holds the oplog.

      This all happens after the log lines:

      2016-07-25T13:12:15.345+0000 I STORAGE  [initandlisten] Placing a marker at optime Jul 25 11:21:00:37f
      2016-07-25T13:12:15.345+0000 I STORAGE  [initandlisten] Placing a marker at optime Jul 25 12:55:11:25b
      2016-07-25T13:12:25.296+0000 I NETWORK  [websvr] admin web console waiting for connections on port 28017
      2016-07-25T13:12:25.302+0000 I REPL     [initandlisten] Did not find local voted for document at startup;  NoMatchingDocument: Did not find replica set lastVote document in local.replset.election
      

      and on another run with debug:

      2016-07-25T15:12:49.643+0000 I REPL     [initandlisten] Did not find local voted for document at startup;  NoMatchingDocument: Did not find replica set lastVote document in local.replset.election
      2016-07-25T15:12:49.643+0000 D REPL     [initandlisten] returning minvalid: (term: -1, timestamp: May 24 18:17:53:802)({ ts: Timestamp 1464113873000|2050, t: -1 }) -> (term: -1, timestamp: Jul 25 15:04:29:36)({ ts: Timestamp 1469459069000|54, t: -1 })
      2016-07-25T15:12:49.643+0000 D REPL     [initandlisten] Recovering from a failed apply batch, start:{ ts: Timestamp 1464113873000|2050, t: -1 }
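
The two optimes in the minvalid line can be decoded to see how much oplog lies between them. The following is a minimal sketch in Python, assuming the first field of `Timestamp a|b` is milliseconds since the Unix epoch (as the trailing `000` suggests) and the second is the increment that the log prints in hex. `decode_optime` is a hypothetical helper written for this illustration, not part of MongoDB.

```python
from datetime import datetime, timezone


def decode_optime(millis: int, increment: int) -> str:
    """Render 'Timestamp <millis>|<increment>' in the log's readable form:
    UTC month/day/time followed by the increment in hexadecimal."""
    when = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return f"{when:%b %d %H:%M:%S}:{increment:x}"


# Values copied from the "returning minvalid" debug line.
START_MILLIS, START_INC = 1464113873000, 2050
END_MILLIS, END_INC = 1469459069000, 54

start = decode_optime(START_MILLIS, START_INC)
end = decode_optime(END_MILLIS, END_INC)

# Wall-clock span between the two optimes, in days.
gap_days = (END_MILLIS - START_MILLIS) / 1000 / 86400

print(start, "->", end, f"({gap_days:.1f} days)")
```

The decoded values match the readable forms already in the log (May 24 18:17:53:802 and Jul 25 15:04:29:36), and the span between them is roughly two months. Whether startup actually scans the oplog across that whole span is not confirmed by anything in this report; the "Recovering from a failed apply batch" line only suggests it.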
      

      With verbose logging enabled, the log shows only WiredTiger journal flushes at a regular rate.

      Why is this happening and how long might it take for the node to come back online?

      Will it try to read the whole collection (the oplog)?
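
If the node does end up reading the entire file, a rough upper bound on the wait follows from the file size and the sustained read rate. This is only a sketch under stated assumptions: "~230gb" is taken as 230 decimal gigabytes, the 100 MB/s throughput is a hypothetical placeholder to be replaced with a rate measured on the affected host, and `worst_case_hours` is an illustrative helper, not a MongoDB tool.

```python
def worst_case_hours(file_bytes: int, read_bytes_per_sec: float) -> float:
    """Hours needed to read file_bytes once, sequentially, at a steady rate."""
    if read_bytes_per_sec <= 0:
        raise ValueError("read rate must be positive")
    return file_bytes / read_bytes_per_sec / 3600


# "~230gb" from the report, assumed here to mean decimal gigabytes.
OPLOG_FILE_BYTES = 230 * 10**9

# Hypothetical sustained read rate; substitute a measured value.
ASSUMED_RATE_BYTES_PER_SEC = 100 * 10**6

hours = worst_case_hours(OPLOG_FILE_BYTES, ASSUMED_RATE_BYTES_PER_SEC)
print(f"worst case at the assumed rate: {hours:.2f} hours")
```

Random reads, cache effects, or repeated passes over the file would change the figure, so this bounds a single sequential pass only.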

            Assignee: Unassigned
            Reporter: Paul Ridgway (paul.ridgway)
            Votes: 1
            Watchers: 11