Core Server / SERVER-28157

Can't start secondary, data corrupted after clean shutdown

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.4.2
    • Component/s: WiredTiger
    • Labels:
    • Linux

      We have a development database: a 3-member replica set running MongoDB server 3.4.2 on Amazon Linux (EC2 t2.medium), recently upgraded from 3.2.x.
      rs.status() reported that all members were healthy.
      To apply a logRotate configuration change, we restarted each of the 3 members in the following order:
      secondary 3 (hidden:true, priority:0, port:29019),
      secondary 2 (priority:20, port:29018),
      primary (priority:30, port:29017)
      Each member was restarted with "service mongod restart", with a short pause between restarts.
      No clients were connected to this database at the time.
      After all 3 were restarted, we ran rs.status() in the mongo shell and saw that "primary" and "secondary 2" were healthy, but "secondary 3" was unhealthy and could not be contacted. "secondary 3" can no longer be started.
      The first error in the "secondary 3" log is the following:

      2017-03-01T15:55:00.534+0000 E STORAGE  [repl writer worker 8] WiredTiger error (0) [1488383700:534159][9280:0x7f8edc3d3700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: read checksum error for 4096B block at offset 45056: block header checksum of 3605474371 doesn't match expected checksum of 3806882032
      2017-03-01T15:55:00.534+0000 E STORAGE  [repl writer worker 8] WiredTiger error (0) [1488383700:534218][9280:0x7f8edc3d3700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: index-14-4040745961588520825.wt: encountered an illegal file format or internal value
      2017-03-01T15:55:00.534+0000 E STORAGE  [repl writer worker 8] WiredTiger error (-31804) [1488383700:534226][9280:0x7f8edc3d3700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2017-03-01T15:55:00.534+0000 I -        [repl writer worker 8] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
      2017-03-01T15:55:00.534+0000 I -        [repl writer worker 8]
      ***aborting after fassert() failure
      2017-03-01T15:55:00.570+0000 F -        [repl writer worker 8] Got signal: 6 (Aborted).
      

      Now, every restart attempt immediately logs the following:

      2017-03-01T16:39:22.408+0000 E STORAGE  [repl writer worker 7] WiredTiger error (0) [1488386362:408202][9783:0x7fde825da700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: read checksum error for 4096B block at offset 45056: block header checksum of 3605474371 doesn't match expected checksum of 3806882032
      2017-03-01T16:39:22.408+0000 E STORAGE  [repl writer worker 7] WiredTiger error (0) [1488386362:408228][9783:0x7fde825da700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: index-14-4040745961588520825.wt: encountered an illegal file format or internal value
      2017-03-01T16:39:22.408+0000 E STORAGE  [repl writer worker 7] WiredTiger error (-31804) [1488386362:408245][9783:0x7fde825da700], file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2017-03-01T16:39:22.408+0000 I -        [repl writer worker 7] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
      2017-03-01T16:39:22.408+0000 I -        [repl writer worker 7]
      ***aborting after fassert() failure
      2017-03-01T16:39:22.408+0000 I -        [repl writer worker 15] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
      2017-03-01T16:39:22.408+0000 I -        [repl writer worker 15]
      
      ***aborting after fassert() failure
      
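      Both crashes report the same index file (index-14-4040745961588520825.wt), the same 4096B block at offset 45056, and the same pair of mismatched checksums. That suggests the damaged block is persistent on disk rather than a one-off read error. The sketch below pulls those fields out of the two log lines quoted above and compares them. The helper name parse_checksum_error and its regular expression are hypothetical and are not part of MongoDB or WiredTiger tooling; the pattern assumes the message wording shown in these 3.4.2 logs.

```python
import re

# Hypothetical pattern for the WiredTiger "read checksum error" message,
# assuming the exact wording seen in the mongod 3.4.2 log excerpts above.
CHECKSUM_RE = re.compile(
    r"file:(?P<file>\S+), WT_SESSION\.open_cursor: read checksum error for "
    r"(?P<size>\d+)B block at offset (?P<offset>\d+): block header checksum of "
    r"(?P<found>\d+) doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return (file, block_size, offset, found, expected), or None if absent."""
    m = CHECKSUM_RE.search(line)
    if m is None:
        return None
    return (m["file"], int(m["size"]), int(m["offset"]),
            int(m["found"]), int(m["expected"]))


# First error logged by "secondary 3" (15:55) and the error on restart (16:39).
FIRST_CRASH = (
    "2017-03-01T15:55:00.534+0000 E STORAGE  [repl writer worker 8] "
    "WiredTiger error (0) [1488383700:534159][9280:0x7f8edc3d3700], "
    "file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: "
    "read checksum error for 4096B block at offset 45056: block header "
    "checksum of 3605474371 doesn't match expected checksum of 3806882032"
)
ON_RESTART = (
    "2017-03-01T16:39:22.408+0000 E STORAGE  [repl writer worker 7] "
    "WiredTiger error (0) [1488386362:408202][9783:0x7fde825da700], "
    "file:index-14-4040745961588520825.wt, WT_SESSION.open_cursor: "
    "read checksum error for 4096B block at offset 45056: block header "
    "checksum of 3605474371 doesn't match expected checksum of 3806882032"
)

first = parse_checksum_error(FIRST_CRASH)
restart = parse_checksum_error(ON_RESTART)
print(first)
# Same file, offset and checksums on both runs: the bad block did not change.
print(first == restart)
```

      Only the message fields are compared; the timestamps, process IDs, and thread IDs differ between the two runs, as expected for separate process starts.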

      Attachments:
        1. messages (102 kB)
        2. metrics.2017-03-01T13-45-17Z-00000 (439 kB)
        3. metrics.2017-03-01T16-39-13Z-00000 (35 kB)
        4. metrics.2017-03-01T16-47-40Z-00000 (11 kB)
        5. metrics.2017-03-01T17-10-06Z-00000 (11 kB)
        6. mongod.log (140 kB)

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Stephen Machnowski (smachnowski@thinkmap.com)
            Votes: 1
            Watchers: 7
