Core Server / SERVER-18583

Errors validating newly created WiredTiger replica member during upgrade to 3.0

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 3.0.2, 3.0.3
    • Component/s: Replication, Storage, WiredTiger
    • ALL

      We tried to re-create this replica on several different servers, and the issue occurred in 3 out of 4 attempts. On the one replica that was not corrupted initially, we noticed the issue a few days later, which makes it 4 out of 4.

      I'm not sure whether it is reproducible on other clusters, since we have not had a chance to try it there.


      We are migrating from MongoDB 2.6 to the latest 3.0 release. As part of that process, we decided to switch one of our backup replicas (which we use for LVM-snapshot-based backups) to 3.0 and see how it behaves.

      The initial sync of the new replica succeeds, but when we validate the data on it, the data appears to be corrupted. Our validation script runs validate on every collection on the server, and the first one it checks is local.oplog.rs. Here is the result:

      bulk:SECONDARY> use local
      switched to db local
      bulk:SECONDARY> db.oplog.rs.validate()
      {
      	"ns" : "local.oplog.rs",
      	"nIndexes" : 0,
      	"keysPerIndex" : {
      
      	},
      	"valid" : false,
      	"errors" : [
      		"[1432129022:325598][14023:0x7f00ae4d3700], file:collection-6--2540725476965076228.wt, session.verify: checkpoint ranges never verified: 668",
      		"[1432129022:521082][14023:0x7f00ae4d3700], file:collection-6--2540725476965076228.wt, session.verify: file ranges never verified: 668",
      		"verify() returned WT_ERROR: non-specific WiredTiger error. This indicates structural damage. Not examining individual documents."
      	],
      	"warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
      	"advice" : "ns corrupt. See http://dochub.mongodb.org/core/data-recovery",
      	"ok" : 1
      }
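
For readers who want to repeat the check, the validation pass described above can be sketched roughly as follows. This is not the reporter's actual script, which is not included in the ticket. The helper name `validateAllCollections` and the shape of its return value are assumptions made for illustration; the shell calls it relies on (`getDBNames`, `getDB`, `getCollectionNames`, `getCollection`, `validate`, `setSlaveOk`) are the ones available in the 3.0-era mongo shell.

```javascript
// Hypothetical helper, not the reporter's script.
// `conn` is a mongo shell connection object, for example db.getMongo().
// `full` is passed straight to validate(); true requests the more thorough scan.
function validateAllCollections(conn, full) {
  var failures = [];
  conn.getDBNames().forEach(function (dbName) {
    var database = conn.getDB(dbName);
    database.getCollectionNames().forEach(function (collName) {
      var result = database.getCollection(collName).validate(full);
      // Collect only namespaces that validate() reports as invalid.
      if (!result.valid) {
        failures.push({
          ns: dbName + "." + collName,
          errors: result.errors || []
        });
      }
    });
  });
  return failures;
}

// `db` and `printjson` are predefined in the mongo shell; this part is skipped elsewhere.
if (typeof db !== "undefined" && typeof printjson !== "undefined") {
  db.getMongo().setSlaveOk(); // allow reads when connected to a SECONDARY
  printjson(validateAllCollections(db.getMongo(), false));
}
```

Loaded into a shell connected to the new member, this would print one entry per namespace whose `valid` field is false, which for the output above would include local.oplog.rs.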
      

      Configuration file for the server looks like this:

      # Where to log
      systemLog:
        destination: file
        path: /var/log/mongo/mongod-bulk.log
        logAppend: true
        logRotate: reopen
      
      #-------------------------------------------------------------------------------
      # Where to listen for connections
      net:
        # TCP port for connections
        port: 27018
      
        # Disable IPv6
        ipv6: false
      
        # Disable unix socket
        unixDomainSocket:
          enabled: false
      
      #-------------------------------------------------------------------------------
      # How to deal with the processes
      processManagement:
        # Fork and run in background
        fork: true
      
        # Location of pidfile
        pidFilePath: /var/run/mongodb/mongod-bulk.pid
      
      #-------------------------------------------------------------------------------
      # Where and how to store data
      storage:
        # Database path
        dbPath: /db/mongo/bulk
      
        # Enable journaling
        journal:
          enabled: true
      
        # Use WiredTiger storage engine by default
        engine: wiredTiger
      
        # Tuning for wired tiger
        wiredTiger:
          engineConfig:
            # Use up to 2G of RAM for caching
            cacheSizeGB: 2
      
      #-------------------------------------------------------------------------------
      # Replication Options
      replication:
        # Limit the size of oplog
        oplogSizeMB: 65536
      
        # Set replica name
        replSetName: bulk
      

        1. replica303.log.gz (2.47 MB, Oleksiy Kovyrin)

            Assignee: Ramon Fernandez Marina (ramon.fernandez@mongodb.com)
            Reporter: Oleksiy Kovyrin (kovyrin)
            Votes: 0
            Watchers: 10

              Created:
              Updated:
              Resolved: