Core Server / SERVER-23442

ReplicaSet Sync Failure on incorrect disk issue in WiredTiger


      3 Existing replicaset members in a single data center.
      1 New replicaset member in another data center

      rs.add({
          "_id" : 3,
          "host" : "NewServerName:27017",
          "arbiterOnly" : false,
          "buildIndexes" : true,
          "hidden" : true,
          "priority" : 0,
          "tags" : {},
          "slaveDelay" : NumberLong(0),
          "votes" : 0
      });
      

      Wait a few hours

      //
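While waiting, the state of the new member can be watched from the shell. The following is a minimal sketch, not part of the original report: `summarizeMembers` is a hypothetical helper, and the sample document only imitates the shape of `rs.status()` output (the `name`, `stateStr`, and `health` fields) with made-up values. A member performing initial sync normally reports `STARTUP2`.

```javascript
// Hypothetical helper (not a mongo shell built-in): turn an
// rs.status()-shaped document into one summary line per member.
function summarizeMembers(status) {
  return status.members.map(function (m) {
    return m.name + " " + m.stateStr + " health=" + m.health;
  });
}

// Illustrative input only; values are invented, not taken from the report.
var sample = {
  members: [
    { _id: 0, name: "ExistingServer1:27017", stateStr: "PRIMARY", health: 1 },
    { _id: 3, name: "NewServerName:27017", stateStr: "STARTUP2", health: 1 }
  ]
};
var lines = summarizeMembers(sample);
```

In the mongo shell, the same helper could be applied to live data with `summarizeMembers(rs.status()).forEach(function (l) { print(l); });`.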


      When adding a new replicaset member to the set and syncing across a relatively slow network connection (< 200 Mbps), we're seeing the initial sync fail with the WiredTiger error "No space left on device".

      However, there is substantial space left on EVERY disk on the system, including the one specifically mounted for Mongo.

      /dev/sda1 880G 35G 801G 5% /
      /dev/sdb2 187G 2.3G 175G 2% /mongodb
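The figures above cover block usage only. On Linux, "No space left on device" (ENOSPC) can also be returned when a filesystem has free blocks but no free inodes, and WiredTiger keeps each collection and index in its own file. The report does not say whether inode usage was checked, so the sketch below is only a possible diagnostic, not an established cause. `freeSummary` is a hypothetical helper, and the numbers passed to it are invented.

```javascript
// Hypothetical helper: compute free-space and free-inode percentages from a
// statfs-style object with the fields bsize, blocks, bavail, files, ffree.
function freeSummary(s) {
  return {
    freeBytes: s.bavail * s.bsize,
    freeBlocksPct: Math.round((s.bavail / s.blocks) * 100),
    // Some filesystems report 0 total inodes; avoid dividing by zero.
    freeInodesPct: s.files === 0 ? null : Math.round((s.ffree / s.files) * 100)
  };
}

// Made-up numbers: almost all blocks free, but no inodes left.
var example = freeSummary({
  bsize: 4096, blocks: 1000000, bavail: 980000, files: 65536, ffree: 0
});
```

On Node.js versions that provide `fs.statfsSync`, the input could come from `require('fs').statfsSync('/mongodb')`; from a terminal, `df -i /mongodb` reports the same inode counts.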

      The log shows a long run of index-copy messages (tens of thousands in the hours prior), followed by the error:

      2016-03-31T00:19:05.902-0500 I STORAGE  [rsSync] copying indexes for: { name: "IUS", options: {} }
      2016-03-31T00:19:05.907-0500 I STORAGE  [rsSync] copying indexes for: { name: "A2V", options: {} }
      2016-03-31T00:19:06.414-0500 E STORAGE  [rsSync] WiredTiger (28) [1459401546:414114][23631:0x7fdbb377f700], WT_SESSION.create: /mongodb/wt/SomeCustomer/index/47443--3141513892672868567.wt: No space left on device
      2016-03-31T00:19:06.426-0500 E REPL     [rsSync] 8 28: No space left on device
      2016-03-31T00:19:06.426-0500 E REPL     [rsSync] initial sync attempt failed, 9 attempts remaining
      2016-03-31T00:19:06.608-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:33991 #61 (4 connections now open)
      2016-03-31T00:19:06.614-0500 I NETWORK  [conn61] end connection 127.0.0.1:33991 (3 connections now open)
      2016-03-31T00:19:08.011-0500 W FTDC     [ftdc] Uncaught exception in 'FileNotOpen: Failed to open interim file /mongodb/wt/diagnostic.data/metrics.interim.temp' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.
      2016-03-31T00:19:11.426-0500 I REPL     [rsSync] initial sync pending
      2016-03-31T00:19:11.429-0500 I REPL     [ReplicationExecutor] syncing from: SomeServer3:27017
      2016-03-31T00:19:11.447-0500 I REPL     [rsSync] initial sync drop all databases
      2016-03-31T00:19:11.447-0500 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 73
      2016-03-31T00:19:37.769-0500 I REPL     [rsSync] initial sync clone all databases
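To pick the failing call and file out of a large log, the WiredTiger error line can be parsed mechanically. This is a rough sketch that assumes the line format shown in the log above; `parseWtError` is a hypothetical helper and the regular expression has only been matched against that one line. Code 28 is ENOSPC on Linux.

```javascript
// Hypothetical parser for a WiredTiger error line of the form
// "... WiredTiger (<errno>) [...][...], <call>: <path>: <message>".
function parseWtError(line) {
  var m = /WiredTiger \((\d+)\).*?, ([\w.]+): (\S+): (.+)$/.exec(line);
  if (!m) {
    return null; // not a WiredTiger error line in the assumed format
  }
  return { code: Number(m[1]), call: m[2], path: m[3], message: m[4] };
}

// The error line from the report, copied verbatim.
var wtLine = "2016-03-31T00:19:06.414-0500 E STORAGE  [rsSync] WiredTiger (28) " +
  "[1459401546:414114][23631:0x7fdbb377f700], WT_SESSION.create: " +
  "/mongodb/wt/SomeCustomer/index/47443--3141513892672868567.wt: " +
  "No space left on device";
var parsed = parseWtError(wtLine);
```

Here the failure occurs in `WT_SESSION.create`, that is, while creating a new index file rather than while writing to an existing one.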
      

            Assignee: Ramon Fernandez Marina (ramon.fernandez@mongodb.com)
            Reporter: Chad Kreimendahl (sallgeud)