[SERVER-23442] ReplicaSet Sync Failure on incorrect disk issue in WiredTiger Created: 31/Mar/16  Updated: 09/Jun/16  Resolved: 26/Apr/16

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.2.4
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Chad Kreimendahl Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: WTplaybook
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

3 Existing replicaset members in a single data center.
1 New replicaset member in another data center

rs.add({
	"_id" : 3,
	"host" : "NewServerName:27017",
	"arbiterOnly" : false,
	"buildIndexes" : true,
	"hidden" : true,
	"priority" : 0,
	"tags" : { },
	"slaveDelay" : NumberLong(0),
	"votes" : 0
});

Wait a few hours
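
While waiting, the new member's progress can be followed with a quick check like the sketch below (host name as above; this check is illustrative and not part of the original reproduction):

mongo --quiet --host NewServerName --port 27017 --eval '
    // The new member reports STARTUP2 while the initial sync runs and
    // SECONDARY once it completes.
    rs.status().members.forEach(function(m) {
        print(m.name + "  " + m.stateStr);
    });
'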


 Description   

When adding a new replicaset member to the set and syncing across a relatively slow network connection (< 200 Mbps), we're seeing the replication fail with the WiredTiger error "No space left on device".

However, there is substantial space left on EVERY disk on the system, including the one specifically mounted for Mongo.

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       880G   35G  801G    5% /
/dev/sdb2       187G  2.3G  175G    2% /mongodb

A long run of index creation messages (tens of thousands in the hours prior) is followed by:

2016-03-31T00:19:05.902-0500 I STORAGE  [rsSync] copying indexes for: { name: "IUS", options: {} }
2016-03-31T00:19:05.907-0500 I STORAGE  [rsSync] copying indexes for: { name: "A2V", options: {} }
2016-03-31T00:19:06.414-0500 E STORAGE  [rsSync] WiredTiger (28) [1459401546:414114][23631:0x7fdbb377f700], WT_SESSION.create: /mongodb/wt/SomeCustomer/index/47443--3141513892672868567.wt: No space left on device
2016-03-31T00:19:06.426-0500 E REPL     [rsSync] 8 28: No space left on device
2016-03-31T00:19:06.426-0500 E REPL     [rsSync] initial sync attempt failed, 9 attempts remaining
2016-03-31T00:19:06.608-0500 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:33991 #61 (4 connections now open)
2016-03-31T00:19:06.614-0500 I NETWORK  [conn61] end connection 127.0.0.1:33991 (3 connections now open)
2016-03-31T00:19:08.011-0500 W FTDC     [ftdc] Uncaught exception in 'FileNotOpen: Failed to open interim file /mongodb/wt/diagnostic.data/metrics.interim.temp' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.
2016-03-31T00:19:11.426-0500 I REPL     [rsSync] initial sync pending
2016-03-31T00:19:11.429-0500 I REPL     [ReplicationExecutor] syncing from: SomeServer3:27017
2016-03-31T00:19:11.447-0500 I REPL     [rsSync] initial sync drop all databases
2016-03-31T00:19:11.447-0500 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 73
2016-03-31T00:19:37.769-0500 I REPL     [rsSync] initial sync clone all databases



 Comments   
Comment by Ramon Fernandez Marina [ 26/Apr/16 ]

Thanks for the update, sallgeud; we're closing this ticket then.

Regards,
Ramón.

Comment by Chad Kreimendahl [ 26/Apr/16 ]

It appears the issue was one of inodes. The old system that was upgraded was built with the assumption of few large files (because of how mmapv1 worked)... and was not reformatted with more inodes when we upgraded. We've since updated it, including moving to xfs, and ensuring there would be more than enough inodes based on what we see in our brand-new systems.
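
For anyone hitting the same wall, a minimal sketch of the kind of check and reformat described here; the device name, mount point, and inode ratio are illustrative, not the exact commands used on this system:

# Check inode headroom on the data volume.
df -ih /mongodb

# XFS allocates inodes dynamically, so a plain mkfs is normally sufficient.
mkfs.xfs /dev/sdb2

# If staying on ext4, lower the bytes-per-inode ratio to get more inodes,
# e.g. one inode per 4 KiB (the minimum is the block size).
mkfs.ext4 -i 4096 /dev/sdb2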

Comment by Ramon Fernandez Marina [ 25/Apr/16 ]

sallgeud, if I'm not mistaken, the maximum number of directory entries in ext4 is 64000, so a system where the number of collections plus indexes approaches that value may behave in the manner you describe.

If this is the root cause of the behavior you're seeing, you could consider using XFS (which we already recommend in our production notes) or switching to directoryPerDB.

Can you please count the number of collections and indexes on this deployment to see if they could be the source of the problem you're seeing?
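
For reference, a rough shell one-liner for tallying those counts across all databases; the host and port are taken from the log above and are illustrative:

# Print collections and indexes per database, plus totals.
mongo --quiet --host SomeServer3 --port 27017 --eval '
    var totC = 0, totI = 0;
    db.adminCommand("listDatabases").databases.forEach(function(d) {
        var s = db.getSiblingDB(d.name).stats();
        print(d.name + ": " + s.collections + " collections, " + s.indexes + " indexes");
        totC += s.collections;
        totI += s.indexes;
    });
    print("TOTAL: " + totC + " collections, " + totI + " indexes");
'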

Thanks,
Ramón.

Comment by Chad Kreimendahl [ 31/Mar/16 ]

1. Probably not a ulimit issue, as we set these limits a bit overkill (for good reason):

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127698
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 524288
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127698
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

2. Potential inode limit? It might be worth adding a note to your "production notes" that when upgrading from significantly fewer, larger files (mmapv1) to many more, smaller files (WiredTiger), you may need more inodes.

Filesystem                  Inodes IUsed IFree IUse% Mounted on
/dev/sda1                      56M   85K   56M    1% /
/dev/sdb2                      47K   21K   27K   44% /mongodb

3. ext4 on the specific system with the errors; other systems use xfs.

I can run it again and check inodes in the moments before the failure, as there were probably a truckload created as indexes were being copied.
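
A quick way to do that is to poll inode usage while the resync runs; a minimal sketch (the interval and log path are arbitrary):

# Sample inode usage on the mongodb volume every 30 seconds so the
# count just before a failure is captured.
while true; do
    date
    df -ih /mongodb
    sleep 30
done | tee -a /tmp/mongodb-inode-watch.log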

Comment by Kelsey Schubert [ 31/Mar/16 ]

Hi sallgeud,

So we can continue to investigate this issue, please answer the following questions about the affected system:

  1. Can you provide the output of ulimit -a as the user that runs MongoDB?
  2. Can you post the output of df -ih?
  3. Which file system are you using?

Thank you,
Thomas

Comment by Chad Kreimendahl [ 31/Mar/16 ]

To save some time: this always happens at the same step, when cloning indexes. All databases had already been cloned successfully, and the failure would occur during the same indexing phase of one specific database.

I won't be able to share the database itself, for contractual / privacy reasons, but I can share metadata about it, such as size, collection counts, index counts, etc:

"collections" : 109,
"objects" : 583722,
"avgObjSize" : 6059.7657720627285,
"dataSize" : 3537218596,
"storageSize" : 1038409728,
"numExtents" : 0,
"indexes" : 1076,
"indexSize" : 133951488,
"ok" : 1

Of note: the above stats are on the higher end of average, but much smaller than some databases plus indexes that had already completed previously.
