[SERVER-3430] Exception while running a command-line repair Created: 15/Jul/11  Updated: 30/Mar/12  Resolved: 12/Sep/11

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 1.8.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrew Huling Assignee: Aaron Staple
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Amazon Linux


Attachments: Text File mongo01.staging_repair.log    
Issue Links:
Related
is related to SERVER-3759 filesystem ops may cause termination ... Closed
Operating System: Linux
Participants:

 Description   

I ran a command-line repair on one server in a 2-server + arbiter replica set to try to reclaim some disk space, and got the following error:

[initandlisten] Assertion: 10446:mmap: can't map area of size 0 file: /data/db/_tmp/esort.1310694712.469162920//file.9
0x55f5aa 0x5ba9f8 0x7a5eb5 0x7a994e 0x776b40 0x77874d 0x77a0ae 0x77b4cf 0x726458 0x727dcd 0x728e0d 0x78349a 0x8aab98 0x8abad4 0x8acecc 0x8ada88 0x8b36bf 0x7f2daff0ac1d 0x4e10c9
mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f5aa]
mongod(_ZN5mongo16MemoryMappedFile3mapEPKcRyi+0x438) [0x5ba9f8]
mongod(_ZN5mongo21BSONObjExternalSorter12FileIteratorC1ESs+0x45) [0x7a5eb5]
mongod(_ZN5mongo21BSONObjExternalSorter8IteratorC1EPS0_+0x13e) [0x7a994e]
mongod(_ZN5mongo14fastBuildIndexEPKcPNS_16NamespaceDetailsERNS_12IndexDetailsEi+0xc70) [0x776b40]
mongod() [0x77874d]
mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibRKNS_11BSONElementEb+0xf4e) [0x77a0ae]
mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEb+0x5f) [0x77b4cf]
mongod(_ZN5mongo6Cloner4copyEPKcS2_bbbbNS_5QueryE+0x3a8) [0x726458]
mongod(_ZN5mongo6Cloner2goEPKcRSsRKSsbbbb+0x10cd) [0x727dcd]
mongod(_ZN5mongo9cloneFromEPKcRSsRKSsbbbb+0x3d) [0x728e0d]
mongod(_ZN5mongo14repairDatabaseESsRSsbb+0x46a) [0x78349a]
mongod(_ZN5mongo11doDBUpgradeERKSsSsPNS_14DataFileHeaderE+0x68) [0x8aab98]
mongod() [0x8abad4]
mongod(_ZN5mongo14_initAndListenEiPKc+0x41c) [0x8acecc]
mongod(_ZN5mongo13initAndListenEiPKc+0x18) [0x8ada88]
mongod(main+0x5acf) [0x8b36bf]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x7f2daff0ac1d]
mongod(__gxx_personality_v0+0x3a1) [0x4e10c9]
Fri Jul 15 01:54:23 [initandlisten] exception in initAndListen std::exception: mmap: can't map area of size 0 file: /data/db/_tmp/esort.1310694712.469162920//file.9, terminating

The repair was successful on the other server.
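
For reference, a command-line repair of this kind is typically started by stopping mongod and re-running it with the repair flags; the paths below are placeholders, not values taken from this report:

    # Run against a stopped database; --repairpath (used later in this
    # thread) redirects the temporary repair files to another volume.
    mongod --dbpath /data/db --repair --repairpath /mnt/repair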



 Comments   
Comment by Aaron Staple [ 12/Sep/11 ]

I think it might be SERVER-3759.

Comment by Aaron Staple [ 05/Sep/11 ]

Hi Andrew - is it possible /data/db ran out of space during the repair? Mongod would have put some temporary files in there.
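
One way to check for that while a repair is running is to watch free space on the volumes holding the dbpath and the repairpath; the paths below are placeholders, and a repair generally needs free space on the order of the data set size:

    # Placeholder paths; re-run periodically (or via watch) during the repair.
    df -h /data/db /mnt/repair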

Comment by Andrew Huling [ 18/Jul/11 ]

Here it is. Thank you for your help!

Comment by Eliot Horowitz (Inactive) [ 18/Jul/11 ]

Can you attach the full log file?

Comment by Andrew Huling [ 18/Jul/11 ]

That is correct. It looks to me like it is looking for a file in that directory, not finding it, and then, as part of the process, trying to use that non-existent (size 0) file somewhere. But I really have no idea what I'm talking about, so I could be way off.
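
One way to test that theory, assuming the temporary sort files live under /data/db/_tmp as the error message suggests, would be to look for zero-length files there while the repair runs:

    # The path comes from the error message above; -size 0 matches
    # zero-length files like the one the assertion complains about.
    find /data/db/_tmp -type f -size 0 -ls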

Comment by Eliot Horowitz (Inactive) [ 18/Jul/11 ]

And nothing in /data/db/_tmp/?

Comment by Andrew Huling [ 18/Jul/11 ]

Here is the ls -la of those directories when it errs out:

[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_0]$ ls -la
total 18835472
drwxr-xr-x 2 root root 4096 Jul 14 23:48 .
drwxr-xr-x 7 root root 4096 Jul 18 13:29 ..
-rw------- 1 root root 67108864 Jul 14 23:54 queue.0
-rw------- 1 root root 134217728 Jul 14 23:38 queue.1
-rw------- 1 root root 2146435072 Jul 14 23:51 queue.10
-rw------- 1 root root 2146435072 Jul 14 23:54 queue.11
-rw------- 1 root root 2146435072 Jul 14 23:49 queue.12
-rw------- 1 root root 268435456 Jul 14 23:38 queue.2
-rw------- 1 root root 536870912 Jul 14 23:39 queue.3
-rw------- 1 root root 1073741824 Jul 14 23:40 queue.4
-rw------- 1 root root 2146435072 Jul 14 23:42 queue.5
-rw------- 1 root root 2146435072 Jul 14 23:43 queue.6
-rw------- 1 root root 2146435072 Jul 14 23:44 queue.7
-rw------- 1 root root 2146435072 Jul 14 23:46 queue.8
-rw------- 1 root root 2146435072 Jul 14 23:47 queue.9
-rw------- 1 root root 16777216 Jul 14 23:54 queue.ns
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_0]$ cd ../\$tmp_repairDatabase_1
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_1]$ ls -la
total 18835472
drwxr-xr-x 2 root root 4096 Jul 15 00:20 .
drwxr-xr-x 7 root root 4096 Jul 18 13:29 ..
-rw------- 1 root root 67108864 Jul 15 00:27 queue.0
-rw------- 1 root root 134217728 Jul 15 00:06 queue.1
-rw------- 1 root root 2146435072 Jul 15 00:24 queue.10
-rw------- 1 root root 2146435072 Jul 15 00:27 queue.11
-rw------- 1 root root 2146435072 Jul 15 00:21 queue.12
-rw------- 1 root root 268435456 Jul 15 00:06 queue.2
-rw------- 1 root root 536870912 Jul 15 00:07 queue.3
-rw------- 1 root root 1073741824 Jul 15 00:08 queue.4
-rw------- 1 root root 2146435072 Jul 15 00:11 queue.5
-rw------- 1 root root 2146435072 Jul 15 00:13 queue.6
-rw------- 1 root root 2146435072 Jul 15 00:15 queue.7
-rw------- 1 root root 2146435072 Jul 15 00:17 queue.8
-rw------- 1 root root 2146435072 Jul 15 00:19 queue.9
-rw------- 1 root root 16777216 Jul 15 00:27 queue.ns
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_1]$ cd ../\$tmp_repairDatabase_2
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_2]$ ls -la
total 18835472
drwxr-xr-x 2 root root 4096 Jul 15 00:58 .
drwxr-xr-x 7 root root 4096 Jul 18 13:29 ..
-rw------- 1 root root 67108864 Jul 15 01:04 queue.0
-rw------- 1 root root 134217728 Jul 15 00:49 queue.1
-rw------- 1 root root 2146435072 Jul 15 01:01 queue.10
-rw------- 1 root root 2146435072 Jul 15 01:04 queue.11
-rw------- 1 root root 2146435072 Jul 15 00:59 queue.12
-rw------- 1 root root 268435456 Jul 15 00:49 queue.2
-rw------- 1 root root 536870912 Jul 15 00:49 queue.3
-rw------- 1 root root 1073741824 Jul 15 00:50 queue.4
-rw------- 1 root root 2146435072 Jul 15 00:51 queue.5
-rw------- 1 root root 2146435072 Jul 15 00:53 queue.6
-rw------- 1 root root 2146435072 Jul 15 00:54 queue.7
-rw------- 1 root root 2146435072 Jul 15 00:56 queue.8
-rw------- 1 root root 2146435072 Jul 15 00:57 queue.9
-rw------- 1 root root 16777216 Jul 15 01:04 queue.ns
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_2]$ cd ../\$tmp_repairDatabase_3
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_3]$ ls -la
total 18835472
drwxr-xr-x 2 root root 4096 Jul 15 01:49 .
drwxr-xr-x 7 root root 4096 Jul 18 13:29 ..
-rw------- 1 root root 67108864 Jul 15 01:54 queue.0
-rw------- 1 root root 134217728 Jul 15 01:39 queue.1
-rw------- 1 root root 2146435072 Jul 15 01:51 queue.10
-rw------- 1 root root 2146435072 Jul 15 01:54 queue.11
-rw------- 1 root root 2146435072 Jul 15 01:49 queue.12
-rw------- 1 root root 268435456 Jul 15 01:39 queue.2
-rw------- 1 root root 536870912 Jul 15 01:39 queue.3
-rw------- 1 root root 1073741824 Jul 15 01:40 queue.4
-rw------- 1 root root 2146435072 Jul 15 01:42 queue.5
-rw------- 1 root root 2146435072 Jul 15 01:43 queue.6
-rw------- 1 root root 2146435072 Jul 15 01:44 queue.7
-rw------- 1 root root 2146435072 Jul 15 01:46 queue.8
-rw------- 1 root root 2146435072 Jul 15 01:47 queue.9
-rw------- 1 root root 16777216 Jul 15 01:54 queue.ns
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_3]$ cd ../\$tmp_repairDatabase_4
[ahuling@mongo01.i.staging.gamechanger.io $tmp_repairDatabase_4]$ ls -la
total 18835472
drwxr-xr-x 2 root root 4096 Jul 18 13:39 .
drwxr-xr-x 7 root root 4096 Jul 18 13:29 ..
-rw------- 1 root root 67108864 Jul 18 13:44 queue.0
-rw------- 1 root root 134217728 Jul 18 13:29 queue.1
-rw------- 1 root root 2146435072 Jul 18 13:41 queue.10
-rw------- 1 root root 2146435072 Jul 18 13:44 queue.11
-rw------- 1 root root 2146435072 Jul 18 13:39 queue.12
-rw------- 1 root root 268435456 Jul 18 13:29 queue.2
-rw------- 1 root root 536870912 Jul 18 13:29 queue.3
-rw------- 1 root root 1073741824 Jul 18 13:30 queue.4
-rw------- 1 root root 2146435072 Jul 18 13:32 queue.5
-rw------- 1 root root 2146435072 Jul 18 13:33 queue.6
-rw------- 1 root root 2146435072 Jul 18 13:35 queue.7
-rw------- 1 root root 2146435072 Jul 18 13:36 queue.8
-rw------- 1 root root 2146435072 Jul 18 13:37 queue.9
-rw------- 1 root root 16777216 Jul 18 13:44 queue.ns

Comment by Eliot Horowitz (Inactive) [ 18/Jul/11 ]

Ah yes.
Can you send the full contents of all those directories, with size information, when/if it errors out?

Comment by Andrew Huling [ 18/Jul/11 ]

I'm running it now and there appears to be nothing in that directory still. However, I realized that you might not be aware that I am running the repair with the repairpath option specified. In that directory, there are 5 sub-directories, $tmp_repairDatabase_0 through $tmp_repairDatabase_4. In those directories, there are a number of non-zero-sized files named for the database being backed up. Let me know if you'd like more detail about that directory.
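
A loop along these lines would capture the listing requested above in one pass; REPAIR_PATH is a placeholder for the --repairpath value used here:

    # The sub-directories are literally named $tmp_repairDatabase_N,
    # so the dollar sign in the glob must be escaped.
    for d in "$REPAIR_PATH"/\$tmp_repairDatabase_*; do
        echo "== $d =="
        ls -la "$d"
    done > repair_dirs.txt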

Comment by Eliot Horowitz (Inactive) [ 17/Jul/11 ]

Can you try the repair again and send an ls -la of the tmp directory?

Comment by Andrew Huling [ 17/Jul/11 ]

It is still failing.

Comment by Eliot Horowitz (Inactive) [ 17/Jul/11 ]

Is it still failing or did it eventually work?

Comment by Andrew Huling [ 15/Jul/11 ]

The directory no longer exists. I did run that command at least once after the repair process had failed (I ran the process multiple times to see if the problem continued to happen) when the _tmp directory was there, but it was empty.

Comment by Eliot Horowitz (Inactive) [ 15/Jul/11 ]

Can you do an ls -la on /data/db/_tmp/ if it still exists?

Comment by Andrew Huling [ 15/Jul/11 ]

I don't think there was ever an unclean shutdown and I am running with journaling.

Comment by Eliot Horowitz (Inactive) [ 15/Jul/11 ]

Was there ever an unclean shutdown?
Are you running with journalling?
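
For reference, one quick way to confirm journaling is enabled is to look for a populated journal directory under the dbpath (the path below is a placeholder):

    # Journal files appear in this directory when journaling is on.
    ls -la /data/db/journal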
