[SERVER-15912] Unable to restore dumped collection
Created: 02/Nov/14  Updated: 12/Nov/14  Resolved: 12/Nov/14

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 2.6.5
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Pavlo Grinchenko
Assignee: Ramon Fernandez Marina
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

1) Dump the collection with mongodump
2) Try to restore it with mongorestore; it fails with the stack trace below


 Description   

We have the following problem:

  • The MongoDB cluster consists of 6 shards deployed on AWS EC2
  • One of the collections became very slow for aggregation queries. We added appropriate indexes, but performance did not improve
  • We wanted to dump it and restore it in a local environment to test our approach
  • The dump worked, but the restore fails, so we are stuck

The error stack trace looks like this:

2014-11-01T19:14:42.037-0700 Progress: 2821305158/3003237478 93% (bytes)
2014-11-01T19:14:45.002-0700 Progress: 2850985158/3003237478 94% (bytes)
2014-11-01T19:14:48.000-0700 Progress: 2881247558/3003237478 95% (bytes)
2014-11-01T19:14:51.006-0700 Progress: 2910681158/3003237478 96% (bytes)
2014-11-01T19:14:54.032-0700 Progress: 2939801158/3003237478 97% (bytes)
2014-11-01T19:14:57.058-0700 Progress: 2967957958/3003237478 98% (bytes)
2014-11-01T19:15:00.002-0700 Progress: 2996383558/3003237478 99% (bytes)
2014-11-01T19:15:00.747-0700 Assertion failure amt == (size_t)( size - 4 ) src/mongo/tools/tool.cpp 330
2014-11-01T19:15:00.765-0700 0x1006b822b 0x100670e12 0x100661a92 0x100657d10 0x10000a187 0x10000ccf7 0x10000c7a1 0x10000c7a1 0x10000c7a1 0x10000f7fc 0x100657621 0x1006596cd 0x1006599d2 0x100000df4
0 mongorestore 0x00000001006b822b _ZN5mongo15printStackTraceERSo + 43
1 mongorestore 0x0000000100670e12 _ZN5mongo10logContextEPKc + 114
2 mongorestore 0x0000000100661a92 _ZN5mongo12verifyFailedEPKcS1_j + 274
3 mongorestore 0x0000000100657d10 _ZN5mongo8BSONTool11processFileERKN5boost11filesystem34pathE + 1450
4 mongorestore 0x000000010000a187 _ZN7Restore22processFileAndMetadataERKN5boost11filesystem34pathERKSs + 4191
5 mongorestore 0x000000010000ccf7 _ZN7Restore9drillDownEN5boost11filesystem34pathEbbbb + 3109
6 mongorestore 0x000000010000c7a1 _ZN7Restore9drillDownEN5boost11filesystem34pathEbbbb + 1743
7 mongorestore 0x000000010000c7a1 _ZN7Restore9drillDownEN5boost11filesystem34pathEbbbb + 1743
8 mongorestore 0x000000010000c7a1 _ZN7Restore9drillDownEN5boost11filesystem34pathEbbbb + 1743
9 mongorestore 0x000000010000f7fc _ZN7Restore5doRunEv + 10140
10 mongorestore 0x0000000100657621 _ZN5mongo8BSONTool3runEv + 165
11 mongorestore 0x00000001006596cd _ZN5mongo4Tool4mainEiPPcS2_ + 1437
12 mongorestore 0x00000001006599d2 main + 66
13 mongorestore 0x0000000100000df4 start + 52
assertion: 0 assertion src/mongo/tools/tool.cpp:330

We can provide the actual dump file upon request; it is 865 MB.
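The assertion text `amt == (size_t)( size - 4 )` reads as a length-prefix check: every BSON document begins with a little-endian int32 giving its total size (including those 4 bytes), and the tool appears to verify that the remaining `size - 4` bytes could actually be read. A minimal Python sketch, assuming the .bson file is a plain concatenation of BSON documents as mongodump writes it, can locate the point where such a file is cut short. It mirrors the check implied by the assertion and is not mongorestore's own code; `scan_bson_stream` and `scan_bson_dump` are hypothetical names.

```python
import struct
import sys


def scan_bson_stream(f):
    """Walk a binary stream of concatenated BSON documents.

    Returns (complete_documents, error_message_or_None). Each document is
    assumed to start with a little-endian int32 holding its total size,
    including the 4 length bytes, and to end with a 0x00 byte.
    """
    count = 0
    while True:
        offset = f.tell()
        header = f.read(4)
        if not header:
            return count, None  # clean end of file
        if len(header) < 4:
            return count, f"truncated length prefix at offset {offset}"
        (size,) = struct.unpack("<i", header)
        if size < 5:  # smallest valid document: 4 length bytes + 0x00
            return count, f"invalid document size {size} at offset {offset}"
        body = f.read(size - 4)
        # The condition the assertion text describes: all size - 4 bytes present.
        if len(body) != size - 4:
            return count, (f"document at offset {offset} declares {size} bytes "
                           f"but only {len(body) + 4} are present")
        if body[-1:] != b"\x00":
            return count, f"document at offset {offset} lacks its trailing 0x00"
        count += 1


def scan_bson_dump(path):
    """Open a .bson dump file and scan it with scan_bson_stream."""
    with open(path, "rb") as f:
        return scan_bson_stream(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    n, err = scan_bson_dump(sys.argv[1])
    print(f"{n} complete documents")
    if err:
        print("problem:", err)
        sys.exit(1)
```

Run against a partially extracted collection file, a scan like this should report the offset of the incomplete document instead of aborting with a stack trace, which makes a truncated file easy to tell apart from a tool bug.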



 Comments   
Comment by Ramon Fernandez Marina [ 12/Nov/14 ]

Hi paulgpa, glad to hear you found the root cause of the issue. In its defense, mongorestore does flag the corruption and abort, although I would agree it doesn't do this in a very friendly manner.

Thanks for keeping us posted, closing this ticket now.

Comment by Pavlo Grinchenko [ 12/Nov/14 ]

OK - I think we found the source of the issue.

On a Mac, TGZ files are unpacked by default with the Unarchiver tool. On the dump archive it failed and extracted the file only partially. Unfortunately, mongorestore does not report this kind of corruption clearly. We unpacked the archive from the command line instead, and the restore then succeeded.

Sorry for the confusion; the issue is on our side, not yours. Please close this one.
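Because the root cause was a partial extraction, one way to catch it before running mongorestore is to compare each extracted file with the size recorded in the archive's tar headers. A minimal Python sketch using the standard `tarfile` module, assuming the dump was packed as a gzip-compressed tar and extracted under a known directory; `find_incomplete_extractions` is a hypothetical name.

```python
import os
import tarfile


def find_incomplete_extractions(archive_path, dest_dir):
    """Compare files extracted under dest_dir with a .tgz archive's headers.

    Returns a list of (name, expected_size, actual_size_or_None) for every
    regular file whose on-disk size differs from the size recorded in the
    archive; None means the file is missing entirely.
    """
    problems = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            on_disk = os.path.join(dest_dir, member.name)
            actual = os.path.getsize(on_disk) if os.path.isfile(on_disk) else None
            if actual != member.size:
                problems.append((member.name, member.size, actual))
    return problems
```

An empty result means every extracted file has the length the archive declares; a non-empty one names the file that a GUI unarchiver left short.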

Comment by Pavlo Grinchenko [ 12/Nov/14 ]

I tried to restore this data dump using the 2.6.5 tools and got the same issue:

2014-11-11T23:05:24.489-0800 Assertion failure amt == (size_t)( size - 4 ) src/mongo/tools/tool.cpp 330
2014-11-11T23:05:24.509-0800 0x1006ba3cb 0x100672fb2 0x100663c32 0x100659eb0 0x10000a5e7 0x10000d157 0x10000cc01 0x10000cc01 0x10000fc5c 0x1006597c1 0x10065b86d 0x10065bb72 0x100001254
0 mongorestore 0x00000001006ba3cb _ZN5mongo15printStackTraceERSo + 43
1 mongorestore 0x0000000100672fb2 _ZN5mongo10logContextEPKc + 114
2 mongorestore 0x0000000100663c32 _ZN5mongo12verifyFailedEPKcS1_j + 274
3 mongorestore 0x0000000100659eb0 _ZN5mongo8BSONTool11processFileERKN5boost11filesystem34pathE + 1450
4 mongorestore 0x000000010000a5e7 _ZN7Restore22processFileAndMetadataERKN5boost11filesystem34pathERKSs + 4191
5 mongorestore 0x000000010000d157 _ZN7Restore9drillDownEN5boost11filesystem34pathEbbbb + 3109
6 mongorestore 0x000000010000cc01 _ZN7Restore9drillDownEN5boost11filesystem34pathEbbbb + 1743
7 mongorestore 0x000000010000cc01 _ZN7Restore9drillDownEN5boost11filesystem34pathEbbbb + 1743
8 mongorestore 0x000000010000fc5c _ZN7Restore5doRunEv + 10140
9 mongorestore 0x00000001006597c1 _ZN5mongo8BSONTool3runEv + 165
10 mongorestore 0x000000010065b86d _ZN5mongo4Tool4mainEiPPcS2_ + 1437
11 mongorestore 0x000000010065bb72 main + 66
12 mongorestore 0x0000000100001254 start + 52
assertion: 0 assertion src/mongo/tools/tool.cpp:330

Comment by Ramon Fernandez Marina [ 06/Nov/14 ]

Thanks paulgpa, we'll let you know what we find.

Comment by Pavlo Grinchenko [ 06/Nov/14 ]

Ramon, we have uploaded the data dump.

Comment by Pavlo Grinchenko [ 04/Nov/14 ]

Thanks for the follow-up Ramon.

1) I will upload the dump file as soon as I can. I will work with my operations team to do this. Unfortunately, I am on the road this week and doubt that the hotel's connection will allow me to do this efficiently.

2) Our configuration: 6 shards, each a replica set with 3 nodes. A fairly standard configuration.

Comment by Ramon Fernandez Marina [ 04/Nov/14 ]

Thanks for the additional information paulgpa. Can you please upload your dump file so we can examine it? Here's how:

scp -P 722 -r <dumpfile> SERVER-15912@www.mongodb.com:

where <dumpfile> is the dump file produced by mongodump. When prompted for a password just hit enter.

Also, you say you have six shards; are these stand-alone servers or replica sets? If the latter, are you taking the dump from the primary? Can you please include the command line(s) you used to produce the dump file?

Thanks for offering to try the new tools, that should provide useful information on our effort to troubleshoot this ticket.

Comment by Pavlo Grinchenko [ 04/Nov/14 ]

Ramon

Thanks a lot for your suggestion. This is a completely repeatable problem: we reproduced it twice, with different operations team members. We will try the 2.7.7/2.7.8 tools as you propose. Do you think they will work properly with a sharded environment running 2.6.5?

Comment by Ramon Fernandez Marina [ 03/Nov/14 ]

paulgpa, note that the MongoDB tools have been entirely rewritten for the 2.7.7 development release. If a new dump-and-restore cycle shows the same problem, I'd recommend trying the new tools, which can be downloaded as part of the 2.7.7/2.7.8 development releases or cloned from GitHub.

If the new tools also show the issue please let us know, and I'll send you instructions to upload the dump file.

Comment by Ramon Fernandez Marina [ 03/Nov/14 ]

paulgpa, is this a repeatable problem? Can you re-dump the collection? The error you're seeing could indicate data corruption in the dump file, so I'd recommend re-dumping the collection to a new dump file while checking the OS logs for possible disk-related errors. If no disk errors appear and the dump succeeds but the restore fails again, there may be a bug in mongodump/mongorestore that would need investigation.
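To tell whether a dump file was damaged somewhere between mongodump and mongorestore (disk errors, copying, or unpacking), the dump directory could be fingerprinted on the source host right after mongodump and again on the restore host; any file whose size or digest differs was altered in between. A minimal Python sketch, assuming the dump is an ordinary directory tree; `sha256_of_file` and `manifest` are hypothetical helper names.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file under root (by relative path) to (size, sha256)."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            entries[rel] = (os.path.getsize(full), sha256_of_file(full))
    return entries
```

Saving the manifest on each machine (for example as JSON) and diffing the two points directly at the damaged file, before any time is spent investigating the restore tool itself.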

Comment by Pavlo Grinchenko [ 03/Nov/14 ]

> mongorestore --version
version 2.6.4

I will upgrade to the latest version and try again.

Generated at Thu Feb 08 03:39:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.