[SERVER-14383] Mongodump Error : assertion: 16465 recv failed while exhausting cursor Created: 28/Jun/14  Updated: 16/Nov/21  Resolved: 03/Jul/14

Status: Closed
Project: Core Server
Component/s: Internal Code, Replication, Sharding
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sridhara Huchappa Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongod.log     Text File mongos.log    
Operating System: ALL
Steps To Reproduce:

trying to run mongodump on a 800 GB collection

mongodump -u <UID> -p <PWD> --authenticationDatabase admin -o /data/bak/

Participants:

 Description   

Sat Jun 28 05:26:36.777 Collection File Writing Progress: 7694300/12192615 63% (objects)
Sat Jun 28 05:26:39.253 Collection File Writing Progress: 7694400/12192615 63% (objects)
Sat Jun 28 05:26:43.900 Collection File Writing Progress: 7694700/12192615 63% (objects)
assertion: 16465 recv failed while exhausting cursor



 Comments   
Comment by Thomas Rueckstiess [ 03/Jul/14 ]

Hi Sridhara,

The log file you attached doesn't cover the the time of the initial failure of mongodump (Sat Jun 28 05:26:43.900), so I can't say for sure what the problem was. It only covers:

     source: mongod.log
      start: 2014 Jun 28 12:12:38.661
        end: 2014 Jun 30 16:50:25.713

I noticed that the server shuts down uncleanly at Sat Jun 28 22:27:45.404, see below. On restart, the node had to recover from the journal to get into consistent state again. Is that the point when the mongodump failed? I can't tell why the server was shut down, but that would explain the "recv" error you were seeing.

Sat Jun 28 22:27:38.134 [conn23] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:16641 reslen:3166 110ms
Sat Jun 28 22:27:39.188 [conn817] getmore erepprod.OENSCASEDICISION cursorid:91958050819401614 ntoreturn:0 exhaust:1 keyUpdates:0 numYields: 19 locks(micros) r:2923933 nreturned:75 reslen:4224176 1993ms
Sat Jun 28 22:27:41.498 [conn817] getmore erepprod.OENSCASEDICISION cursorid:91958050819401614 ntoreturn:0 exhaust:1 keyUpdates:0 numYields: 34 locks(micros) r:2456112 nreturned:94 reslen:4402958 1980ms
Sat Jun 28 22:27:43.709 [conn817] getmore erepprod.OENSCASEDICISION cursorid:91958050819401614 ntoreturn:0 exhaust:1 keyUpdates:0 numYields: 17 locks(micros) r:3044326 nreturned:65 reslen:4386812 2144ms
Sat Jun 28 22:27:45.404 [conn817] getmore erepprod.OENSCASEDICISION cursorid:91958050819401614 ntoreturn:0 exhaust:1 keyUpdates:0 numYields: 11 locks(micros) r:2657869 nreturned:47 reslen:4214664 1633ms
 
 
***** SERVER RESTARTED *****
 
 
Sat Jun 28 22:41:13.790 [initandlisten] MongoDB starting : pid=28998 port=27017 dbpath=/data/db/ 64-bit host=wslermdb01sp
Sat Jun 28 22:41:13.791 [initandlisten] db version v2.4.4
Sat Jun 28 22:41:13.791 [initandlisten] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2
Sat Jun 28 22:41:13.791 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Sat Jun 28 22:41:13.792 [initandlisten] allocator: tcmalloc
Sat Jun 28 22:41:13.792 [initandlisten] options: { config: "/data/config/mongod.conf", dbpath: "/data/db/", fork: "true", logappend: "true", logpath: "/data/dblog/mongod.log", pidfilepath: "/var/run/mongodb/mongod.pid", port: 27017, replSet: "erep" }
Sat Jun 28 22:41:13.863 [initandlisten] journal dir=/data/db/journal
Sat Jun 28 22:41:13.864 [initandlisten] recover begin
Sat Jun 28 22:41:13.866 [initandlisten] recover lsn: 56916712
Sat Jun 28 22:41:13.866 [initandlisten] recover /data/db/journal/j._0
Sat Jun 28 22:41:13.910 [initandlisten] recover skipping application of section seq:0 < lsn:56916712
Sat Jun 28 22:41:13.911 [initandlisten] recover skipping application of section seq:1591504 < lsn:56916712
Sat Jun 28 22:41:13.912 [initandlisten] recover skipping application of section seq:1650520 < lsn:56916712
Sat Jun 28 22:41:13.921 [initandlisten] recover skipping application of section seq:2768648 < lsn:56916712
Sat Jun 28 22:41:13.959 [initandlisten] recover skipping application of section seq:2827573 < lsn:56916712

Now that the Secondary is broken and we are trying to restore secondary using this backup.

It sounds like you just want to get your broken secondary up and running again with a copy of the primary. You cannot use mongodump for this, as it won't dump the local database. You can just start the secondary without the data and let it do an initial resync. To speed up the process, you can also copy the entire dbpath content over from the primary to the secondary to bootstrap the resync. For that to work, you either need to use a filesystem snapshot (LVM) or stop writes to the primary during the copy, as you would otherwise get an inconsistent copy.

See http://docs.mongodb.org/manual/tutorial/resync-replica-set-member/#sync-by-copying-data-files-from-another-member

As I don't see any evidence for a bug here, and the SERVER project is for reporting bugs and feature requests, I'm closing this ticket as "Works as Designed". If you need further support, you can post your question on the mongodb-users group (http://groups.google.com/group/mongodb-user).

Regards,
Thomas

Comment by Sridhara Huchappa [ 01/Jul/14 ]

Hi Thomas,

Thanks for the quick response. Sorry for replying late. Attached, please see the mongo logs. The dump size is around 800GB.
The problem is indeed reoccurring. We never had any issues doing a mongodump backups on secondary before. Now that the Secondary is broken and we are trying to restore secondary using this backup.

Regards,
Sridhara

Comment by Thomas Rueckstiess [ 30/Jun/14 ]

Hi Sridhara,

Can you please provide the full log file of the mongod you tried to dump the data from? Ideally from the time the mongodump started.

Also, is this a reoccurring problem or was it just that once?

Thanks,
Thomas

Generated at Thu Feb 08 03:34:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.