mongodump times out intermittently in a sharded environment (EC2)

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 2.4.5
    • Component/s: Networking, Sharding
    • Environment:
      Ubuntu, EC2

      Nothing special. Just simple mongodump.

      We consistently have problems backing up our Reporting database.

      • OS: Ubuntu, EC2
      • Simple sharded environment with just 1 shard
      • Multiple databases; mongodump fails more than 50% of the time on the largest one, which is around 5 GB (still very small)

      Output looks like this:

      Dumping whole database...
      Tue Jul 30 12:40:01.447 creating new connection to:localhost:27017
      Tue Jul 30 12:40:01.448 BackgroundJob starting: ConnectBG
      Tue Jul 30 12:40:01.448 connected connection!
      connected to: localhost
      Tue Jul 30 12:40:01.448 DATABASE: reportsRaw to /mnt/backup/ReportsRawDB/dump/reportsRaw
      Tue Jul 30 12:40:01.453 reportsRaw.system.indexes to /mnt/backup/ReportsRawDB/dump/reportsRaw/system.indexes.bson
      Tue Jul 30 12:40:01.457 4 objects
      Tue Jul 30 12:40:01.457 skipping collection: reportsRaw.video.plays.$id
      Tue Jul 30 12:40:01.457 skipping collection: reportsRaw.ads.errors.$id
      Tue Jul 30 12:40:01.457 skipping collection: reportsRaw.ads.starts.$id
      Tue Jul 30 12:40:01.457 skipping collection: reportsRaw.bid.errors.raw.$id
      Tue Jul 30 12:40:01.457 reportsRaw.video.plays to /mnt/backup/ReportsRawDB/dump/reportsRaw/video.plays.bson
      Tue Jul 30 12:40:04.054 Collection File Writing Progress: 850900/10958683 7% (objects)
      Tue Jul 30 12:40:07.039 Collection File Writing Progress: 1955900/10958683 17% (objects)
      Tue Jul 30 12:40:10.006 Collection File Writing Progress: 3060700/10958683 27% (objects)
      Tue Jul 30 12:40:13.007 Collection File Writing Progress: 4228100/10958683 38% (objects)
      Tue Jul 30 12:40:16.035 Collection File Writing Progress: 5376800/10958683 49% (objects)
      Tue Jul 30 12:40:19.042 Collection File Writing Progress: 6429700/10958683 58% (objects)
      Tue Jul 30 12:40:22.042 Collection File Writing Progress: 7490800/10958683 68% (objects)
      Tue Jul 30 12:40:25.001 Collection File Writing Progress: 8534300/10958683 77% (objects)
      Tue Jul 30 12:40:28.053 Collection File Writing Progress: 9533700/10958683 86% (objects)
      Tue Jul 30 12:40:31.006 Collection File Writing Progress: 10577200/10958683 96% (objects)
      Tue Jul 30 12:40:32.073 10959317 objects
      Tue Jul 30 12:40:32.073 Metadata for reportsRaw.video.plays to /mnt/backup/ReportsRawDB/dump/reportsRaw/video.plays.metadata.json
      Tue Jul 30 12:40:32.073 reportsRaw.ads.errors to /mnt/backup/ReportsRawDB/dump/reportsRaw/ads.errors.bson
      Tue Jul 30 12:40:32.993 200000 objects
      Tue Jul 30 12:40:32.993 Metadata for reportsRaw.ads.errors to /mnt/backup/ReportsRawDB/dump/reportsRaw/ads.errors.metadata.json
      Tue Jul 30 12:40:32.994 reportsRaw.ads.starts to /mnt/backup/ReportsRawDB/dump/reportsRaw/ads.starts.bson
      Tue Jul 30 12:40:35.008 Collection File Writing Progress: 475100/12558979 3% (objects)
      Tue Jul 30 12:40:38.040 Collection File Writing Progress: 1275300/12558979 10% (objects)
      Tue Jul 30 12:40:41.001 Collection File Writing Progress: 2108000/12558979 16% (objects)
      Tue Jul 30 12:40:44.008 Collection File Writing Progress: 2931900/12558979 23% (objects)
      Tue Jul 30 12:40:47.036 Collection File Writing Progress: 3749000/12558979 29% (objects)
      Tue Jul 30 13:00:48.073 Socket recv() errno:104 Connection reset by peer 127.0.0.1:27017
      Tue Jul 30 13:00:48.074 SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [1] server [127.0.0.1:27017]
      Tue Jul 30 13:00:48.074 User Assertion: 10278:dbclient error communicating with server: localhost:27017
      assertion: 10278 dbclient error communicating with server: localhost:27017
      Can't create dump of reportsRaw
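
In the output above, the last progress line is stamped 12:40:47.036 and the socket error 13:00:48.073, so the dump sat silent for about 20 minutes before the connection was reset. A minimal sketch for spotting such pauses in a saved mongodump log follows. The function name `find_stalls`, the 60-second threshold, and the default year are illustrative assumptions (the log lines carry no year); none of this comes from the attached files.

```python
from datetime import datetime


def find_stalls(lines, threshold_seconds=60.0, year=2013):
    """Return (previous_line, next_line, gap_seconds) for every pause
    between consecutive timestamped lines longer than the threshold.

    Assumes lines start with a stamp such as "Tue Jul 30 12:40:47.036".
    The year is not logged, so `year` is an assumption supplied by the caller.
    """
    stalls = []
    prev_time = None
    prev_line = None
    for line in lines:
        text = line.strip()
        tokens = text.split()[1:4]  # month, day, time; the weekday is skipped
        try:
            stamp = datetime.strptime(
                f"{year} " + " ".join(tokens), "%Y %b %d %H:%M:%S.%f"
            )
        except ValueError:
            continue  # lines without a timestamp, e.g. "connected to: localhost"
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap > threshold_seconds:
                stalls.append((prev_line, text, gap))
        prev_time, prev_line = stamp, text
    return stalls
```

Fed the log above, this would report a single pause, between the 29% progress line and the `Socket recv()` error.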

      We had to move from a 24-hour backup schedule to a 6-hour one, which gives mongodump 4 attempts per day to succeed.
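
The same "several attempts" idea can also be expressed as a retry wrapper around a single scheduled run. This is only a sketch of the workaround, not a fix for the underlying disconnect. It assumes mongodump exits with a non-zero status when the dump fails. The exact command line is not given in this report, so `EXAMPLE_COMMAND` is a guess reconstructed from the host, database and path shown in the log. `dump_with_retries` and its parameters are hypothetical names.

```python
import subprocess
import time

# Assumed invocation: host, database and output directory are taken from the
# log in this report; the real command used for the backups is not shown.
EXAMPLE_COMMAND = [
    "mongodump",
    "--host", "localhost:27017",
    "--db", "reportsRaw",
    "--out", "/mnt/backup/ReportsRawDB/dump",
]


def dump_with_retries(command, attempts=4, delay_seconds=0.0, run=subprocess.call):
    """Run `command` until it exits with status 0 or the attempts run out.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt failed. `run` is injectable so the loop can be exercised
    without a MongoDB installation.
    """
    for attempt in range(1, attempts + 1):
        if run(command) == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # pause before the next attempt
    return None
```

A scheduled job could call `dump_with_retries(EXAMPLE_COMMAND, attempts=4, delay_seconds=300)` and alert when the result is `None`. Each retry starts the dump from scratch, so a partial dump left in the output directory by a failed attempt should not be treated as a backup.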

      We have upgraded everything to version 2.4.5 (the latest).

        1. db.currentOps.log.gz
          5 kB
        2. mongodump_verbose.log.gz
          2 kB
        3. mongosniff_console_output.log.gz
          5 kB
        4. mongosniff.log.gz
          11 kB
        5. SERVER-10377.tar.gz
          42 kB
        6. SH01.log.gz
          201 kB

            Assignee:
            David Hows (Inactive)
            Reporter:
            Pavlo Grinchenko
            Votes:
            1
            Watchers:
            6
