Core Server / SERVER-5673

Crash during repairDatabase can leave the server unable to start up

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 2.0.4
    • Component/s: Admin, MMAPv1, Stability
    • Environment: uname -a
      Linux 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux
    • Assigned Teams: Storage Execution
    • Operating System: Linux

    Description

      After running db.repairDatabase(), one shard restarted fine, but the other will not start.

      Logs from the time db.repairDatabase() was running:
      ----- mongo log ----

      Fri Apr 20 15:32:03 [conn26] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 reslen:1337 477ms
      13537900/28932981 46%
      14795700/28932981 51%
      16063300/28932981 55%
      Fri Apr 20 15:32:33 [conn4] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 reslen:1337 177ms

      ---------------------
      ----- syslog --------
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810663] lowmem_reserve[]: 0 0 0 0
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810667] Node 0 DMA: 2*4kB 1*8kB 1*16kB 1*32kB 2*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7872kB
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810680] Node 0 DMA32: 1006*4kB 0*8kB 0*16kB 2*32kB 2*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 8056kB
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810691] 2510 total pagecache pages
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810693] 674 pages in swap cache
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810695] Swap cache stats: add 127526, delete 126852, find 43057/43736
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810697] Free swap = 0kB
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.810699] Total swap = 471032kB
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 1048576 pages RAM
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 20309 pages reserved
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 3710 pages shared
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] 1022293 pages non-shared
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Out of memory: kill process 26171 (mongod) score 158123 or a child
      Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Killed process 26171 (mongod)
      ---------------------
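
      The syslog excerpt above shows the kernel OOM killer terminating mongod while swap was exhausted. As a rough sketch (not part of the original report), lines in this format could be scanned as follows. The function names `find_oom_kills` and `free_swap_kb` are hypothetical, and the patterns assume only the kernel message wording seen in this excerpt.

```python
import re

# Patterns are based only on the kernel messages quoted in this ticket;
# other kernel versions may word these lines differently.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")
FREE_SWAP_RE = re.compile(r"Free swap\s*=\s*(\d+)kB")


def find_oom_kills(lines):
    """Return (pid, process_name) for every 'Killed process' line."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def free_swap_kb(lines):
    """Return the last reported free swap in kB, or None if never reported."""
    value = None
    for line in lines:
        match = FREE_SWAP_RE.search(line)
        if match:
            value = int(match.group(1))
    return value


sample = [
    "Apr 20 15:32:42 saas-di0017 kernel: [3050891.810697] Free swap = 0kB",
    "Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Out of memory: "
    "kill process 26171 (mongod) score 158123 or a child",
    "Apr 20 15:32:42 saas-di0017 kernel: [3050891.814451] Killed process 26171 (mongod)",
]

print(find_oom_kills(sample))  # [(26171, 'mongod')]
print(free_swap_kb(sample))    # 0
```

      Here the killed pid 26171 is mongod and free swap is 0 kB, which is consistent with the repair being interrupted by memory exhaustion rather than by a clean shutdown.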

      Logs after trying to run "mongod restart":
      ----- mongo log ----

      ***** SERVER RESTARTED *****

      Fri Apr 20 19:28:42 [initandlisten] MongoDB starting : pid=10584 port=27018 dbpath=/var/lib/mongodb 64-bit host=saas-di0017
      Fri Apr 20 19:28:42 [initandlisten] db version v2.0.4, pdfile version 4.5
      Fri Apr 20 19:28:42 [initandlisten] git version: nogitversion
      Fri Apr 20 19:28:42 [initandlisten] build info: Linux hm4317 2.6.32-5-amd64 #1 SMP Mon Oct 3 03:59:20 UTC 2011 x86_64 BOOST_LIB_VERSION=1_42
      Fri Apr 20 19:28:42 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", repair: true, rest: "true", shardsvr: "true" }

      Fri Apr 20 19:28:42 [initandlisten] journal dir=/var/lib/mongodb/journal
      Fri Apr 20 19:28:42 [initandlisten] recover begin
      Fri Apr 20 19:28:42 [initandlisten] recover lsn: 329715219
      Fri Apr 20 19:28:42 [initandlisten] recover /var/lib/mongodb/journal/j._13
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:328530229 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:328589489 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:328648739 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329004279 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329063519 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329122759 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] recover /var/lib/mongodb/journal/j._14
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329596739 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] recover skipping application of section seq:329655969 < lsn:329715219
      Fri Apr 20 19:28:42 [initandlisten] exception during recovery
      Fri Apr 20 19:28:42 [initandlisten] exception in initAndListen std::exception: boost::filesystem::file_size: No such file or directory: "/var/lib/mongodb/$tmp_repairDatabase_0/emailmarketing_development.11", terminating
      Fri Apr 20 19:28:42 dbexit:
      Fri Apr 20 19:28:42 [initandlisten] shutdown: going to close listening sockets...
      Fri Apr 20 19:28:42 [initandlisten] shutdown: going to flush diaglog...
      Fri Apr 20 19:28:42 [initandlisten] shutdown: going to close sockets...
      Fri Apr 20 19:28:42 [initandlisten] shutdown: waiting for fs preallocator...
      Fri Apr 20 19:28:42 [initandlisten] shutdown: lock for final commit...
      Fri Apr 20 19:28:42 [initandlisten] shutdown: final commit...
      Fri Apr 20 19:28:42 [initandlisten] shutdown: closing all files...
      Fri Apr 20 19:28:42 [initandlisten] closeAllFiles() finished
      Fri Apr 20 19:28:42 [initandlisten] shutdown: removing fs lock...
      Fri Apr 20 19:28:42 dbexit: really exiting now
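
      The exception above names a file under a `$tmp_repairDatabase_0` directory inside the dbpath. As a hedged sketch (not from the original report), the following lists such directories so they can be inspected before another start attempt. The helper name `leftover_repair_dirs` is hypothetical, and the directory-name pattern is inferred solely from the single path in the log, not from MongoDB documentation. The script only lists; it does not delete anything.

```python
import os
import re
import tempfile

# Assumption: temp repair directories are named "$tmp_repairDatabase_<n>",
# as suggested by the one path that appears in the log above.
TMP_REPAIR_RE = re.compile(r"^\$tmp_repairDatabase_\d+$")


def leftover_repair_dirs(dbpath):
    """Return sorted names of directories under dbpath matching the pattern."""
    return sorted(
        name
        for name in os.listdir(dbpath)
        if TMP_REPAIR_RE.match(name)
        and os.path.isdir(os.path.join(dbpath, name))
    )


# Demonstration against a throwaway directory rather than a real dbpath.
with tempfile.TemporaryDirectory() as demo_dbpath:
    os.mkdir(os.path.join(demo_dbpath, "$tmp_repairDatabase_0"))
    os.mkdir(os.path.join(demo_dbpath, "journal"))
    print(leftover_repair_dirs(demo_dbpath))  # ['$tmp_repairDatabase_0']
```

      On the affected host the argument would be the dbpath from the log, "/var/lib/mongodb".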

      Attachments

        1. mongodb.log.tar.gz
          1.11 MB

          People

            Assignee: Backlog - Storage Execution Team
            Reporter: fabio perrella