Core Server / SERVER-8594

Killing parallel M/R jobs causes temporary collection corruption

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker - P1
    • 2.4.0-rc2
    • Affects Version/s: None
    • Component/s: Storage
    • Labels:
      None
    • Environment:
      All
    • ALL

      Execute the attached MapReduce script, and kill mongod as soon as a few emit progress messages show up:

      Fri Feb 15 13:56:52.000 [conn5] 		M/R: (1/3) Emit Progress: 1600/5000	32%
      Fri Feb 15 13:56:52.005 [conn3] 		M/R: (1/3) Emit Progress: 1700/5000	34%
      Fri Feb 15 13:56:52.005 [conn4] 		M/R: (1/3) Emit Progress: 1700/5000	34%
      

      Bisected down to commit f62bc8177b52420e7dc4d0086b6902ccd188f725.

      When parallel M/R jobs are running and mongod is killed via Ctrl-C, mongod then fails to start up because of an invalid record in a temp collection:

      Fri Feb 15 14:03:34.857 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
      Fri Feb 15 14:03:34.857 [initandlisten] db version v2.4.0-rc1-pre-, pdfile version 4.5
      Fri Feb 15 14:03:34.857 [initandlisten] git version: ea4b79b64fc4b7fd80aa5b4ec171ac8a37190801
      Fri Feb 15 14:03:34.857 [initandlisten] build info: Darwin leaf.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_49
      Fri Feb 15 14:03:34.857 [initandlisten] allocator: tcmalloc
      Fri Feb 15 14:03:34.857 [initandlisten] options: {}
      Fri Feb 15 14:03:34.860 [initandlisten] journal dir=/data/db/journal
      Fri Feb 15 14:03:34.860 [initandlisten] recover : no journal files present, no recovery needed
      Fri Feb 15 14:03:34.900 [initandlisten] Assertion: 10334:BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: name: "test.tmp.mr.foo_2_inc.$0_1"
      0x10a776405 0x10a73983b 0x10a717a43 0x10a717c5a 0x109ffba58 0x10a510eec 0x109f4fa8c 0x109fefbb7 0x10a109c66 0x10a119c85 0x10a11d175 0x10a04625e 0x10a0465b5 0x109e7f795 0x109e82d6e 0x109e83488 0x109e8f480 0x109e8f548 0x109e7a8b4 
       0   mongod                              0x000000010a776405 _ZN5mongo15printStackTraceERSo + 37
       1   mongod                              0x000000010a73983b _ZN5mongo10logContextEPKc + 123
       2   mongod                              0x000000010a717a43 _ZN5mongo11msgassertedEiPKc + 339
       3   mongod                              0x000000010a717c5a _ZNK5mongo13ExceptionInfo6appendERNS_14BSONObjBuilderEPKcS4_ + 0
       4   mongod                              0x0000000109ffba58 _ZNK5mongo7BSONObj14_assertInvalidEv + 984
       5   mongod                              0x000000010a510eec _ZN5mongo7BSONObj4initEPKc + 76
       6   mongod                              0x0000000109f4fa8c _ZN5mongo7BSONObjC1EPKc + 60
       7   mongod                              0x0000000109fefbb7 _ZN5mongo7BSONObj4makeEPKNS_6RecordE + 55
       8   mongod                              0x000000010a109c66 _ZN5mongo11BasicCursor7currentEv + 66
       9   mongod                              0x000000010a119c85 _ZN5mongo8Database19clearTmpCollectionsEv + 397
       10  mongod                              0x000000010a11d175 _ZN5mongo14DatabaseHolder11getOrCreateERKSsS2_Rb + 1993
       11  mongod                              0x000000010a04625e _ZN5mongo6Client7Context11_finishInitEv + 210
       12  mongod                              0x000000010a0465b5 _ZN5mongo6Client7ContextC1ERKSsS3_b + 225
       13  mongod                              0x0000000109e7f795 _ZN5mongoL30repairDatabasesAndCheckVersionEv + 725
       14  mongod                              0x0000000109e82d6e _ZN5mongo14_initAndListenEi + 5262
       15  mongod                              0x0000000109e83488 _ZN5mongo13initAndListenEi + 24
       16  mongod                              0x0000000109e8f480 _ZL11mongoDbMainiPPcS0_ + 784
       17  mongod                              0x0000000109e8f548 main + 40
       18  mongod                              0x0000000109e7a8b4 start + 52
      Fri Feb 15 14:03:34.915 [initandlisten] exception in initAndListen: 10334 BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: name: "test.tmp.mr.foo_2_inc.$0_1", terminating
      Fri Feb 15 14:03:34.915 dbexit: 
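
      The size in the assertion can be checked by hand. A BSON document begins with a little-endian signed 32-bit length, so four 0xEE bytes decode to -286331154, which falls outside the range quoted in the message. The following is a minimal sketch of that arithmetic only, not the server's actual validation code; the helper names are hypothetical and the inclusive bounds are assumed from the wording of the log message.

```python
import struct

# Upper bound quoted in the assertion message: 16 MB plus 16 KB.
BSON_MAX_SIZE = 16 * 1024 * 1024 + 16 * 1024  # 16793600


def read_bson_size(record: bytes) -> int:
    """Decode the leading little-endian signed 32-bit length field."""
    (size,) = struct.unpack_from("<i", record, 0)
    return size


def is_plausible_size(size: int) -> bool:
    # Assumption: bounds taken from the log text ("between 0 and 16793600");
    # the real server check may be stricter.
    return 0 <= size <= BSON_MAX_SIZE


if __name__ == "__main__":
    garbage = b"\xee" * 4  # the byte pattern reported as 0xEEEEEEEE
    size = read_bson_size(garbage)
    print(size, is_plausible_size(size))  # -286331154 False
```

      A record whose first four bytes are all 0xEE therefore cannot be a valid BSON document, which matches the failure raised while clearTmpCollections iterates the collection at startup.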
      

        1. invalid_bsonobj_log.txt
          142 kB
        2. mr_memleak_test.js
          1 kB

            Assignee:
            mathias@mongodb.com Mathias Stearn
            Reporter:
            benjamin.becker Ben Becker
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: