Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17506

Race between inserts and checkpoints can lose records under WiredTiger

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical - P2
    • Resolution: Fixed
    • 3.0.0
    • 3.0.1, 3.1.0
    • WiredTiger
    • Fully Compatible
    • ALL

    Description

      Repro:

      collections=1000
      threads=1
       
      function checkpoint {
          echo checkpointing
          mongo --quiet --eval "printjson(db.adminCommand({fsync:1}))"
      }
       
      function setup {
          killall -w mongod
          rm -rf db; mkdir -p db
          mongod --dbpath db --logpath db.log --fork --storageEngine wiredTiger
          sleep 2
          echo initializing $collections collections
          mongo --quiet --eval "
              for (var i=0; i<$collections; i++)
                  db['c'+i].insert({})
          "
          checkpoint
      }
       
      function repro {
          (
              for t in $(seq 0 $[$threads-1]); do
                  echo inserting $t
                  mongo --quiet --eval "
                      for (var i=$t; i<$collections; i+=$threads)
                          db['c'+i].insert({})
                  " &
              done
              wait
              echo finished inserts
          ) &
          checkpoint &
          wait
          echo finished parallel checkpoint
          checkpoint
          echo finished sequential checkpoint        
      }
       
      function check {
          echo checking
          mongo --quiet --eval "
              for (var i=0; i<$collections; i++) {
                  try {
                      c = db['c'+i]
                      n_id = c.find().hint({_id:1}).itcount()
                      n_c = c.find().hint({\$natural:1}).itcount()
                      if (n_id != n_c)
                          print(i, n_c, n_id, 'OOPS')
                  } catch (e) {
                      print(i, e)
                  }
              }
          "
      }
       
      setup
      while true; do
          repro
          echo stopping
          killall -w mongod
          echo starting
          mongod --dbpath db --logpath db.log --logappend --fork --storageEngine wiredTiger
          sleep 2
          check
      done
      

      Produces following errors from app on second time through the loop:

      27 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      28 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      29 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      30 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      31 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      32 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      33 Error: error: { "$err" : "Didn't find RecordId in WiredTigerRecordStore", "code" : 28556 }
      ...
      

      And the following in the mongod log:

      2015-03-07T12:53:31.671-0500 E STORAGE  WiredTiger (0) [1425750811:671913][9925:0x7f3a93725700], eviction-server: cache server: exiting with 420 pages in memory and 416 pages evicted
      2015-03-07T12:53:31.671-0500 E STORAGE  WiredTiger (0) [1425750811:671962][9925:0x7f3a93725700], eviction-server: cache server: exiting with 1640 bytes in memory
      2015-03-07T12:53:31.671-0500 E STORAGE  WiredTiger (0) [1425750811:671976][9925:0x7f3a93725700], eviction-server: cache server: exiting with 1058 bytes dirty and 2 pages dirty
      

      2015-03-07T12:53:35.799-0500 I -        [conn1] Assertion: 28556:Didn't find RecordId in WiredTigerRecordStore
      

      Attachments

        Issue Links

          Activity

            People

              michael.cahill@mongodb.com Michael Cahill (Inactive)
              bruce.lucas@mongodb.com Bruce Lucas (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: