Core Server / SERVER-19472

count() incorrect after recovery with WiredTiger


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: 3.0.3, 3.0.4, 3.1.5
    • Fix Version/s: None
    • Component/s: WiredTiger
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:

      Insert documents on standalone/primary:

      for (var i = 0; i < 10000000; i++) { db.abc.insert({ a: i, name: "abc" }) }
      

      Wait for a while (maybe 100k inserts) and 'kill -9' the mongod.

      Restart process and check the stats:

      replset:SECONDARY> db.abc.count()
      77248
      replset:SECONDARY> db.abc.find({}).toArray().length
      145350
      replset:SECONDARY> db.abc.validate(true)
      {
      	"ns" : "test.abc",
      	"nrecords" : 145350,
      	"nIndexes" : 1,
      	"keysPerIndex" : {
      		"test.abc.$_id_" : 145350
      	},
      	"indexDetails" : {
      		"test.abc.$_id_" : {
      			"valid" : true
      		}
      	},
      	"valid" : true,
      	"errors" : [ ],
      	"ok" : 1
      }
      replset:SECONDARY> db.abc.count()
      145350
      


      Description

      When mongod is restarted after a hard crash (and a successful recovery), the values reported by the 'objects' field of 'db.stats()', the 'count' field of 'db.<coll>.stats()', and 'db.<coll>.count()' are incorrect.

      Note this is not the known issue with count in sharded clusters - it also applies to standalone hosts and replica sets (though only when using WiredTiger).

      It looks like the count can be reset to the correct value using, for example, a 'db.<coll>.validate(true)' command.

      The problem appears to involve the recovery phase when the log/journal is replayed on top of the data from the last successful checkpoint.

      Note: This is not an issue with data integrity. The data is recovered successfully; it is only the statistics reported by 'db.stats()' and related commands that are incorrect following a hard crash/kill.
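
      The mechanism described above - a cached per-collection count persisted only at checkpoints, while journal replay restores the records themselves - can be sketched in plain JavaScript. Everything below (class and method names, the checkpoint/journal layout) is an illustrative assumption for this ticket's scenario, not actual WiredTiger code.

      ```javascript
      // Toy model (illustrative names only, not WiredTiger internals) of a
      // checkpoint-plus-journal storage engine that keeps the collection count
      // as a cached statistic rather than deriving it from the records.
      class ToyEngine {
        constructor() {
          // Durable state: the last checkpoint image plus the journal.
          this.checkpointRecords = new Map();
          this.checkpointCount = 0;
          this.journal = [];
          // Volatile state, lost on 'kill -9'.
          this.records = new Map();
          this.cachedCount = 0;
        }

        insert(key, doc) {
          this.records.set(key, doc);
          this.cachedCount += 1;           // fast-count statistic
          this.journal.push([key, doc]);   // journaled, hence durable
        }

        checkpoint() {
          this.checkpointRecords = new Map(this.records);
          this.checkpointCount = this.cachedCount;
          this.journal = [];
        }

        crashAndRecover() {
          // Restore the checkpoint, then replay the journal: every record
          // comes back, but in this model the cached statistic stays at its
          // checkpoint-time value - mirroring the stale count() in the ticket.
          this.records = new Map(this.checkpointRecords);
          this.cachedCount = this.checkpointCount;
          for (const [key, doc] of this.journal) this.records.set(key, doc);
        }

        count() {                          // like db.abc.count()
          return this.cachedCount;
        }

        validate() {                       // like db.abc.validate(true):
          this.cachedCount = this.records.size;  // a full scan repairs the stat
          return this.records.size;
        }
      }

      // Reproduce the ticket's sequence: checkpoint mid-insert, crash, recover.
      const eng = new ToyEngine();
      for (let i = 0; i < 100; i++) eng.insert(i, { a: i, name: "abc" });
      eng.checkpoint();
      for (let i = 100; i < 150; i++) eng.insert(i, { a: i, name: "abc" });
      eng.crashAndRecover();
      console.log(eng.count());        // 100 - stale, like count() above
      console.log(eng.records.size);   // 150 - the data itself is intact
      console.log(eng.validate());     // 150 - validate() resets the count
      console.log(eng.count());        // 150
      ```

      The model matches the reported behaviour: records survive recovery intact, while the statistic lags at the last checkpoint until something does a full scan.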

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              max.hirschhorn Max Hirschhorn
              Reporter:
              ronan.bohan Ronan Bohan
              Participants:
              Votes:
              0
              Watchers:
              6

                Dates

                Created:
                Updated:
                Resolved: