[SERVER-19472] count() incorrect after recovery with WiredTiger Created: 17/Jul/15  Updated: 01/Jun/20  Resolved: 17/Jul/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.3, 3.0.4, 3.1.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ronan Bohan Assignee: Max Hirschhorn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-5682 Document that the size and count in c... Closed
Related
is related to SERVER-19052 Remove sizeStorer recalculations at s... Closed
Operating System: ALL
Steps To Reproduce:

Insert documents on standalone/primary:

for (var i = 0; i < 10000000; i++) { db.abc.insert({a: i, name: "abc"}); }

Wait for a while (maybe 100k inserts) and 'kill -9' the mongod.

Restart process and check the stats:

replset:SECONDARY> db.abc.count()
77248
replset:SECONDARY> db.abc.find({}).toArray().length
145350
replset:SECONDARY> db.abc.validate(true)
{
	"ns" : "test.abc",
	"nrecords" : 145350,
	"nIndexes" : 1,
	"keysPerIndex" : {
		"test.abc.$_id_" : 145350
	},
	"indexDetails" : {
		"test.abc.$_id_" : {
			"valid" : true
		}
	},
	"valid" : true,
	"errors" : [ ],
	"ok" : 1
}
replset:SECONDARY> db.abc.count()
145350

 Description   

When mongod is restarted after a hard crash (and a successful recovery), the values returned by 'db.stats().objects', 'db.<coll>.stats().count' and 'db.<coll>.count()' are incorrect.

Note that this is not the known issue with count in sharded clusters: it applies to standalone hosts and replica sets too (though only when using WiredTiger).

It looks like the count can be reset to the correct value by running, for example, 'db.<coll>.validate(true)'.
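As a rough illustration of that workaround, the sketch below compares the metadata-based count() with a full cursor scan for every collection in a database and runs a full validate only where they disagree. It assumes the legacy mongo shell; `refreshCounts` and the `database` parameter are hypothetical names introduced here, not part of MongoDB, and a full scan plus validate can be expensive on large collections.

```javascript
// Hypothetical helper for the legacy mongo shell; not part of MongoDB itself.
// Pass the shell's `db` object (or any object exposing the same methods).
function refreshCounts(database) {
  var report = [];
  database.getCollectionNames().forEach(function (name) {
    var coll = database.getCollection(name);
    var before = coll.count();            // served from stored statistics, may be stale
    var scanned = coll.find().itcount();  // iterates the cursor, so it counts real documents
    var validated = false;
    if (before !== scanned) {
      coll.validate(true);                // full validation; per this ticket it resets the count
      validated = true;
    }
    report.push({
      ns: name,
      before: before,
      scanned: scanned,
      validated: validated,
      after: coll.count()
    });
  });
  return report;
}
```

In the shell this would be called as refreshCounts(db) after the restart. Collections whose 'before' and 'scanned' values already match are left untouched.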

The problem appears to involve the recovery phase when the log/journal is replayed on top of the data from the last successful checkpoint.

Note: This is not an issue with data integrity. The data is recovered successfully; only the statistics reported by 'db.stats()' and related commands are incorrect following a hard crash/kill.



 Comments   
Comment by Ronan Bohan [ 17/Jul/15 ]

Thanks for the quick response, Max. I hadn't seen the DOCS ticket, but it does look like this was a conscious decision (SERVER-19052).

Comment by Max Hirschhorn [ 17/Jul/15 ]

I think this is "works as designed." Previously with the wiredTiger storage engine, all collections with fewer than 10000 documents would be scanned at start-up in order to compute the accurate count and storage size information. This special case was removed as part of SERVER-19052. The count and storage size information is only persisted every 1000 writes, so an unclean shutdown does require full validation in order to restore the accuracy of those statistics. See DOCS-5682.
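The effect of persisting the count only periodically can be illustrated with a toy model. This is not MongoDB or WiredTiger code: `simulateCrash` and all of its variables are hypothetical, and the model assumes every document write is individually recoverable while the counter is saved only every `persistEvery` writes.

```javascript
// Toy model only: illustrates a periodically persisted counter, nothing more.
function simulateCrash(totalWrites, persistEvery) {
  var recoveredDocs = 0;   // documents that survive the crash (assumed: all of them)
  var inMemoryCount = 0;   // running counter, lost on crash
  var persistedCount = 0;  // last counter value saved to disk
  for (var i = 0; i < totalWrites; i++) {
    recoveredDocs++;
    inMemoryCount++;
    if (inMemoryCount % persistEvery === 0) {
      persistedCount = inMemoryCount;  // periodic save of the statistics
    }
  }
  // Unclean shutdown: only the persisted counter is available at restart.
  return { actual: recoveredDocs, reported: persistedCount };
}

var result = simulateCrash(2500, 1000);
// result.actual is 2500, result.reported is 2000
```

In this model the reported count lags by fewer than `persistEvery` writes. The gap in the reproduction above (77248 versus 145350) is much larger, which fits the description's observation that recovery replays the journal on top of the last successful checkpoint; the toy model does not attempt to capture that.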
