DOCS-12665

Docs for SERVER-30356: Improve error reporting for validation

      Description

      Need changes on this page: https://docs.mongodb.com/manual/reference/command/validate/

      This page still mentions many MMAPv1-only fields. References to the following should be removed (this was already done as part of DOCS-11900 (mmapv1); see the 4.2 page: https://docs.mongodb.com/master/reference/command/validate/#output):

      • validate.firstExtent
      • validate.lastExtent
      • validate.extentCount
      • validate.extents
      • validate.extents.*
      • validate.datasize
      • validate.lastExtentSize
      • validate.padding
      • validate.firstExtentDetails
      • validate.firstExtentDetails.*
      • validate.objectsFound
      • validate.invalidObjects
      • validate.bytesWithHeaders
      • validate.bytesWithoutHeaders
      • validate.deletedCount
      • validate.deletedSize

      Changes with this patch:

      • The validate.keysPerIndex and validate.indexDetails fields no longer contain the index namespace; they now contain only the index name (db.coll.$index -> index).
      • We added a second phase of validation. This is essentially a second pass that runs if the first phase detects any index inconsistencies. In the second phase, we gather more information about the problematic indexes and report it to the user calling validate. The second phase runs automatically, without any additional command parameters, whenever the first phase detects index inconsistencies.
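
The key change described above (db.coll.$index -> index) may affect scripts that read validate.keysPerIndex or validate.indexDetails by key. The following is a minimal sketch, not part of the patch, of one way a script could accept both shapes. The helper name normalizeIndexKeys and the sample namespace "test.coll" are assumptions made for illustration.

```javascript
// Hypothetical helper (not a MongoDB API): rewrite pre-patch keys of the form
// "<ns>.$<indexName>" to the bare index name used after this patch.
function normalizeIndexKeys(perIndex, ns) {
  const prefix = ns + ".$";
  const normalized = {};
  for (const [key, value] of Object.entries(perIndex)) {
    // Strip the namespace prefix only when it is actually present.
    const name = key.startsWith(prefix) ? key.slice(prefix.length) : key;
    normalized[name] = value;
  }
  return normalized;
}

// Old-style keys are stripped; new-style keys pass through unchanged.
console.log(normalizeIndexKeys({ "test.coll.$_id_": 5 }, "test.coll"));
console.log(normalizeIndexKeys({ "_id_": 5 }, "test.coll"));
```

Passing the namespace explicitly (for example, the "ns" field of the validate result) avoids guessing where the namespace ends inside the key.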

      Two new fields were added as part of the second phase of validation.

      • validate.extraIndexEntries
        This is an array of objects of the following format:
        {
            "indexName" : "_id_",
            "recordId" : NumberLong(3),
            "indexKey" : {
                "_id" : ObjectId("5cc0bbfa211bf6edde58fa0a")
            }
        }
        

        Extra index entries occur when an index entry points at a non-existent document. We assume that the document data is always correct and that this index entry should not exist.

      • validate.missingIndexEntries
        This is an array of objects of the following format:
        {
        	"indexName" : "_id_",
        	"recordId" : NumberLong(3),
        	"idKey" : ObjectId("5cc0bc47a0eaf8d81da503d0"),
        	"indexKey" : {
        		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d0")
        	}
        }
        

        The idKey field is shown only if the collection has an _id index.
        The difference between the idKey and indexKey fields is that indexKey contains the key for the index named in the indexName field, while idKey always contains the _id value of the document the entry points at.

      Missing index entries occur when a document key has no index entry pointing at it.
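
As an illustration of how the two new arrays might be consumed, here is a minimal sketch that tallies inconsistencies per index from a validate result. The helper name summarizeIndexInconsistencies and the abbreviated sample values are assumptions; only the field names extraIndexEntries, missingIndexEntries, and indexName come from this ticket.

```javascript
// Hypothetical helper: count extra and missing index entries per index name.
function summarizeIndexInconsistencies(result) {
  const summary = {};
  const tally = (entries, field) => {
    // Treat an absent array the same as an empty one.
    for (const entry of entries || []) {
      if (!summary[entry.indexName]) {
        summary[entry.indexName] = { extra: 0, missing: 0 };
      }
      summary[entry.indexName][field] += 1;
    }
  };
  tally(result.extraIndexEntries, "extra");
  tally(result.missingIndexEntries, "missing");
  return summary;
}

// Sample shaped like the formats above; ObjectId values replaced by strings.
const sampleResult = {
  extraIndexEntries: [
    { indexName: "_id_", recordId: 3, indexKey: { _id: "5cc0bbfa211bf6edde58fa0a" } },
  ],
  missingIndexEntries: [],
};
console.log(summarizeIndexInconsistencies(sampleResult));
```

An index that does not appear in the summary had no reported inconsistencies in either array.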

      Here are a couple of scenarios using the new validate:

      Case 1 (extra index entries):

      • 5 documents and an _id index on the original data.
      • Remove the 2 documents with record IDs 3 and 4.
        find({}).showRecordId()
        ----
        [
        	{
        		"_id" : ObjectId("5cc0bbfa211bf6edde58fa08"),
        		"x" : 1,
        		"$recordId" : NumberLong(1)
        	},
        	{
        		"_id" : ObjectId("5cc0bbfa211bf6edde58fa09"),
        		"x" : 2,
        		"$recordId" : NumberLong(2)
        	},
        	{
        		"_id" : ObjectId("5cc0bbfa211bf6edde58fa0c"),
        		"x" : 5,
        		"$recordId" : NumberLong(5)
        	}
        ]
        ----
        
        validate({full: true}) // the two new fields still show when {full: false}. {full: true} adds the 'indexDetails' field.
        ----
        {
        	"ns" : "test.missingDoc",
        	"nInvalidDocuments" : NumberLong(0),
        	"nrecords" : 3,
        	"nIndexes" : 1,
        	"keysPerIndex" : {
        		"_id_" : 5
        	},
        	"indexDetails" : {
        		"_id_" : {
        			"valid" : false
        		}
        	},
        	"valid" : false,
        	"warnings" : [
        		"Detected 2 extra index entries."
        	],
        	"errors" : [
        		"Index with name '_id_' has inconsistencies."
        	],
        	"extraIndexEntries" : [
        		{
        			"indexName" : "_id_",
        			"recordId" : NumberLong(3),
        			"indexKey" : {
        				"_id" : ObjectId("5cc0bbfa211bf6edde58fa0a")
        			}
        		},
        		{
        			"indexName" : "_id_",
        			"recordId" : NumberLong(4),
        			"indexKey" : {
        				"_id" : ObjectId("5cc0bbfa211bf6edde58fa0b")
        			}
        		}
        	],
        	"missingIndexEntries" : [ ],
        	"advice" : "A corrupt namespace has been detected. See http://dochub.mongodb.org/core/data-recovery for recovery steps.",
        	"ok" : 1
        }
        ----
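
The counts in this output can be cross-checked by hand: the _id_ index holds 5 keys for 3 records, and the difference matches the 2 reported extra entries. A minimal sketch of that arithmetic follows. It assumes an index with exactly one key per document, such as _id_; multikey, sparse, or partial indexes would not satisfy it. The helper name keyCountDelta is hypothetical.

```javascript
// Hypothetical helper: keys in the index minus records in the collection.
// Positive suggests extra index entries, negative suggests missing ones.
// Only meaningful for indexes with exactly one key per document (e.g. _id_).
function keyCountDelta(result, indexName) {
  return result.keysPerIndex[indexName] - result.nrecords;
}

// Counts copied from the Case 1 output above.
const case1Counts = { nrecords: 3, keysPerIndex: { _id_: 5 } };
console.log(keyCountDelta(case1Counts, "_id_")); // 2
```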
        

      Case 2 (missing index entries):

      • 5 documents and an _id index on the original data.
      • Remove the 2 index entries for the documents with record IDs 3 and 4.
        find({}).showRecordId()
        ----
        [
        	{
        		"_id" : ObjectId("5cc0bc47a0eaf8d81da503ce"),
        		"x" : 1,
        		"$recordId" : NumberLong(1)
        	},
        	{
        		"_id" : ObjectId("5cc0bc47a0eaf8d81da503cf"),
        		"x" : 2,
        		"$recordId" : NumberLong(2)
        	},
        	{
        		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d0"),
        		"x" : 3,
        		"$recordId" : NumberLong(3)
        	},
        	{
        		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d1"),
        		"x" : 4,
        		"$recordId" : NumberLong(4)
        	},
        	{
        		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d2"),
        		"x" : 5,
        		"$recordId" : NumberLong(5)
        	}
        ]
        ----
        
        validate({full: true}) // the two new fields still show when {full: false}. {full: true} adds the 'indexDetails' field.
        ----
        {
        	"ns" : "test.missingIndexEntry",
        	"nInvalidDocuments" : NumberLong(0),
        	"nrecords" : 5,
        	"nIndexes" : 1,
        	"keysPerIndex" : {
        		"_id_" : 3
        	},
        	"indexDetails" : {
        		"_id_" : {
        			"valid" : false
        		}
        	},
        	"valid" : false,
        	"warnings" : [
        		"Detected 2 missing index entries."
        	],
        	"errors" : [
        		"Index with name '_id_' has inconsistencies."
        	],
        	"extraIndexEntries" : [ ],
        	"missingIndexEntries" : [
        		{
        			"indexName" : "_id_",
        			"recordId" : NumberLong(3),
        			"idKey" : ObjectId("5cc0bc47a0eaf8d81da503d0"),
        			"indexKey" : {
        				"_id" : ObjectId("5cc0bc47a0eaf8d81da503d0")
        			}
        		},
        		{
        			"indexName" : "_id_",
        			"recordId" : NumberLong(4),
        			"idKey" : ObjectId("5cc0bc47a0eaf8d81da503d1"),
        			"indexKey" : {
        				"_id" : ObjectId("5cc0bc47a0eaf8d81da503d1")
        			}
        		}
        	],
        	"advice" : "A corrupt namespace has been detected. See http://dochub.mongodb.org/core/data-recovery for recovery steps.",
        	"ok" : 1
        }
        ----

      Engineering Ticket Description:

      After the first phase of validation detects potential errors, we will log a message informing users that the second phase of the validation is going to be run.

      The second phase of validation will consist of tracking all the KeyStrings that were hashed to erroneous buckets during the first phase. We will keep track of the document-key KeyStrings in the erroneous buckets and remove each one only once a corresponding KeyString from an index entry is seen.

      This allows us to see which document keys are missing index entries and which index entries are missing document keys once the second phase is finished going through the data.
      We'll limit the reported errors to the maximum size of a document. The KeyStrings will have to be transformed back into BSON format with appropriate messages detailing what went wrong with them.
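
The matching idea in the second phase can be sketched in a few lines. This is a simplified illustration, not the server implementation: it tracks every key rather than only those hashed to erroneous buckets, uses JSON strings in place of KeyStrings, and does not model the size limit on reported errors. The function name secondPhase and the input shapes are assumptions.

```javascript
// Simplified sketch: match document keys against index entries.
// Each item is { indexName, recordId, key }; this shape is an assumption.
function secondPhase(documentKeys, indexEntries) {
  const identity = (e) => JSON.stringify([e.indexName, e.recordId, e.key]);

  // Track every document key until a matching index entry is seen.
  const pending = new Map();
  for (const docKey of documentKeys) {
    pending.set(identity(docKey), docKey);
  }

  const extraIndexEntries = [];
  for (const entry of indexEntries) {
    const id = identity(entry);
    if (pending.has(id)) {
      pending.delete(id); // matched: this document key has its index entry
    } else {
      extraIndexEntries.push(entry); // no document key for this index entry
    }
  }

  // Document keys still pending never saw a matching index entry.
  return { extraIndexEntries, missingIndexEntries: [...pending.values()] };
}

// Mirrors Case 2: five documents, index entries for record IDs 3 and 4 removed.
const docKeys = [1, 2, 3, 4, 5].map((r) => ({ indexName: "_id_", recordId: r, key: "id" + r }));
const idxEntries = docKeys.filter((k) => k.recordId !== 3 && k.recordId !== 4);
console.log(secondPhase(docKeys, idxEntries).missingIndexEntries.map((e) => e.recordId));
```

Swapping the two inputs in the example (all five index entries, three document keys) would instead leave two entries in extraIndexEntries, which corresponds to Case 1.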

            Assignee:
            kay.kim@mongodb.com Kay Kim (Inactive)
            Reporter:
            kay.kim@mongodb.com Kay Kim (Inactive)

              Created:
              Updated:
              Resolved:
              4 years, 46 weeks ago