[DOCS-12665] Docs for SERVER-30356: Improve error reporting for validation Created: 30/Apr/19  Updated: 13/Nov/23  Resolved: 20/May/19

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 3.6.13, 4.1.11, 4.0.10, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task
Priority: Major - P3
Reporter: Kay Kim (Inactive)
Assignee: Kay Kim (Inactive)
Resolution: Fixed
Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-30356 Improve error reporting for validation Closed
Participants:
Days since reply: 4 years, 35 weeks, 2 days ago
Epic Link: DOCS: 4.2 Server/Tools

 Description   

Need changes on this page: https://docs.mongodb.com/manual/reference/command/validate/

This page still mentions many MMAPv1 items. We should remove references to the following fields. (This was done as part of DOCS-11900 (mmapv1); see the 4.2 page: https://docs.mongodb.com/master/reference/command/validate/#output)

  • validate.firstExtent
  • validate.lastExtent
  • validate.extentCount
  • validate.extents
  • validate.extents.*
  • validate.datasize
  • validate.lastExtentSize
  • validate.padding
  • validate.firstExtentDetails
  • validate.firstExtentDetails.*
  • validate.objectsFound
  • validate.invalidObjects
  • validate.bytesWithHeaders
  • validate.bytesWithoutHeaders
  • validate.deletedCount
  • validate.deletedSize

Changes with this patch:

  • The validate.keysPerIndex and validate.indexDetails fields no longer contain the index namespace; they now contain only the index name (db.coll.$index -> index).
  • We added a second phase of validation: a second pass over the data that runs if the first phase detects any index inconsistencies. In the second phase, we gather more information about the problematic indexes so that we can report it to the user calling validate. The second phase runs automatically, without any additional command parameters, whenever the first phase detects index inconsistencies.
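
For scripts that compare validate output across versions, the key rename described above (db.coll.$index -> index) could be handled with a small normalization step. The following is a minimal sketch in plain JavaScript, not something taken from the server or the shell: indexNameFromKey and normalizeKeysPerIndex are hypothetical helper names, and the sample namespace is made up for illustration.

```javascript
// Hypothetical helpers (not part of MongoDB or the mongo shell).
// Old-style keys look like "db.coll.$index"; new-style keys are just "index".
function indexNameFromKey(key) {
  const marker = ".$";
  // Take the first ".$": database and collection names cannot contain "$",
  // so the first occurrence is assumed to be the namespace/index separator.
  const pos = key.indexOf(marker);
  return pos === -1 ? key : key.slice(pos + marker.length);
}

// Returns a copy of a keysPerIndex-style object keyed by bare index name.
function normalizeKeysPerIndex(keysPerIndex) {
  const out = {};
  for (const [key, count] of Object.entries(keysPerIndex)) {
    out[indexNameFromKey(key)] = count;
  }
  return out;
}

// Made-up sample values for illustration only.
const oldStyle = { "test.coll.$_id_": 5, "test.coll.$x_1": 5 };
const newStyle = { "_id_": 5, "x_1": 5 };
console.log(normalizeKeysPerIndex(oldStyle));
console.log(normalizeKeysPerIndex(newStyle));
```

Keys that are already bare index names pass through unchanged, so the same helper can be applied to output from either format.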

Two new fields were added as part of the second phase of validation.

  • validate.extraIndexEntries
    This is an array of objects of the following format:

    {
        "indexName" : "_id_",
        "recordId" : NumberLong(3),
        "indexKey" : {
            "_id" : ObjectId("5cc0bbfa211bf6edde58fa0a")
        }
    }
    

    An extra index entry is an index entry that points at a non-existent document. We assume that the documents are always correct and that such an index entry should not exist.

  • validate.missingIndexEntries
    This is an array of objects of the following format:

    {
    	"indexName" : "_id_",
    	"recordId" : NumberLong(3),
    	"idKey" : ObjectId("5cc0bc47a0eaf8d81da503d0"),
    	"indexKey" : {
    		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d0")
    	}
    }
    

    We only show the idKey field if the collection has an _id index.
    The difference between the idKey and indexKey fields is that indexKey contains the key for the index named in the indexName field, while idKey always contains the _id value of the document the entry should point at.

    A missing index entry is a document key that has no index entry pointing at it.
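
To make the shape of the two new fields concrete, here is a minimal sketch of how a caller might tally them per index. It runs on a plain object shaped like the validate output rather than against a server. summarizeIndexInconsistencies is a hypothetical helper name, and the sample uses plain numbers and strings in place of NumberLong and ObjectId so that it can run outside the shell.

```javascript
// Hypothetical helper (not part of MongoDB): counts extra and missing
// index entries per index name from a validate-style result object.
function summarizeIndexInconsistencies(result) {
  const summary = {};
  const bump = (indexName, field) => {
    if (!summary[indexName]) summary[indexName] = { extra: 0, missing: 0 };
    summary[indexName][field] += 1;
  };
  // Either array may be absent (for example, on older servers), so default to [].
  for (const entry of result.extraIndexEntries || []) bump(entry.indexName, "extra");
  for (const entry of result.missingIndexEntries || []) bump(entry.indexName, "missing");
  return summary;
}

// Sample shaped like the output in this ticket; ObjectId and NumberLong
// values are replaced by plain strings and numbers (an assumption of this sketch).
const sample = {
  valid: false,
  extraIndexEntries: [
    { indexName: "_id_", recordId: 3, indexKey: { _id: "5cc0bbfa211bf6edde58fa0a" } },
    { indexName: "_id_", recordId: 4, indexKey: { _id: "5cc0bbfa211bf6edde58fa0b" } }
  ],
  missingIndexEntries: []
};

console.log(summarizeIndexInconsistencies(sample));
```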

Here are a couple of scenarios using the new validate:

Case 1 (extra index entries):

  • The original data has 5 documents and an _id index.
  • Remove the 2 documents with record IDs 3 and 4.

    find({}).showRecordId()
    ----
    [
    	{
    		"_id" : ObjectId("5cc0bbfa211bf6edde58fa08"),
    		"x" : 1,
    		"$recordId" : NumberLong(1)
    	},
    	{
    		"_id" : ObjectId("5cc0bbfa211bf6edde58fa09"),
    		"x" : 2,
    		"$recordId" : NumberLong(2)
    	},
    	{
    		"_id" : ObjectId("5cc0bbfa211bf6edde58fa0c"),
    		"x" : 5,
    		"$recordId" : NumberLong(5)
    	}
    ]
    ----
     
    validate({full: true}) // The two new fields also appear with {full: false}; {full: true} adds the 'indexDetails' field.
    ----
    {
    	"ns" : "test.missingDoc",
    	"nInvalidDocuments" : NumberLong(0),
    	"nrecords" : 3,
    	"nIndexes" : 1,
    	"keysPerIndex" : {
    		"_id_" : 5
    	},
    	"indexDetails" : {
    		"_id_" : {
    			"valid" : false
    		}
    	},
    	"valid" : false,
    	"warnings" : [
    		"Detected 2 extra index entries."
    	],
    	"errors" : [
    		"Index with name '_id_' has inconsistencies."
    	],
    	"extraIndexEntries" : [
    		{
    			"indexName" : "_id_",
    			"recordId" : NumberLong(3),
    			"indexKey" : {
    				"_id" : ObjectId("5cc0bbfa211bf6edde58fa0a")
    			}
    		},
    		{
    			"indexName" : "_id_",
    			"recordId" : NumberLong(4),
    			"indexKey" : {
    				"_id" : ObjectId("5cc0bbfa211bf6edde58fa0b")
    			}
    		}
    	],
    	"missingIndexEntries" : [ ],
    	"advice" : "A corrupt namespace has been detected. See http://dochub.mongodb.org/core/data-recovery for recovery steps.",
    	"ok" : 1
    }
    ----
    

Case 2 (missing index entries):

  • The original data has 5 documents and an _id index.
  • Remove the 2 index entries for the documents with record IDs 3 and 4.

    find({}).showRecordId()
    ----
    [
    	{
    		"_id" : ObjectId("5cc0bc47a0eaf8d81da503ce"),
    		"x" : 1,
    		"$recordId" : NumberLong(1)
    	},
    	{
    		"_id" : ObjectId("5cc0bc47a0eaf8d81da503cf"),
    		"x" : 2,
    		"$recordId" : NumberLong(2)
    	},
    	{
    		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d0"),
    		"x" : 3,
    		"$recordId" : NumberLong(3)
    	},
    	{
    		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d1"),
    		"x" : 4,
    		"$recordId" : NumberLong(4)
    	},
    	{
    		"_id" : ObjectId("5cc0bc47a0eaf8d81da503d2"),
    		"x" : 5,
    		"$recordId" : NumberLong(5)
    	}
    ]
    ----
     
    validate({full: true}) // The two new fields also appear with {full: false}; {full: true} adds the 'indexDetails' field.
    ----
    {
    	"ns" : "test.missingIndexEntry",
    	"nInvalidDocuments" : NumberLong(0),
    	"nrecords" : 5,
    	"nIndexes" : 1,
    	"keysPerIndex" : {
    		"_id_" : 3
    	},
    	"indexDetails" : {
    		"_id_" : {
    			"valid" : false
    		}
    	},
    	"valid" : false,
    	"warnings" : [
    		"Detected 2 missing index entries."
    	],
    	"errors" : [
    		"Index with name '_id_' has inconsistencies."
    	],
    	"extraIndexEntries" : [ ],
    	"missingIndexEntries" : [
    		{
    			"indexName" : "_id_",
    			"recordId" : NumberLong(3),
    			"idKey" : ObjectId("5cc0bc47a0eaf8d81da503d0"),
    			"indexKey" : {
    				"_id" : ObjectId("5cc0bc47a0eaf8d81da503d0")
    			}
    		},
    		{
    			"indexName" : "_id_",
    			"recordId" : NumberLong(4),
    			"idKey" : ObjectId("5cc0bc47a0eaf8d81da503d1"),
    			"indexKey" : {
    				"_id" : ObjectId("5cc0bc47a0eaf8d81da503d1")
    			}
    		}
    	],
    	"advice" : "A corrupt namespace has been detected. See http://dochub.mongodb.org/core/data-recovery for recovery steps.",
    	"ok" : 1
    }
    ----
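
In both cases, the warning count matches the difference between keysPerIndex for _id_ and nrecords. The sketch below shows that arithmetic on the numbers from the two outputs above. idIndexKeyDelta is a hypothetical helper name, and the check assumes an index such as _id_ where every document contributes exactly one key. A delta of zero would not prove consistency, since one extra entry and one missing entry cancel out.

```javascript
// Hypothetical helper (not part of MongoDB): for the _id index, where each
// document is assumed to contribute exactly one key, a positive delta suggests
// extra index entries and a negative delta suggests missing index entries.
function idIndexKeyDelta(result) {
  const keys = Number(result.keysPerIndex["_id_"]);
  return keys - Number(result.nrecords);
}

// Values copied from the Case 1 and Case 2 outputs in this ticket.
const case1 = { nrecords: 3, keysPerIndex: { _id_: 5 } };
const case2 = { nrecords: 5, keysPerIndex: { _id_: 3 } };

console.log(idIndexKeyDelta(case1)); // 2  (2 extra index entries)
console.log(idIndexKeyDelta(case2)); // -2 (2 missing index entries)
```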
    

Engineering Ticket Description:

After the first phase of validation detects potential errors, we will log a message informing users that the second phase of validation is about to run.

The second phase of validation tracks all the KeyStrings that were hashed to erroneous buckets during the first phase. We keep track of the document-key KeyStrings in the erroneous buckets and remove each one only once a corresponding KeyString from an index entry is seen.

This allows us to see, once the second phase has finished going through the data, which document keys are missing index entries and which index entries are missing document keys.
We'll limit the amount of error reporting to the maximum size of a document. The KeyStrings will have to be transformed back into BSON format, with appropriate messages detailing what went wrong with them.
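
The two-phase idea described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the server's implementation: plain strings stand in for KeyStrings, bucketOf is a made-up hash, validateIndex is a hypothetical function name, and the bucket counters (+1 per document key, -1 per index entry) are one simple way to flag erroneous buckets. Because the counters can cancel out, this toy first phase can miss an extra entry and a missing entry that land in the same bucket.

```javascript
// Toy hash over plain strings (assumption: the server hashes KeyStrings instead).
function bucketOf(key, numBuckets) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % numBuckets;
}

// docKeys: keys derived from documents; indexKeys: keys found in the index.
function validateIndex(docKeys, indexKeys, numBuckets = 16) {
  // Phase 1: +1 per document key, -1 per index entry.
  // A bucket that does not return to zero is treated as erroneous.
  const buckets = new Array(numBuckets).fill(0);
  for (const k of docKeys) buckets[bucketOf(k, numBuckets)] += 1;
  for (const k of indexKeys) buckets[bucketOf(k, numBuckets)] -= 1;
  const bad = new Set();
  buckets.forEach((count, i) => { if (count !== 0) bad.add(i); });
  if (bad.size === 0) {
    return { valid: true, missingIndexEntries: [], extraIndexEntries: [] };
  }

  // Phase 2: track only document keys that hash to erroneous buckets.
  const pending = new Map(); // document key -> outstanding count
  for (const k of docKeys) {
    if (bad.has(bucketOf(k, numBuckets))) pending.set(k, (pending.get(k) || 0) + 1);
  }
  // Remove a tracked document key only when a matching index entry is seen.
  const extraIndexEntries = [];
  for (const k of indexKeys) {
    if (!bad.has(bucketOf(k, numBuckets))) continue;
    const n = pending.get(k) || 0;
    if (n === 0) extraIndexEntries.push(k);   // index entry with no document key
    else if (n === 1) pending.delete(k);
    else pending.set(k, n - 1);
  }
  // Whatever is still tracked has no index entry pointing at it.
  const missingIndexEntries = [];
  for (const [k, n] of pending) {
    for (let i = 0; i < n; i++) missingIndexEntries.push(k);
  }
  return { valid: false, missingIndexEntries, extraIndexEntries };
}

// Mirrors Case 1 (documents removed) and Case 2 (index entries removed).
console.log(validateIndex(["k1", "k2", "k5"], ["k1", "k2", "k3", "k4", "k5"]));
console.log(validateIndex(["k1", "k2", "k3", "k4", "k5"], ["k1", "k2", "k5"]));
```

Restricting the second phase to erroneous buckets keeps the tracked set small when most of the index is consistent, which is the design point the description above makes.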



 Comments   
Comment by Githook User [ 11/Jun/19 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-12665: 4.0.10 validate
Branch: v3.6
https://github.com/mongodb/docs/commit/1f074c7d8523c30087712e77200ce2b5bd6eefce

Comment by Githook User [ 31/May/19 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-12665: cleanup of api file
Branch: v4.0
https://github.com/mongodb/docs/commit/7e8926e64633320696c28f9b1ae92ce22d99e98e

Comment by Githook User [ 31/May/19 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-12665: 4.0.10 validate
Branch: v4.0
https://github.com/mongodb/docs/commit/32b3a82426906b62a1cf031c0333d92191d0dbac

Comment by Githook User [ 20/May/19 ]

Author:

{'email': 'kay.kim@10gen.com', 'name': 'Kay Kim', 'username': 'kay-kim'}

Message: DOCS-12665: 4.0.10 validate
Branch: v3.6.13
https://github.com/mongodb/docs/commit/6c6f947f3fa7f526d48f2af0134d101816224b22

Comment by Githook User [ 20/May/19 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-12665: cleanup of api file
Branch: master
https://github.com/mongodb/docs/commit/f593164040ced09607a17d002b034fb07fff100c

Comment by Githook User [ 20/May/19 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-12665: cleanup of api file
Branch: v4.0.10
https://github.com/mongodb/docs/commit/3c5b4be2652707b6184cdb628476ddb550134c78

Comment by Githook User [ 20/May/19 ]

Author:

{'email': 'kay.kim@10gen.com', 'name': 'Kay Kim', 'username': 'kay-kim'}

Message: DOCS-12665: 4.0.10 validate
Branch: v4.0.10
https://github.com/mongodb/docs/commit/1bc789c202428b84604b4a6557839571016eb493

Comment by Githook User [ 20/May/19 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-12665: 4.2 validate
Branch: master
https://github.com/mongodb/docs/commit/d2869a355362ab02a56e4ada5526c84667b9f82e
