[SERVER-20547] Expose the reason an operation fails document validation Created: 21/Sep/15  Updated: 13/Oct/22  Resolved: 21/Oct/20

Status: Closed
Project: Core Server
Component/s: Storage, Write Ops
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Improvement Priority: Major - P3
Reporter: Kelly Stirman Assignee: Mihai Andrei
Resolution: Done Votes: 102
Labels: QFB, asya, qexec-team, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-20547.patch    
Issue Links:
Backports
backports GODRIVER-2042 Returning errInfo - Schema Validation Closed
Depends
Documented
is documented by DOCS-13961 Investigate changes in SERVER-20547: ... Closed
Duplicate
is duplicated by SERVER-35926 Return error message if schema valida... Closed
is duplicated by SERVER-20584 Document Validation should tell WHY a... Closed
is duplicated by SERVER-40251 Better validation error on insert and... Closed
is duplicated by SERVER-48646 JSON Schema validation error does not... Closed
is duplicated by SERVER-32093 3.6 $jsonSchema validator error messa... Closed
Related
related to SERVER-51839 Add error context when document valid... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2020-09-21, Query 2020-10-05, Query 2020-10-19, Query 2020-11-02
Participants:
Case:

 Description   
Issue Status as of October 21, 2020

Summary
Operations which fail document validation when the collection validator’s validationAction is set to ‘error’ will now return an error of the form:

{
	"n" : 0,
	"writeErrors" : [
		{
			"index" : 0,
			"code" : 121,
			"errInfo" : {<object describing reason(s) for failure>},
			"errmsg" : "Document failed validation"
		}
	],
	"ok" : 1
}

When the validationAction is set to ‘warn’, a message of the following form will be written in the logs:

{"t":{"$date": <date>},"s":"W",  "c":"STORAGE",  "id":20294,   "ctx":"conn","msg":"Document would fail validation","attr":{"namespace":<namespace>,"document":<document which failed validation>,"errInfo":<object describing the reason(s) for failure>}}

In both cases, the ‘errInfo’ field contains a structured error object that describes why a particular document failed validation.
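
As a rough sketch of how the ‘warn’ case could be consumed, the snippet below reads one structured log line and extracts the ‘errInfo’ payload. It assumes each log line is a single valid JSON object shaped like the message above; the helper name parseValidationWarning and the sample line (namespace, date, document contents) are made up for illustration and are not part of the server or any driver.

```javascript
// Hypothetical helper (not a MongoDB API): returns the validation details
// from one structured log line, or null if the line is something else.
function parseValidationWarning(line) {
  let entry;
  try {
    entry = JSON.parse(line);
  } catch (err) {
    return null; // not a JSON log line
  }
  // 20294 is the log id shown in the message format above; "W" is the warning severity.
  if (entry.id !== 20294 || entry.s !== "W") {
    return null;
  }
  const attr = entry.attr || {};
  return {
    namespace: attr.namespace,
    document: attr.document,
    errInfo: attr.errInfo,
  };
}

// Made-up sample line; a real line carries real values in place of these.
const sampleLine = JSON.stringify({
  t: { $date: "2020-10-21T00:00:00.000+00:00" },
  s: "W",
  c: "STORAGE",
  id: 20294,
  ctx: "conn1",
  msg: "Document would fail validation",
  attr: {
    namespace: "test.students",
    document: { name: "Mihai" },
    errInfo: { details: { operatorName: "$jsonSchema" } },
  },
});

const warning = parseValidationWarning(sampleLine);
console.log(warning.namespace, JSON.stringify(warning.errInfo));
```

Filtering on the log id rather than on the message text keeps the check stable if the wording of the message changes.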

Example
To get a sense of what these detailed errors look like, suppose the following validator expression is set on a collection named ‘students’:

 {$jsonSchema: {
	bsonType: "object",
	required: [ "name", "year", "major", "address" ],
	properties: {
		name: {
			bsonType: "string",
			description: "must be a string and is required"
		},
		year: {
			bsonType: "int",
			minimum: 2017,
			maximum: 3017,
		 	description: "must be an integer in [ 2017, 3017 ] and is required"
		},
		major: {
			enum: [ "Math", "English", "Computer Science", "History", null ],
			description: "can only be one of the enum values and is required"
		},
		gpa: {
			bsonType:  "double" ,
			description: "must be a double if the field exists"
		},
		address: {
			bsonType: "object",
			required: [ "city" ],
			properties: {
				street: {
					bsonType: "string",
					description: "must be a string if the field exists"
				},
				city: {
					bsonType: "string",
					description: "must be a string and is required"
				}
			}
		}
	}
}}

Attempting to insert the document {name: "Mihai", year: 2019, major: "Computer Science"} will produce the following error under the ‘errInfo’ field:

"errInfo" : {
	"failingDocumentId" : ObjectId("5f85f78bccef9666ae09c3c1"),
	"details" : {
		"operatorName" : "$jsonSchema",
		"schemaRulesNotSatisfied" : [
			{
				"operatorName" : "properties",
				"propertiesNotSatisfied" : [
					{
						"propertyName" : "year",
						"details" : [
							{
								"operatorName" : "bsonType",
								"specifiedAs" : {
									"bsonType" : "int"
								},
								"reason" : "type did not match",
								"consideredValue" : 2019,
								"consideredType" : "double"
							}
						]
					}
				]
			},
			{
				"operatorName" : "required",
				"specifiedAs" : {
					"required" : [
						"name",
						"year",
						"major",
						"address"
					]
				},
				"missingProperties" : [
					"address"
				]
			}
		]
	}
}

The detailed error describes why the document failed validation: the value specified for the ‘year’ field was of the wrong type and the ‘address’ field is required, but was missing.
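
To illustrate how an application might turn this structure into readable messages, here is a minimal sketch that walks an ‘errInfo’ object and collects one line per failure. It handles only the shapes shown in the example above (‘properties’, ‘required’, and leaf rules that carry a ‘reason’); the function name summarizeErrInfo is hypothetical, and the ObjectId is written as a plain string so the snippet runs outside the shell.

```javascript
// Hypothetical helper (not a MongoDB API): flattens a $jsonSchema errInfo
// object into one human-readable line per unsatisfied rule.
function summarizeErrInfo(errInfo) {
  const messages = [];

  function visit(rule, path) {
    if (rule.operatorName === "properties") {
      // Descend into each failing property, extending the dotted path.
      for (const prop of rule.propertiesNotSatisfied || []) {
        const childPath = path ? `${path}.${prop.propertyName}` : prop.propertyName;
        for (const detail of prop.details || []) {
          visit(detail, childPath);
        }
      }
    } else if (rule.operatorName === "required") {
      for (const name of rule.missingProperties || []) {
        const full = path ? `${path}.${name}` : name;
        messages.push(`${full}: required field is missing`);
      }
    } else {
      // Leaf rule such as bsonType: report the reason as the server gave it.
      const where = path || "(document)";
      const reason = rule.reason || "rule not satisfied";
      messages.push(`${where}: ${rule.operatorName}: ${reason}`);
    }
  }

  const details = (errInfo && errInfo.details) || {};
  for (const rule of details.schemaRulesNotSatisfied || []) {
    visit(rule, "");
  }
  return messages;
}

// The errInfo from the example above, with the ObjectId as a plain string.
const sampleErrInfo = {
  failingDocumentId: "5f85f78bccef9666ae09c3c1",
  details: {
    operatorName: "$jsonSchema",
    schemaRulesNotSatisfied: [
      {
        operatorName: "properties",
        propertiesNotSatisfied: [
          {
            propertyName: "year",
            details: [
              {
                operatorName: "bsonType",
                specifiedAs: { bsonType: "int" },
                reason: "type did not match",
                consideredValue: 2019,
                consideredType: "double",
              },
            ],
          },
        ],
      },
      {
        operatorName: "required",
        specifiedAs: { required: ["name", "year", "major", "address"] },
        missingProperties: ["address"],
      },
    ],
  },
};

console.log(summarizeErrInfo(sampleErrInfo).join("\n"));
// year: bsonType: type did not match
// address: required field is missing
```

Other schema keywords may report their failures under differently named fields, so a real consumer would need to extend the branches above rather than rely on this sketch.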

Version Information
This feature will be available starting in version 4.9.0 once the upgrade (including upgrading the FCV) is fully complete.

We will provide a link to the reference page for this feature in the online documentation when it becomes available.

Original Description

When documents fail validation, there is no feedback as to what failed.

It should be possible to perform an update or insert operation and find out which specific predicate in the validator document caused the operation to fail.



 Comments   
Comment by Katya Kamenieva [ 23/Oct/20 ]

avaly@plexapp.com this will go out in the next major release (5.0) that is planned for mid-2021

Comment by Valentin Agachi [ 22/Oct/20 ]

4.9.0? Is there any ETA for when that will be released? EOY 2021?

Comment by viktor dakalov [ 22/Oct/20 ]

We have been waiting for this for five long years! We must celebrate this! I'm treating!

Comment by David Storch [ 12/Oct/20 ]

george@qntify.co I am not aware of any current plans to backport this work to 4.4 or an earlier branch.

Comment by George Mihailov [ 11/Oct/20 ]

@david.storch thank you for the update. Do you think it will be ported to currently maintained versions or become available in 4.4+? 

Comment by David Storch [ 30/Sep/20 ]

Hi george@qntify.co and krystian.jarmicki@silvair.com: The implementation of this feature has been nearly completed, with most of the implementation already merged to the master branch. Soon, once we have added the finishing touches, we will close this ticket and provide another update.

Thanks to mihai.andrei and mindaugas.malinauskas for their work on this project!

Comment by George Mihailov [ 30/Sep/20 ]

Thanks, @Krystian, it is helpful, however, it doesn't work for aggregation pipelines. We need a proper solution for this.

Mongo, please, it has been 5 years to deliver this feature ... people now write libraries to workaround it. 

Comment by Mihai Andrei [ 09/Jul/20 ]

We are currently implementing this feature and are targeting the 4.6 release. The work is being done incrementally under other tickets, and we intend to close this umbrella ticket with an update once the project has been completed in its entirety. Feel free to reach out if you have any questions.

Comment by Jérôme Bédat [ 08/Jul/20 ]

We are currently working on some pretty complex validation schemas, and they are almost impossible to debug without clear error reporting. It's gonna take days instead of only a few hours to debug. I think it's very shocking to see this issue open since 2015 without any fix. I'm sure the fix is quite simple, so please guys do something; it's a shame to not have that in 2020. It's a shame for MongoDB.

Comment by Guillaume Brugere [ 08/Jul/20 ]

As requested by all previous comments, is it possible to prioritize this point? The lack of feedback from schema validation is a real pain for development teams.

Comment by George Mihailov [ 26/Jun/20 ]

Please prioritize this, validation without errors with details is useless. The client-side workaround doesn't work for pipelines.

Comment by Bernhardt Scherer [ 06/Mar/20 ]

This feature would be super useful as it would remove the need for a lot of validation code in the backend and creates a single source of truth for validations on database level.

Comment by Daniel Leber [ 13/Feb/20 ]

Our startup is a customer of Mongo Atlas, and we would greatly benefit from this feature.

Mongo's lack of schema allowed us to iterate and get to market quickly. As we consolidate our understanding of the data and customer needs, enforcing a schema on write will help prevent bugs and misunderstandings in our growing team.

As mentioned in other comments, the validation feature in its current state is essentially unusable.

Comment by Paul-Emile Brotons [ 12/Feb/20 ]

Customers are asking for this. I got today a new example where they needed it and had to do it on the client side, ending unhappy with the lack of this feature.

Comment by Erik Mc [ 30/Sep/19 ]

This issue is extremely frustrating.

For inserts there are some workarounds like client-side validation using external jsonSchema validators.

But for updates there is no viable workaround. You don't always have the whole document client side and even if you did you would have to locally re-implement all update-operators such as $set, $addToSet, $inc, etc just to be able to reconstruct the new document for validation.

Comment by Jim Jin [ 11/Sep/19 ]

How do I determine why a MongoDB document insert is failing validation? All I get back is a writeError that says "Document failed validation", which isn't very helpful. I want some detailed error info.

Comment by Alex Urdenko [ 06/Aug/19 ]

I think the problem is insecure content in the verification response. Some important data may be compromised. But it would be nice to give the field "description" at least.

Comment by DANIELE Tassone [ 14/Jul/19 ]

I agree would be great

Comment by Nikolay Symotiyk [ 20/May/19 ]

I agree. We can't use validation just only because of this problem :c

Comment by Luiz Bim [ 16/May/19 ]

It just doesn't make any sense to provide a feature like that and not improve it at all. No validator I have ever seen reports a generic error message without providing at least some kind of debugging method.

Comment by Olivier Louvignes [ 16/May/19 ]

A huge pain when using MongoDB jsonSchema, would really love this feature as well... A bit crazy that this ticket has been opened for 4 years now...

Any progress on this? 

Comment by James J. Ye [ 03/May/19 ]

It is such a painful task to go through all the details of $jsonSchema and the document data that failed the validation to figure out why the document validation failed. This feature/enhancement would make my life much easier as a MongoDB lover.

Comment by Ahmed Medhat [ 09/Feb/19 ]

Great feature, but it is missing the most important thing: a descriptive error message. I'm already using a workaround, validating data with a JSON Schema validator like Ajv in create and update actions, but I can't validate everything when update operators are involved. For example, I defined a field in the JSON schema as a number with a minimum value of 1, and I need to decrease that field on an existing document by 1 using $inc. How can I know that the update failed because of the minimum, when you return "Document failed validation" for any validation error without mentioning the reason?

You must add descriptive error messages for validation errors to make this feature useful.

Comment by Jakub Fedyczak [ 23/Jan/19 ]

Upvoting

Comment by Alex [ 21/Dec/18 ]

It's necessary! It would be great if it were implemented!

Comment by viktor dakalov [ 19/Dec/18 ]

Guys, please do it! Otherwise this feature is useless! 

Comment by Owen Allen [ 17/Dec/18 ]

I'd like to upvote this as well. We're trying to figure out how to handle this issue in our app. It's pretty rough on developers when something fails validation because you get no feedback at all. Even if it were able to tell us some very basic elements, it would be helpful. For example: 'Field 'created' is required but was not provided.', 'Field 'x' should be bsontype string but was bsontype number'.

When a SQL statement fails, it just gives you the first failure it reaches, it doesn't give you each failure recursively. In general I see validation as two parts. One of them is to prevent data corruption, and the other is to assist developers. Outputting the first failure helps both and is easier than having to deal with the complexities of the nested structures. I hope you guys will prioritize this for a future release.

Comment by Sharma Abhishek [ 12/Nov/18 ]

This is the most annoying error I am seeing in MongoDB: we have a feature for schema validation, but the validation does not tell us what's wrong in the error response.

I have been seeing this issue from 3.2 through 3.8, the current version I am using. Please fix this ASAP.

Comment by Nathan Bolam [ 01/Oct/18 ]

nice work @kyle.suarez

As it converts the jsonSchema to a Mongo expression to test the data against, I agree with the "best-effort" sort of basis. At least it's a start on this 3-year-old ticket.

Comment by Kyle Suarez [ 07/Sep/18 ]

I've taken a whack at this problem during our latest engineering Skunkworks. In my approach, I updated the MatchExpression matching API to take an "explain" argument, which will be filled out when the match does not succeed. I have the code in one of my forks, with the commit of interest here. I also uploaded SERVER-20547.patch, based on commit 4c4c76ec1be5edb1d7560aecfe4ac7f5ada7ba00. This isn't yet a complete, production-ready solution, but it does demonstrate one way in which we might expose more detail about why a document doesn't match.

Sample behavior with this patch:

> db.users.runCommand(
   "create",
   {validator: {
     $and: [
       {name: /^k/},
       {age: {$gt: 0}}
     ]
   }
});
{ "ok" : 1 }
 
> db.users.insert({name: "b", age: 4})
WriteResult({
  nInserted : 0,
  writeError : {
    code : 121,
    errmsg : "Document failed validation: clause 0 of $and failed: b does not match {regex: ^k, flags: }"
  }
})

Again, this is only a sample, and I didn't implement "explain" output for all possible expressions.

I'll note that there are challenges with this approach:

  • JSON Schema expressions do not have a one-to-one correspondence with the MatchExpressions that implement them. This makes it hard to give sensible JSON Schema error messages, as the MatchExpression is currently unaware that it is actually part of a JSON Schema construction.
  • Currently, my WIP will add to the explain when a match fails. That behavior is great for "positive" expressions but less useful for "negative" expressions like $not.
  • We'd probably have to implement some sort of backtracking for an $or clause. If clauses 0 and 1 don't match but clause 2 does, then we should (probably?) discard the explain output produced by the first two clauses.
  • The output produced by the explain is simply a one-dimensional string. For extremely large nested expressions, this may be difficult to parse or interpret. It is definitely a challenge to express details about a two-dimensional AST in a succinct fashion.
  • We must be mindful of the amount of memory used by the "explain" buffer and ensure it doesn't exceed some predetermined capacity.

Given this, any solution that we come up with will likely be on a "best-effort" sort of basis, as it is difficult to determine the exact cause of failure with our current AST.
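
The $or backtracking and the buffer cap from the list above can be sketched with a toy matcher. This is not MongoDB's MatchExpression code and not the attached patch; it is a simplified stand-in that supports only $and, $or, a regex on a field, and $gt. The names explainMatch and MAX_REASONS, and the wording of the $gt message, are assumptions made for this illustration.

```javascript
// Assumed cap on how many failure reasons are kept (stand-in for a memory limit).
const MAX_REASONS = 10;

// Toy matcher: returns { matched, reasons }, where reasons explains a failed match.
function explainMatch(expr, doc) {
  if (expr.$and) {
    // $and fails on the first clause that does not match.
    for (let i = 0; i < expr.$and.length; i++) {
      const r = explainMatch(expr.$and[i], doc);
      if (!r.matched) {
        const reasons = r.reasons.map((x) => `clause ${i} of $and failed: ${x}`);
        return { matched: false, reasons: reasons.slice(0, MAX_REASONS) };
      }
    }
    return { matched: true, reasons: [] };
  }
  if (expr.$or) {
    const collected = [];
    for (let i = 0; i < expr.$or.length; i++) {
      const r = explainMatch(expr.$or[i], doc);
      if (r.matched) {
        // Backtrack: a later clause matched, so discard earlier failure output.
        return { matched: true, reasons: [] };
      }
      collected.push(...r.reasons.map((x) => `clause ${i} of $or failed: ${x}`));
    }
    return { matched: false, reasons: collected.slice(0, MAX_REASONS) };
  }
  // Leaf expression: a single { field: condition } pair.
  const [field, cond] = Object.entries(expr)[0];
  const value = doc[field];
  if (cond instanceof RegExp) {
    return cond.test(String(value))
      ? { matched: true, reasons: [] }
      : { matched: false, reasons: [`${value} does not match ${cond}`] };
  }
  if (cond && typeof cond === "object" && "$gt" in cond) {
    return value > cond.$gt
      ? { matched: true, reasons: [] }
      : { matched: false, reasons: [`${field} (${value}) is not greater than ${cond.$gt}`] };
  }
  throw new Error(`unsupported expression for field ${field}`);
}

const validator = { $and: [{ name: /^k/ }, { age: { $gt: 0 } }] };
console.log(explainMatch(validator, { name: "b", age: 4 }));
// { matched: false, reasons: [ 'clause 0 of $and failed: b does not match /^k/' ] }
```

Even in this toy form the output is a flat list of strings, which shows the limitation noted above: nested expressions produce long prefixed messages rather than a structure that mirrors the expression tree.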

Kyle

Comment by J. Jansen [ 21/Aug/18 ]

I have almost the same problem as Nathan Bolam. Only I use the old schema validator and I work on a PHP application. No problems with inserting via the mongoshell. When inserting from my PHP application MongoDB reports that the insert was successful, only the document is NOT inserted in the database.  

Comment by Nathan Bolam [ 12/Aug/18 ]

It’s even more frustrating when your schema and data validate correctly with ajv and jsonschemavalidator yet fail to validate against MongoDB. Worse, inserting the document through the mongo shell validates correctly, but inserting the same dataset through the Node driver doesn’t, which leaves you removing one field at a time to try to work out why.

Comment by Jeffrey Zimmerman [ 29/Jun/18 ]

It's incredible how much the error "Document fails validation" costs us when we hit this problem.  Please tell us why it failed validation!

Comment by Hendy Irawan [ 24/May/18 ]

@wintersieck wrote a package to help with Mongo’s lack of error messages: https://www.npmjs.com/package/mongo-schemer. It runs all schema validation errors through ajv to provide a detailed explanation of why the document failed validation. (source: https://medium.com/@wintersieck/json-schema-validation-in-mongo-3-6-e8def43f1645)

Comment by Eric Milkie [ 13/Feb/17 ]

There is no request you can run to find out what rules failed. As the Description indicates, this Improvement ticket is to provide the ability to do that.

Comment by Michael Henretty [ 12/Feb/17 ]

> I think it is ok if a separate request is required to find out what rules failed.

What request can we run to find out what rules failed? AFAICT, running getLastError does not yield any more information about what rules failed.

Comment by Jonathan Rezende [ 09/Oct/16 ]

Agreed with @steven, it is equally frustrating as a report ticket with message "it is not working"

Comment by Steven Samuel Cole [ 04/Oct/16 ]

The current behavior is the equivalent of a bug report saying "it doesn't work" and is equally frustrating.
I'm surprised a useless error message such as Document failed validation has even made it into production.

Comment by Chris Handorf [ 28/Sep/16 ]

It's OK if a separate request is required to find out what rules failed, but what is the separate request? db.runCommand({ getLastError: 1 }) returns the same generic validation error message.

When a document has many fields and many validation rules using regular expressions it becomes very time consuming to determine what is failing. Even recording something in the log would be a big help.

Comment by Henrik Ingo (Inactive) [ 27/Sep/15 ]

Kelly: In that case the error message must include a sentence informing the user about that other command. (Also, off the bat I doubt you will save anything by making it a separate command, but in theory of course you could be right.)

Comment by Kelly Stirman [ 23/Sep/15 ]

I don't think this needs to be in the critical path - I think it is ok if a separate request is required to find out what rules failed.
