[SERVER-62272] Adding schema validation to a collection can prevent chunk migrations of failing documents Created: 24/Dec/21  Updated: 29/Oct/23  Resolved: 31/May/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.1.0, 4.4.10
Fix Version/s: 4.4.15, 5.0.10, 4.2.22, 6.0.0-rc9, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Alex Bevilacqua Assignee: Nandini Bhartiya
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Related
related to SERVER-62310 collMod command not sent to all shard... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.0, v4.4, v4.2
Sprint: Sharding 2022-05-02, Sharding NYC 2022-05-16, Sharding NYC 2022-05-30, Sharding NYC 2022-06-13
Participants:
Case:
Linked BF Score: 61
Story Points: 3

 Description   

When schema validation is added to a collection and that collection is then sharded, chunk migration fails if the collection already contains documents that would fail validation.

For example:

# setup
m 4.4.10-ent
mlaunch init --replicaset --nodes 3 --shards 2 --binarypath $(m bin 4.4.10-ent)

Using the above sharded cluster, we create a collection with some junk data, then add a validation rule so that no further documents with the same schema can be written:

function junk(length) {
  var result           = '';
  var characters       = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  var charactersLength = characters.length;
  for ( var i = 0; i < length; i++ ) {
     result += characters.charAt(Math.floor(Math.random() * charactersLength));
  }
  return result;
}
 
// Insert 50 documents (~250 KiB each) whose "i" field is a number.
var data = [];
for (var i = 0; i < 50; i++) {
  data.push({ i: i, d: junk(250 * 1024) });
}
db.getSiblingDB("test").c3.insertMany(data);

// Add a validator requiring "i" to be a string; the existing documents fail it.
db.getSiblingDB("test").runCommand({
  collMod: "c3",
  validator: { $jsonSchema: {
     bsonType: "object",
     properties: {
        i: {
           bsonType: "string",
           description: "must be a string"
        }
     }
  } }
});

If we try to write another document it should fail:

db.getSiblingDB("test").c3.insert({ i: 1, d: junk(128) });
/* 
WriteResult({
	"nInserted" : 0,
	"writeError" : {
		"code" : 121,
		"errmsg" : "Document failed validation"
	}
})
*/
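
To see which pre-existing documents would fail the new validator (and could therefore block a migration), one option is to query with the validator negated via $nor. This is a minimal sketch and not part of the original reproduction: the helper name failingDocsQuery is hypothetical, and the db call only runs inside a mongo shell connected to the cluster.

```javascript
// Validator from the collMod call above.
var validator = {
  $jsonSchema: {
    bsonType: "object",
    properties: {
      i: { bsonType: "string", description: "must be a string" }
    }
  }
};

// Hypothetical helper: wraps a validator in $nor so the resulting query
// matches documents that do NOT satisfy it.
function failingDocsQuery(v) {
  return { $nor: [v] };
}

var query = failingDocsQuery(validator);

// Only executed inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  print(db.getSiblingDB("test").c3.countDocuments(query));
}
```

In the reproduction above every seeded document has a numeric i, so all of them should match this query.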

If the collection is then sharded, sh.status() simply shows that no chunks are being migrated:

// ensure lots of chunks
db.getSiblingDB("config").settings.save( { _id:"chunksize", value: 1 } );
sh.enableSharding("test");
sh.shardCollection("test.c3", { _id: 1 } )
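
As a complement to sh.status(), the chunk distribution can also be read from the config database. The sketch below is an assumption-laden illustration rather than part of the original report: the helper name chunksPerShardPipeline is hypothetical, and it assumes the 4.4 layout of config.chunks, where each chunk document carries ns and shard fields (later releases identify the collection differently).

```javascript
// Hypothetical helper: builds a pipeline that counts chunks per shard
// for one namespace. Assumes 4.4-style config.chunks documents with
// `ns` and `shard` fields.
function chunksPerShardPipeline(ns) {
  return [
    { $match: { ns: ns } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } },
    { $sort: { _id: 1 } }
  ];
}

var pipeline = chunksPerShardPipeline("test.c3");

// Only executed inside a mongo shell connected to mongos.
if (typeof db !== "undefined") {
  db.getSiblingDB("config").chunks.aggregate(pipeline).forEach(printjson);
}
```

If migrations are failing as described, the counts should stay concentrated on the shard that originally held the collection.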

Reviewing the PRIMARY log of the other (recipient) shard, we can see the following:

{"t":{"$date":"2021-12-24T14:46:38.501-05:00"},"s":"I",  "c":"MIGRATE",  "id":22000,   "ctx":"migrateThread","msg":"Starting receiving end of chunk migration","attr":{"chunkMin":{"_id":{"$minKey":1}},"chunkMax":{"_id":{"$oid":"61c6233d8715b5d2ac06168a"}},"namespace":"test.c3","fromShard":"shard01","epoch":{"$oid":"61c623657acfa1a7fd9bdaf7"},"sessionId":"shard01_shard02_61c6239e7acfa1a7fd9bdf60","migrationId":{"uuid":{"$uuid":"83e10e76-bd8d-4e65-8f5f-63755b6b8f49"}}}}
{"t":{"$date":"2021-12-24T14:46:38.575-05:00"},"s":"I",  "c":"MIGRATE",  "id":21999,   "ctx":"chunkInserter","msg":"Batch insertion failed","attr":{"error":"DocumentValidationFailure: Insert of { _id: ObjectId('61c6233d8715b5d2ac061688'), i: 0.0, d: \"w0HKLtlN64tsZfOs2OnSb0eBGMIhjkPwSja6Td3hp5hMIPueIAgKV336r7KRGrnPwsXW2iItnPK94ccitWPPFWF3ogSMznGOFqCwJVR6EGpOD9R6d79f7Ztdbsl7bSNRv9J29mfDLlXLYRUDa8OTLv...\" } failed. :: caused by :: Document failed validation"}}
{"t":{"$date":"2021-12-24T14:46:38.592-05:00"},"s":"I",  "c":"SHARDING", "id":22080,   "ctx":"migrateThread","msg":"About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"Alexs-MacBook-Pro.local:27021-2021-12-24T14:46:38.592-05:00-61c6239e1bda207efdb22431","server":"Alexs-MacBook-Pro.local:27021","shard":"shard02","clientAddr":"","time":{"$date":"2021-12-24T19:46:38.592Z"},"what":"moveChunk.to","ns":"test.c3","details":{"min":{"_id":{"$minKey":1}},"max":{"_id":{"$oid":"61c6233d8715b5d2ac06168a"}},"step 1 of 7":1,"step 2 of 7":0,"step 3 of 7":55,"to":"shard02","from":"shard01","note":"aborted"}}}}
{"t":{"$date":"2021-12-24T14:46:38.614-05:00"},"s":"I",  "c":"MIGRATE",  "id":21998,   "ctx":"migrateThread","msg":"Error during migration","attr":{"error":"migrate failed: Location51008: _migrateClone failed:  :: caused by :: operation was interrupted"}}
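
Because these entries are structured JSON, the relevant failures can be picked out of a log programmatically. The following is a minimal sketch, assuming one JSON document per log line as in the excerpt above; the helper name findMigrationFailures is hypothetical, and the sample lines are abbreviated versions of the entries shown, not additional log output.

```javascript
// Hypothetical helper: parses structured mongod log lines and keeps the
// MIGRATE-component entries whose message matches one of the failures above.
function findMigrationFailures(lines) {
  var wanted = ["Batch insertion failed", "Error during migration"];
  return lines
    .map(function (line) {
      // Skip anything that is not valid JSON instead of aborting.
      try { return JSON.parse(line); } catch (e) { return null; }
    })
    .filter(function (entry) {
      return entry !== null && entry.c === "MIGRATE" &&
             wanted.indexOf(entry.msg) !== -1;
    })
    .map(function (entry) {
      return { id: entry.id, msg: entry.msg, error: entry.attr && entry.attr.error };
    });
}

// Abbreviated sample lines modeled on the excerpt above.
var sample = [
  '{"s":"I","c":"MIGRATE","id":22000,"ctx":"migrateThread","msg":"Starting receiving end of chunk migration","attr":{"namespace":"test.c3"}}',
  '{"s":"I","c":"MIGRATE","id":21999,"ctx":"chunkInserter","msg":"Batch insertion failed","attr":{"error":"DocumentValidationFailure: ... Document failed validation"}}',
  '{"s":"I","c":"MIGRATE","id":21998,"ctx":"migrateThread","msg":"Error during migration","attr":{"error":"migrate failed: Location51008: _migrateClone failed:  :: caused by :: operation was interrupted"}}',
  'not a JSON line'
];

var failures = findMigrationFailures(sample);
```

Matching on the msg text rather than only on the numeric id keeps the filter readable; the ids 21999 and 21998 are taken from the excerpt above.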

This can be addressed by logging into each shard's PRIMARY and running collMod to set validationLevel: "off". However, because these moveChunk.error events aren't logged in the changelog, it's not obvious what has failed without a deeper dive.
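
The workaround described above can be sketched as follows. This is an illustration only: the helper name disableValidationCmd is hypothetical, and the db call assumes a mongo shell connected directly to a shard's PRIMARY, repeated once per shard.

```javascript
// Hypothetical helper: builds the collMod command that turns validation off
// for the given collection.
function disableValidationCmd(collName) {
  return { collMod: collName, validationLevel: "off" };
}

var cmd = disableValidationCmd("c3");

// Only executed inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.getSiblingDB("test").runCommand(cmd));
}
```

Once the chunks have migrated, validation can be turned back on with the same command and a different validationLevel ("strict" is the server default).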



 Comments   
Comment by Githook User [ 21/Jun/22 ]

Author:

{'name': 'nandinibhartiyaMDB', 'email': 'nandini.bhartiya@mongodb.com', 'username': 'nandinibhartiyaMDB'}

Message: SERVER-62272: Configure the failpoint at the correct position
Branch: v4.2
https://github.com/mongodb/mongo/commit/ffb583eb93ca33c34f9144900ffe9fe2cdeca857

Comment by Githook User [ 09/Jun/22 ]

Author:

{'name': 'nandinibhartiyaMDB', 'email': 'nandini.bhartiya@mongodb.com', 'username': 'nandinibhartiyaMDB'}

Message: SERVER-62272 : Migration OK for chunks existing before schema validator

(cherry picked from commit bc1ac6fb2eb66202d1acc08d158b102c57beabbd)
Branch: v4.2
https://github.com/mongodb/mongo/commit/2f1e8dc34c7ddfa1c9dccab011759a12532c8219

Comment by Githook User [ 08/Jun/22 ]

Author:

{'name': 'nandinibhartiyaMDB', 'email': 'nandini.bhartiya@mongodb.com', 'username': 'nandinibhartiyaMDB'}

Message: SERVER-62272 : Migration OK for chunks existing before schema validator
Branch: v4.4
https://github.com/mongodb/mongo/commit/224c25e0d3073978ae531c47ace605affae02664

Comment by Githook User [ 07/Jun/22 ]

Author:

{'name': 'nandinibhartiyaMDB', 'email': '104035932+nandinibhartiyaMDB@users.noreply.github.com', 'username': 'nandinibhartiyaMDB'}

Message: SERVER-62272 : Migration OK for chunks existing before schema validator
Branch: v5.0
https://github.com/mongodb/mongo/commit/b0fd7b268a420fc79b7ab4ca358fdbffa434fbda

Comment by Githook User [ 01/Jun/22 ]

Author:

{'name': 'nandinibhartiyaMDB', 'email': '104035932+nandinibhartiyaMDB@users.noreply.github.com', 'username': 'nandinibhartiyaMDB'}

Message: SERVER-62272 : Migration OK for chunks existing before schema validator

(cherry picked from commit bc1ac6fb2eb66202d1acc08d158b102c57beabbd)
Branch: v6.0
https://github.com/mongodb/mongo/commit/2657601c257c68aa95c0b5ce469db9196de8cb80

Comment by Githook User [ 31/May/22 ]

Author:

{'name': 'nandinibhartiyaMDB', 'email': '104035932+nandinibhartiyaMDB@users.noreply.github.com', 'username': 'nandinibhartiyaMDB'}

Message: SERVER-62272 : Migration OK for chunks existing before schema validator
Branch: master
https://github.com/mongodb/mongo/commit/bc1ac6fb2eb66202d1acc08d158b102c57beabbd

Comment by Alex Bevilacqua [ 29/Dec/21 ]

I've also opened SERVER-62310 to address the collMod command specifically not being sent to the other shards.

Generated at Thu Feb 08 05:54:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.