[SERVER-68852] Investigate handling of incorrect values for balancer settings Created: 16/Aug/22  Updated: 11/Dec/23  Resolved: 14/Nov/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Task Priority: Major - P3
Reporter: Allison Easton Assignee: Allison Easton
Resolution: Fixed Votes: 0
Labels: shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
is documented by DOCS-15737 [Server] Investigate changes in SERVE... Closed
Problem/Incident
Related
related to SERVER-39122 Enforce schema on system collections Closed
related to MONGOSH-1665 Stop running autosplit commands in th... Open
related to SERVER-69163 Enforce Schema on System Collections Closed
is related to SERVER-82609 Create json schema for balancer confi... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding EMEA 2022-09-05, Sharding EMEA 2022-09-19, Sharding EMEA 2022-10-03, Sharding EMEA 2022-10-17, Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14
Participants:
Linked BF Score: 35
Story Points: 2

 Description   

While investigating uses of NaN in the balancer, it came up that if the user updates the maxChunkSize to a value not accepted by the balancer (NaN, -10, etc.), the balancer and all other operations that rely on the balancer configuration setting will fail until the setting is changed. These operations are:

  • all balancer operations
  • add shard
  • chunk autosplitting and initial chunk splitting
  • manual migrations

This ticket is to investigate whether this is a problem and if there is a way to reject the write of an invalid value rather than checking it at runtime.



 Comments   
Comment by Allison Easton [ 14/Nov/22 ]

The solution taken was to enforce a schema for each possible document: every allowed _id value is listed, and a schema is applied based on that value. In this ticket, we created a schema with the following format:

$jsonSchema: {
  oneOf: [
    {"properties": {_id: {enum: ["chunksize"]}, value: {bsonType: "number", minimum: 1, maximum: 1024}}},
    {"properties": {_id: {enum: ["balancer", "autosplit", "ReadWriteConcernDefaults", "audit"]}}}
  ]
}

This will ensure that updates/inserts where the _id is "chunksize" have a value which is a number between 1 and 1024. For _id "balancer", "autosplit", "ReadWriteConcernDefaults" and "audit", we are enforcing an empty schema which will allow any doc with that _id. Any other _id values will be rejected.
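
To make the accept/reject behaviour concrete, here is a minimal JavaScript sketch that mimics the oneOf logic of the schema above outside the server. It is not MongoDB code: the function names are hypothetical, and typeof "number" only approximates bsonType "number". As in JSON Schema, a missing value field is not rejected, because "properties" does not make a field required.

```javascript
// Hypothetical stand-in for the server-side validator (illustration only).
const CHUNK_SIZE_MIN = 1;
const CHUNK_SIZE_MAX = 1024;
const UNCHECKED_IDS = ["balancer", "autosplit", "ReadWriteConcernDefaults", "audit"];

// Branch 1: _id is "chunksize"; "value", when present, must be a number in range.
function matchesChunkSizeBranch(doc) {
  if (doc._id !== "chunksize") return false;
  if (!("value" in doc)) return true; // "properties" alone does not require the field
  const v = doc.value;
  // NaN fails both comparisons, so it is rejected here.
  return typeof v === "number" && v >= CHUNK_SIZE_MIN && v <= CHUNK_SIZE_MAX;
}

// Branch 2: _id is one of the listed settings; no further checks.
function matchesUncheckedBranch(doc) {
  return UNCHECKED_IDS.includes(doc._id);
}

// oneOf: a document is allowed only if exactly one branch matches.
function isAllowedSettingsDoc(doc) {
  const branches = [matchesChunkSizeBranch, matchesUncheckedBranch];
  return branches.filter((branch) => branch(doc)).length === 1;
}

console.log(isAllowedSettingsDoc({ _id: "chunksize", value: 64 }));   // true
console.log(isAllowedSettingsDoc({ _id: "chunksize", value: -10 }));  // false
console.log(isAllowedSettingsDoc({ _id: "chunksize", value: NaN }));  // false
console.log(isAllowedSettingsDoc({ _id: "balancer", mode: "full" })); // true
console.log(isAllowedSettingsDoc({ _id: "somethingElse" }));          // false
```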

Comment by Githook User [ 14/Nov/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-68852 Investigate handling of incorrect values for balancer settings
Branch: master
https://github.com/mongodb/mongo/commit/1854c66a7dc76d891912356cf1646ccd4b146e8c

Comment by Allison Easton [ 20/Sep/22 ]

The config.settings collection isn't ideally formatted for enforcing schemas, since schemas are collection-wide and we have different rules for each document.

The current setup of the config.settings collection is to have one document per setting, with the _id identifying which setting the document affects. For example (taken from the public documentation), a cluster with a max chunk size set, plus balancer and autosplitter settings, would have the following documents:

{ "_id" : "chunksize", "value" : 64 }

{ "_id" : "balancer", "mode" : "full", "stopped" : false }

{ "_id" : "autosplit", "enabled" : true } 

This means we need to enforce a different schema on each document based on the value of _id. Since MongoDB uses JSON Schema draft 4, we cannot use the JSON Schema "if" or "const" keywords, which were only added in later drafts.

One option to enforce a JSON schema on this collection would be to enforce a schema for each possible document. An example of this schema is below; the idea is to specify each option for the _id and apply a schema based on that value. The problem with this approach is that every allowed _id needs to be specified. In the example below, only documents with _id "chunksize", "balancer", or "autosplit" would be allowed. We wouldn't be enforcing a schema on "balancer" or "autosplit", but each _id must be listed for its document to be allowed in the collection.

$jsonSchema: {
  oneOf: [
    {"properties": {_id: {enum: ["chunksize"]}, value: {bsonType: "int", minimum: 1, maximum: 1024}}},
    {"properties": {_id: {enum: ["balancer", "autosplit"]}}}
  ]
}
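
Because every allowed _id has to be spelled out, one way to keep such a schema manageable is to generate it from a table of settings. The sketch below is only an illustration with hypothetical helper names (pinId, buildSettingsSchema); it uses a one-element enum in place of the "const" keyword that draft 4 lacks, and it places the value rule under "properties", which is where JSON Schema expects per-field rules.

```javascript
// Draft 4 has no "const", so a one-element "enum" pins _id to a single value.
function pinId(id) {
  return { enum: [id] };
}

// checkedSettings: map from _id to the field schemas enforced for that document.
// uncheckedIds: _id values that are allowed without any further validation.
// Both parameter names are assumptions made for this sketch.
function buildSettingsSchema(checkedSettings, uncheckedIds) {
  const checkedBranches = Object.entries(checkedSettings).map(([id, fieldSchemas]) => ({
    properties: { _id: pinId(id), ...fieldSchemas },
  }));
  const uncheckedBranch = { properties: { _id: { enum: uncheckedIds } } };
  return { $jsonSchema: { oneOf: [...checkedBranches, uncheckedBranch] } };
}

const settingsSchema = buildSettingsSchema(
  { chunksize: { value: { bsonType: "int", minimum: 1, maximum: 1024 } } },
  ["balancer", "autosplit"]
);
console.log(JSON.stringify(settingsSchema, null, 2));
```

Adding a new setting then means adding one entry to the table, but the limitation described above remains: a document whose _id is in neither argument is rejected.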

Another option would be to change the format of the documents so that "chunksize", "balancer", and "autosplit" are fields rather than _id values; the schema could then be enforced on a document whenever a certain field exists. Technically we can do this today using the existing fields "value", "mode", and "enabled", but this could get confusing if a new setting is later added that also includes one of these fields ("value", for instance, does not seem unique to chunk size settings).
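
As a rough illustration of this second option, the sketch below assumes a purely hypothetical layout in which each setting lives under its own top-level field, for example { chunksize: { value: 64 } }. Neither this layout nor the helper names come from the ticket; the point is only that a rule is checked when, and only when, its field is present.

```javascript
// Hypothetical field-keyed layout: { chunksize: { value: 64 }, balancer: { ... } }
function isPlainObject(x) {
  return x !== null && typeof x === "object" && !Array.isArray(x);
}

// One rule per setting field; settings without constraints accept anything.
const RULES_BY_FIELD = {
  chunksize: (s) =>
    isPlainObject(s) && typeof s.value === "number" && s.value >= 1 && s.value <= 1024,
  balancer: () => true,
  autosplit: () => true,
};

// A rule applies only if the document actually contains that field.
function isValidFieldKeyedDoc(doc) {
  return Object.entries(RULES_BY_FIELD).every(
    ([field, rule]) => !(field in doc) || rule(doc[field])
  );
}

console.log(isValidFieldKeyedDoc({ chunksize: { value: 64 } }));  // true
console.log(isValidFieldKeyedDoc({ chunksize: { value: -10 } })); // false
console.log(isValidFieldKeyedDoc({ balancer: { mode: "full" } })); // true
```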

 
