[SERVER-7161] Sharding will fail with non obvious error when locks collection is not consistent
Created: 26/Sep/12
Updated: 10/Dec/14
Resolved: 02/May/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.7
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Andre de Frere
Assignee: Greg Studer
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate: duplicates SERVER-13616 "type 7" (OID) error when acquiring d... (Closed)
Operating System: ALL

Description

If the locks collection is inconsistent across the three config servers, balancing fails and chunks are not balanced across the shards. When one config server has the lock entry but one or both of the others do not, a message similar to the following appears in the logs:

[Balancer] caught exception while doing balance: distributed lock balancer/ip-<ip>:<port>:1347910582:1804289383 had errors communicating with individual server <server>:<port> :: caused by :: field not found, expected type 7

Here, "expected type 7" refers to BSON type 7 (ObjectId): an ObjectId field is missing from the affected lock entry in the locks collection.
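To make the failure mode concrete, here is a minimal Python sketch of the kind of check that would expose the inconsistency. It works on lock documents that have already been read from each config server, so it needs no driver. The function name `check_lock_consistency`, the host names, and the sample ObjectId string are hypothetical, and the assumption that each lock document carries an `_id` (the lock name) and an ObjectId-valued `ts` field is mine, not something stated in this ticket.

```python
from typing import Dict, List

LockDoc = Dict[str, object]


def check_lock_consistency(
    locks_by_server: Dict[str, List[LockDoc]],
    lock_id: str = "balancer",
) -> Dict[str, str]:
    """Return {server: problem} for every config server whose copy of the
    lock document looks wrong. An empty dict means no problem was found.

    Assumption: lock documents have an '_id' (lock name) and a 'ts' field
    holding an ObjectId (represented here as a plain hex string).
    """
    problems: Dict[str, str] = {}
    seen_ts: Dict[str, object] = {}

    for server, docs in locks_by_server.items():
        doc = next((d for d in docs if d.get("_id") == lock_id), None)
        if doc is None:
            problems[server] = "lock document not found"
        elif "ts" not in doc:
            problems[server] = "lock document has no 'ts' field"
        else:
            seen_ts[server] = doc["ts"]

    # Servers that do have a 'ts' should all agree on its value.
    if len(set(seen_ts.values())) > 1:
        for server in seen_ts:
            problems[server] = "'ts' differs between config servers"

    return problems


# Hypothetical snapshot of the locks collection on three config servers.
sample = {
    "cfg1:27019": [{"_id": "balancer", "state": 0, "ts": "5064e0f1a1b2c3d4e5f60718"}],
    "cfg2:27019": [],                              # entry missing entirely
    "cfg3:27019": [{"_id": "balancer", "state": 0}],  # entry present, no 'ts'
}

for server, problem in sorted(check_lock_consistency(sample).items()):
    print(f"{server}: {problem}")
```

A report of this shape ("lock document not found" on a named server) is roughly what the log message could say instead of "field not found, expected type 7".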

The balancer lock will be forced when it times out, with the following messages:

[Balancer] forcing lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' because elapsed time 900364 > takeover time 900000
[Balancer] warning: lock forcing balancer/ip-<ip>:<port>:1347690492:1804289383 inconsistent
[Balancer] lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' successfully forced
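As a rough aid for spotting this pattern in a log file, the following Python sketch extracts the elapsed and takeover times from the "forcing lock" line and flags any lock that is reported both as inconsistent and as successfully forced. The regular expressions are written against the three lines quoted above only; the function names are hypothetical, and treating the two times as milliseconds is an assumption.

```python
import re
from typing import Iterable, Optional, Set, Tuple

FORCING_RE = re.compile(
    r"forcing lock '(?P<lock>[^']+)' because elapsed time (?P<elapsed>\d+)"
    r" > takeover time (?P<takeover>\d+)"
)
INCONSISTENT_RE = re.compile(r"warning: lock forcing (?P<lock>\S+) inconsistent")
FORCED_RE = re.compile(r"lock '(?P<lock>[^']+)' successfully forced")


def parse_forcing(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (lock name, elapsed, takeover) from a 'forcing lock' line,
    or None if the line does not match. Units are assumed to be ms."""
    m = FORCING_RE.search(line)
    if m is None:
        return None
    return m.group("lock"), int(m.group("elapsed")), int(m.group("takeover"))


def contradictory_locks(lines: Iterable[str]) -> Set[str]:
    """Return lock names logged as both 'inconsistent' and 'successfully forced'."""
    inconsistent: Set[str] = set()
    forced: Set[str] = set()
    for line in lines:
        m = INCONSISTENT_RE.search(line)
        if m:
            inconsistent.add(m.group("lock"))
        m = FORCED_RE.search(line)
        if m:
            forced.add(m.group("lock"))
    return inconsistent & forced


log_lines = [
    "[Balancer] forcing lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' "
    "because elapsed time 900364 > takeover time 900000",
    "[Balancer] warning: lock forcing balancer/ip-<ip>:<port>:1347690492:1804289383 inconsistent",
    "[Balancer] lock 'balancer/ip-<ip>:<port>:1347690492:1804289383' successfully forced",
]

parsed = parse_forcing(log_lines[0])
if parsed is not None:
    lock, elapsed, takeover = parsed
    print(f"{lock}: overran takeover time by {elapsed - takeover}")
print(contradictory_locks(log_lines))
```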

These messages indicate that the lock is both successfully forced and inconsistent. However, no shard balancing takes place.

Either the error message should be more obvious (for example, "Lock not found"), or the lock should actually be forced successfully, as the log reports.

