[SERVER-45389] Add metrics tracking how often shards have inconsistent indexes Created: 07/Jan/20  Updated: 29/Oct/23  Resolved: 06/Feb/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.2.6, 4.3.4

Type: Task Priority: Major - P3
Reporter: Jack Mulrow Assignee: Cheahuychou Mao
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-13398 Investigate changes in SERVER-45389: ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Sharding 2020-01-13, Sharding 2020-01-27, Sharding 2020-02-10

 Description   

Metrics should be added that can be used to determine when shards do not all have the same indexes for a sharded collection.



 Comments   
Comment by Githook User [ 27/Mar/20 ]

Author: Cheahuychou Mao <cheahuychou.mao@mongodb.com> (username: cheahuychou)

Message: SERVER-45389 Add metrics tracking how often shards have inconsistent indexes

(cherry picked from commit ea696eb7a27f18c21223a3ff94d9124f06698af5)

SERVER-46084 Don't use setUnion in aggregation for finding inconsistent sharded indexes

(cherry picked from commit 2eaf0ba58cca9e96276e8a07a3d46f0a7b83289d)
Branch: v4.2
https://github.com/mongodb/mongo/commit/decfd07f05e7d9a65cfd07e42c07456f5a72b6f4

Comment by Githook User [ 06/Feb/20 ]

Author: Cheahuychou Mao <cheahuychou.mao@mongodb.com> (username: cheahuychou)

Message: SERVER-45389 Add metrics tracking how often shards have inconsistent indexes

create mode 100644 jstests/noPassthrough/sharded_index_consistency_metrics.js
create mode 100644 src/mongo/db/commands/sharded_index_consistency_server_status.cpp
create mode 100644 src/mongo/db/s/periodic_sharded_index_consistency_checker.cpp
create mode 100644 src/mongo/db/s/periodic_sharded_index_consistency_checker.h
Branch: master
https://github.com/mongodb/mongo/commit/ea696eb7a27f18c21223a3ff94d9124f06698af5

Comment by Mihai Andrei [ 09/Jan/20 ]

bruce.lucas, I think that would work. Since we only really care about the list's length, it doesn't make sense to store the whole list given the limitations you've pointed out.

Comment by Bruce Lucas (Inactive) [ 09/Jan/20 ]

A list of strings won't be captured by FTDC, as it only captures numeric data. You could make it numeric by having a list of subdocuments of the form {collection:1}, where collection is the collection name. However, we normally don't include per-collection information in serverStatus or FTDC because the number of collections is potentially very large. Also, I'm not sure how often this list would change; if it changes frequently, that's a problem, because schema changes in FTDC reduce compression efficiency.

But if all you're going to do is observe the length of this list, would it suffice to include just the length, i.e. the count of inconsistent indexes or of collections with inconsistent indexes?
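To illustrate why a list of strings disappears from FTDC while a count does not, here is a minimal Python sketch under a deliberately simplified model: a document is flattened into dotted paths and only int/float leaves are kept. This is an assumption for illustration, not MongoDB's actual FTDC implementation, and the section and field names (`shardedIndexConsistency`, `collections`, `numCollections`) are hypothetical. It compares the three shapes discussed above: a list of names, a list of `{collection:1}` subdocuments, and a plain count.

```python
from typing import Any, Dict


def numeric_leaves(doc: Any, prefix: str = "") -> Dict[str, float]:
    """Simplified FTDC-like capture (an assumption, not the real FTDC code):
    flatten a document into dotted paths, keeping only int/float leaves."""
    out: Dict[str, float] = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            out.update(numeric_leaves(value, path))
    elif isinstance(doc, list):
        # Array elements are addressed by index, e.g. "collections.0".
        for i, value in enumerate(doc):
            out.update(numeric_leaves(value, f"{prefix}.{i}"))
    elif isinstance(doc, (int, float)) and not isinstance(doc, bool):
        out[prefix] = doc
    # Anything else (strings in particular) is dropped by this model.
    return out


# Hypothetical collections that currently have inconsistent indexes.
inconsistent = ["users", "orders"]

# Shape 1: list of collection names (strings only).
as_strings = {"shardedIndexConsistency": {"collections": inconsistent}}
# Shape 2: list of {collection: 1} subdocuments.
as_subdocs = {"shardedIndexConsistency": {"collections": [{name: 1} for name in inconsistent]}}
# Shape 3: just the count.
as_count = {"shardedIndexConsistency": {"numCollections": len(inconsistent)}}

print(numeric_leaves(as_strings))  # {}
print(numeric_leaves(as_count))    # {'shardedIndexConsistency.numCollections': 2}
```

Under this model the list of names contributes nothing, the subdocument form produces one metric per collection name (so the set of metric keys, i.e. the schema, changes whenever the set of inconsistent collections changes), and the count produces a single stable metric.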

Comment by Mihai Andrei [ 08/Jan/20 ]

I think the way to do this is to add a list of sharded collections with inconsistent indexes to serverStatus (either as part of the shardedStatistics section or as its own section). Since this information would be captured as part of FTDC, you could count how many serverStatus documents show a non-empty list and divide that by the total number of serverStatus documents collected, which gives an idea of how often inconsistent indexes are present on a sharded cluster. If this addition to serverStatus were then backported to 4.2, you could compare how much impact this project has had on reducing the frequency of inconsistent indexes on sharded clusters.
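The frequency estimate described above is a simple ratio of samples. A minimal Python sketch, assuming each collected serverStatus document has already been reduced to its list of inconsistent collection names (the function name and input shape are hypothetical, not an existing MongoDB tool):

```python
from typing import Sequence


def inconsistency_frequency(samples: Sequence[Sequence[str]]) -> float:
    """Fraction of serverStatus samples whose list of sharded collections
    with inconsistent indexes is non-empty (hypothetical helper)."""
    if not samples:
        raise ValueError("no serverStatus samples collected")
    non_empty = sum(1 for collections in samples if len(collections) > 0)
    return non_empty / len(samples)


# Four hypothetical samples: two of them report inconsistent collections.
samples = [[], ["users"], [], ["users", "orders"]]
print(inconsistency_frequency(samples))  # 0.5
```

If each sample carried only a count of inconsistent collections instead of the list, the same ratio would apply, with a sample counted whenever its value is greater than zero. Computing this ratio on data from before and after the backport would give the comparison described above.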

alycabral and garaudy.etienne, does this seem like a reasonable approach?

Comment by Jack Mulrow [ 07/Jan/20 ]

A possible approach would be a periodic job on the config server that uses the logic from SERVER-44916 (i.e. runs an aggregation with $listIndexes for each sharded collection) to track all sharded collections that currently have inconsistent indexes, exposing them as an array in a serverStatus object. Because index creation is not atomic across shards, there may be false positives, but these should be cleared the next time the job runs.
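One pass of such a periodic job can be sketched as a set comparison: for each sharded collection, gather the indexes each shard reports and flag the collection if some index is missing from at least one shard. This minimal Python sketch assumes the per-shard index listing has already been collected (it does not reproduce the SERVER-44916 aggregation) and compares index names only, not key patterns or options. All names and data in it are hypothetical. It also shows how a false positive from a not-yet-finished index build disappears on the next run.

```python
from typing import Dict, List, Set

# Hypothetical input shape for one sharded collection: shard name -> index names.
ShardIndexes = Dict[str, Set[str]]


def inconsistent_indexes(per_shard: ShardIndexes) -> Set[str]:
    """Index names present on some shards but missing from at least one."""
    if not per_shard:
        return set()
    all_indexes = set().union(*per_shard.values())      # on any shard
    common = set.intersection(*per_shard.values())     # on every shard
    return all_indexes - common


def collections_with_inconsistent_indexes(cluster: Dict[str, ShardIndexes]) -> List[str]:
    """One run of the (hypothetical) periodic check over all sharded collections."""
    return sorted(ns for ns, per_shard in cluster.items() if inconsistent_indexes(per_shard))


cluster = {
    "test.users": {"shard0": {"_id_", "email_1"}, "shard1": {"_id_", "email_1"}},
    # date_1 has been built on shard0 but not yet on shard1.
    "test.orders": {"shard0": {"_id_", "date_1"}, "shard1": {"_id_"}},
}

first = collections_with_inconsistent_indexes(cluster)
print(first)  # ['test.orders']

# Later, the index build completes on shard1, so the next run clears the entry.
cluster["test.orders"]["shard1"].add("date_1")
second = collections_with_inconsistent_indexes(cluster)
print(second)  # []
```

Exposing `len(...)` of the result rather than the list itself would keep the metric numeric, in line with the FTDC concern raised in the comments.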

Generated at Thu Feb 08 05:08:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.