[SERVER-72802] Reduce collStats logging for unexpected fields Created: 12/Jan/23  Updated: 27/Feb/23  Resolved: 22/Feb/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Pol Pinol
Resolution: Won't Do Votes: 1
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding EMEA
Operating System: ALL
Sprint: Sharding EMEA 2023-01-23, Sharding EMEA 2023-02-06, Sharding EMEA 2023-02-20, Sharding EMEA 2023-03-06
Participants:

 Description   

Currently, when the mongos executes collStats, it parses and aggregates the replies from all the shards and emits a log line every time it encounters a field not mentioned in the aggregation loop.
Emitting a log line is not useful at all and causes more harm than good:

  • It doesn't help developers remember to update the mongos aggregation loop when they add a new metric to collStats. If all tests pass, developers will simply commit their new code.
  • For customers with production clusters, it is completely useless to know that we forgot to update the metrics aggregation loop on the mongos.

Either we decide to make this check stricter (invariant/tassert) and force developers to always update the metrics aggregation loop on the mongos, or we keep the current semantics but log only at debug level so that it won't be visible in production.

Also, if we decide to keep the log, we should emit a single log line with a list of missing fields rather than one line per field.
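
A minimal sketch of the "single debug-level line" option, written in Python rather than the server's C++ purely for illustration. The names `aggregate_coll_stats` and `KNOWN_FIELDS`, the logger name, and the shape of the shard replies are all hypothetical and not taken from the mongos code:

```python
import logging

logger = logging.getLogger("collstats_sketch")  # hypothetical logger name

# Hypothetical: the fields the aggregation loop knows how to merge.
KNOWN_FIELDS = {"count", "size", "storageSize"}


def aggregate_coll_stats(shard_replies):
    """Sum known numeric fields and collect the names of unknown ones.

    shard_replies maps a shard name to that shard's reply (a dict).
    Returns (totals, unexpected), where unexpected is a sorted list.
    """
    totals = {}
    unexpected = set()
    for reply in shard_replies.values():
        for field, value in reply.items():
            if field in KNOWN_FIELDS:
                totals[field] = totals.get(field, 0) + value
            else:
                # Remember the field instead of logging it right away.
                unexpected.add(field)
    if unexpected:
        # One debug-level line listing every unexpected field.
        logger.debug(
            "collStats: unexpected fields in shard replies: %s",
            ", ".join(sorted(unexpected)),
        )
    return totals, sorted(unexpected)
```

Collecting the names in a set also deduplicates fields that every shard returns, so the line does not grow with the number of shards.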

Currently (r6.3.0-alpha-1041-g541ba52e793), this is the list of missing fields:

numOrphanDocs
freeStorageSize
inMemory
indexBuilds
scaleFactor
$clusterTime
$configTime
$topologyTime
operationTime

 



 Comments   
Comment by Pierlauro Sciarelli [ 21/Feb/23 ]

Agreed with pol.pinol@mongodb.com to repurpose this ticket with the objective of reviewing all fields not currently considered by the router command (returned by shards but not by routers) and decide what to do with them:

  • EITHER create a cumulative output (for example, we currently return count, which is the sum of the counts returned by each shard)
  • OR return them shard by shard (e.g. it makes no sense to return a cumulative output of numOrphanDocs; better to return something like numOrphanDocs: {shard0: 30, shard1: 100})
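
The two options above could be sketched as a per-field merge policy. This is an illustrative Python sketch, not the router's implementation; `MERGE_POLICY`, `merge_coll_stats`, and the policy names are hypothetical:

```python
# Hypothetical policy table: how the router would merge each field.
MERGE_POLICY = {
    "count": "sum",                # cumulative output across shards
    "numOrphanDocs": "per_shard",  # reported shard by shard
}


def merge_coll_stats(shard_replies):
    """Merge shard replies (shard name -> dict) using MERGE_POLICY."""
    merged = {}
    for shard, reply in shard_replies.items():
        for field, value in reply.items():
            policy = MERGE_POLICY.get(field)
            if policy == "sum":
                merged[field] = merged.get(field, 0) + value
            elif policy == "per_shard":
                merged.setdefault(field, {})[shard] = value
            # Fields without a policy are ignored in this sketch.
    return merged
```

With two shards reporting 30 and 100 orphan documents, the merged reply would carry `numOrphanDocs` as `{"shard0": 30, "shard1": 100}`, while `count` stays a single summed value.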

Then create a regression test that will take care of:

  • Call collStats for a collection on the router
  • Call collStats for the same collection directly on each shard
  • Make sure that the router returned all fields
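
The final check in the steps above amounts to a set comparison between the router reply and the shard replies. A rough Python sketch of that comparison only, using plain dicts in place of real collStats replies (the helper name `missing_router_fields` is hypothetical, and an actual regression test would issue the commands against a sharded cluster):

```python
def missing_router_fields(router_reply, shard_replies):
    """Return the sorted field names present on any shard but not on the router.

    router_reply is the router's reply (a dict); shard_replies maps a
    shard name to that shard's reply (a dict).
    """
    shard_fields = set()
    for reply in shard_replies.values():
        shard_fields.update(reply.keys())
    return sorted(shard_fields - set(router_reply.keys()))
```

The test would pass when this list is empty. Any field the team decides not to expose on the router would need an explicit exclusion.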
Generated at Thu Feb 08 06:22:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.