[SERVER-45679] Add serverStatus metric that reveals impact of arrays Created: 21/Jan/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Diagnostics, Querying
Affects Version/s: 4.3.2
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Eric Sedor Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 1
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Query Execution

 Description   
Large arrays within documents have performance costs that currently can only be inferred indirectly from the behavior of WiredTiger's cache page management.

Since BSON does not encode an array's size in elements, this could be a counter such as "array elements visited" or, if that is deemed too expensive, a counter that increments by the total byte size of each array (which is encoded in the BSON) whenever the array is iterated for any reason.

Combined with opcounters and scannedObjects (docsExamined), either of these would dramatically improve our ability to diagnose array iteration as a source of system impact.



 Comments   
Comment by Asya Kamsky [ 20/Jul/21 ]

What about increasing average document size, which I think is available? Increasing oplog entry size (over time)? Is the number of keys updated available in diagnostics? (I know it's in the logs, but I'm not sure there are counters.) Large arrays are particularly bad when they are indexed and an update changes the position of every element (pulling from the middle, shifting all elements by deleting from the front, etc.)...

Comment by Eric Sedor [ 30/Jan/20 ]

Elaborating on an aspect of an offline conversation with Asya: an open question we should answer is which specific circumstances we should or shouldn't measure. Ideally, I would think:

  • Array elements visited during update operations
  • Array elements visited during positional projections
  • Array elements visited during $unwind (for a single operation should be the sum of array elements of documents in the pipeline when $unwind is reached)
  • Array elements added during update operations
  • Array elements removed during update operations

But again, counting each element may be too expensive, and perhaps incrementing a byte counter by the size of each array to be traversed would reveal an upper bound on the possible work. That may be interesting in and of itself, because it would proactively count the worst case.

bruce.lucas/dmitry.agranat can you add to or comment on this list?

Comment by Bruce Lucas (Inactive) [ 24/Jan/20 ]

I think a metric specifically about arrays would help us detect the anti-pattern of large arrays, and/or the related anti-pattern of indefinitely growing arrays.

Large documents per se are not so much of an anti-pattern, I think, but since they can be involved in performance problems, it might be useful to add a metric related to document size.

Comment by Asya Kamsky [ 24/Jan/20 ]

> Large arrays within documents have performance costs that are currently indirectly inferred from the behavior of wiredTiger's cache page management.

eric.sedor is this different than large documents without arrays? In other words, is this specifically about some array related behavior?

Generated at Thu Feb 08 05:09:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.