[SERVER-67518] Aggregate metric value continually increments when no aggregates are run Created: 24/Jun/22 Updated: 27/Oct/23 Resolved: 11/Aug/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 6.0.0-rc11 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Oliver Bucaojit | Assignee: | Allison Easton |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | shardingemea-qw | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL | ||||||||||||||||||||
| Steps To Reproduce: |
|
| Sprint: | Sharding EMEA 2022-08-08, Sharding EMEA 2022-08-22 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Story Points: | 3 | ||||||||||||||||||||
| Description |
|
On MongoDB v6.0, the value of db.serverStatus().metrics.commands.aggregate increments about once per second on all nodes of a replica set, even when the user runs no aggregations. This differs from the behavior in 5.3 and earlier versions. Our tests expect the value to be 0 when no queries have been run, and they now fail. |
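A test that expects this counter to stay flat can compare two snapshots rather than assert an absolute value. The sketch below is a minimal illustration, assuming each snapshot is a document returned by db.serverStatus() and that the counter of interest is the total field under metrics.commands.aggregate. The helper names toNum, aggregateTotal and aggregateDelta are hypothetical and not part of mongosh or any driver.

```javascript
// Hypothetical helpers; not part of mongosh or any MongoDB driver.

// Convert a counter to a plain number. 64-bit counters may arrive as
// Long-like objects exposing toNumber(); plain numbers pass through.
function toNum(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// Read metrics.commands.aggregate.total from a serverStatus-like document.
// Returns 0 when the path is absent (assumed here to mean "never run").
function aggregateTotal(status) {
  const agg = status?.metrics?.commands?.aggregate;
  return agg && agg.total !== undefined ? toNum(agg.total) : 0;
}

// Number of aggregate commands counted between two snapshots.
function aggregateDelta(before, after) {
  return aggregateTotal(after) - aggregateTotal(before);
}
```

In mongosh, the two snapshots could come from calling db.serverStatus() a few seconds apart. On a node showing the behavior described above, the delta would be expected to be greater than 0 even with no user aggregations.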
| Comments |
| Comment by Allison Easton [ 11/Aug/22 ] |
|
Perfect, I will close this then. Let me know if you need any further information |
| Comment by Oliver Bucaojit [ 10/Aug/22 ] |
|
Thanks allison.easton@mongodb.com for the details and explanation. Yes, the changes are sufficient: we check the aggregation values for replica sets, and this fix covers that case. The options for setting the expected behavior on a sharded cluster will be helpful as well. |
| Comment by Allison Easton [ 10/Aug/22 ] |
|
Hi oliver.bucaojit@mongodb.com and chris.kelly@mongodb.com. |
| Comment by Allison Easton [ 28/Jul/22 ] |
|
Hi chris.kelly@mongodb.com, I can give some information on why this is happening and on how to work around it if needed. The behavior has changed recently on master and the 6.0 branch, so I have included a description of what behavior to expect where.
The aggregation in question was added to the collstats command to return the number of orphaned documents as part of the collstats output. The collstats command is run as part of gathering FTDC data, which is why the aggregation happens about once a second. It was added in 6.0, which is why this doesn't happen on 5.3.
On 6.0.0, this aggregation is run every time collstats is called (both on replica sets and sharded clusters). On master and the current 6.0 branch (but not the released version of 6.0), this aggregation is skipped for replica sets and run much less often for sharded clusters.
On replica sets, the aggregation was removed by
On sharded clusters, the aggregation was made less common by
One option to prevent the aggregation on sharded clusters, or on replica sets before BACKPORT-12944, would be to disable FTDC. This way the extra aggregations would only happen if collstats is called directly. FTDC can be disabled by using the setParameter flag on all nodes to set diagnosticDataCollectionEnabled to 0. Ex: mongod --setParameter "diagnosticDataCollectionEnabled=0"
If this option is passed as a startup parameter for the nodes, then the aggregation count should be 0, the same as the behavior before 6.0. If the option is set after starting the node, it will prevent any more aggregations from happening, but some will likely have happened before the parameter was set, making the value greater than 0. |
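The runtime variant of this workaround can be sketched as follows. This is an illustration only, not the change referenced in the ticket: it assumes diagnosticDataCollectionEnabled can be changed on a running node through the setParameter admin command, and the helper disableFtdc and its runAdminCommand argument are hypothetical names. The command runner is passed in as a function so the sketch does not depend on a particular shell or driver.

```javascript
// Hypothetical helper: builds the setParameter command that turns FTDC off
// and sends it through a caller-supplied function.
// In mongosh, runAdminCommand could be: (cmd) => db.adminCommand(cmd)
function disableFtdc(runAdminCommand) {
  const cmd = { setParameter: 1, diagnosticDataCollectionEnabled: false };
  const res = runAdminCommand(cmd);
  // Treat anything other than ok: 1 as a failure.
  if (!res || Number(res.ok) !== 1) {
    throw new Error("setParameter failed: " + JSON.stringify(res));
  }
  return res;
}
```

The call would need to be repeated on every node. Because aggregations counted before the parameter was changed stay in the metric, a test using this approach would compare against a baseline taken after the call rather than against 0.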
| Comment by Chris Kelly [ 07/Jul/22 ] |
|
At log level 2, this is what's getting captured in the problematic 6.0 copy.
Whereas there is absolutely nothing captured on the working 6.0 copy I shared. I see some tests that pertain to rangeDeletions, orphans, and the FCV value here: It relies on the FCV value being 6.0 to trigger. If you don't have that set to 6.0, it won't happen. If you change it from 6.0 to something else it'll stop incrementing. |
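To see which FCV a node is running with, the value can be read with the getParameter admin command. The sketch below is an illustration under assumptions: fcvVersion and fcvMatchesObservedTrigger are hypothetical helpers, the reply is assumed to have the shape { featureCompatibilityVersion: { version: "6.0" }, ok: 1 }, and the link between FCV 6.0 and the incrementing metric is only the observation reported in this comment.

```javascript
// Hypothetical helper: extracts the FCV string from the reply to
//   db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })
// Returns undefined if the reply does not have the assumed shape.
function fcvVersion(reply) {
  return reply?.featureCompatibilityVersion?.version;
}

// Per the observation in this ticket, the extra aggregations were only
// seen while FCV was "6.0".
function fcvMatchesObservedTrigger(reply) {
  return fcvVersion(reply) === "6.0";
}
```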
| Comment by Chris Kelly [ 07/Jul/22 ] |
|
I've noticed something strange with this. I observe the reported behavior on both community and enterprise 6.0.0-rc11 on Evergreen, as well as externally on Ubuntu 20.04 in WSL. However, I have somehow managed to create a reproducible situation in which 6.0 does not increment this metric at first. I used m/mlaunch to initialize a data folder while running 6.0.0-rc11. With one data folder (data-issues.tar), the metric increases as described. With the other, it does not, even on the same version. To run:
I also tested this on 5.3 and some other 6.0 rc's. It was not present on 5.3, but was present on all 6.0 ones I tested. CURRENTLY TESTING:
SPECULATION: I'm not sure what I did to make 6.0 not increment this metric with this data folder. The only modification I recall making was updating the required libraries used for enterprise mongodb using https://www.mongodb.com/docs/v6.0/tutorial/install-mongodb-enterprise-on-ubuntu-tarball/. A subsequent reinstallation using m (doing an m rm 6.0.0-rc11, then m 6.0.0-rc11) plus another mlaunch init made subsequent versions see the issue. However, if that were the cause, I would be confused about why I can swap between these two data folders and observe different behavior in the same environment. |