[SERVER-31565] Add a `serverStatus` section with runtime information for the sessions cache Created: 13/Oct/17 Updated: 30/Oct/23 Resolved: 08/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.0-rc4 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Mira Carey |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Platforms 2017-10-23, Platforms 2017-11-13 | ||||||||
| Participants: | |||||||||
| Description |
|
With the way that sessions management works, a periodic maintenance task runs on each node in a replica set or sharded cluster and synchronizes the in-memory sessions state with what is persisted in the config.system.sessions collection.

To improve product supportability, we should include FTDC metrics for sessions management as part of the serverStatus output. This will allow interesting changes in server behaviour to be correlated with executions of the sessions state maintenance task.

I propose that the following metrics be reported under a section called sessions. All of these metrics are per node:
NOTE: These statistics are particularly useful for the sharding case, where there is cross-node communication and a potentially large number of sessions that could be refreshed each round, so it is acceptable for them to be present only in a sharded cluster. |
| Comments |
| Comment by Githook User [ 08/Nov/17 ] |
|
Author: {'name': 'Jason Carey', 'username': 'hanumantmk', 'email': 'jcarey@argv.me'}Message: |
| Comment by Ian Whalen (Inactive) [ 03/Nov/17 ] |
|
Revert to fix failure in transaction_reaper.js. |
| Comment by Githook User [ 03/Nov/17 ] |
|
Author: {'name': 'Ian Whalen', 'username': 'IanWhalen', 'email': 'ian.whalen@gmail.com'}Message: Revert " This reverts commit 7cd8508b06e1574bea211dff054855b70b7cc20e. |
| Comment by Githook User [ 01/Nov/17 ] |
|
Author: {'name': 'samantharitter', 'username': 'samantharitter', 'email': 'samantha.ritter@10gen.com'}Message: |
| Comment by Githook User [ 01/Nov/17 ] |
|
Author: {'name': 'samantharitter', 'username': 'samantharitter', 'email': 'samantha.ritter@10gen.com'}Message: |
| Comment by Samantha Ritter (Inactive) [ 16/Oct/17 ] |
|
It's a great idea to add some sessions-related statistics to server status, but I have a few comments/suggestions.

I am assuming that "duration" means we time the background jobs from start to finish and report how long they take to run.

I am assuming that "sessions collection cleanup" means the part of the refresh background job where we remove records that have been ended via endSessions from config.system.sessions. This cleanup happens during the regular background job, which also does refreshing, among other things, so it doesn't make sense to report its timing separately from the refresh timing. The cleanup of records that expire naturally, without an explicit user call to endSessions, happens via a TTL index, so we can't report stats on that from within the session cache. In my proposed metrics below, these stats are aggregated into one "sessionsCollectionBackgroundJob" group.

There is a separate cleanup task that we should also report stats on: the transaction reaper. This second background job is responsible for clearing records out of the transaction table if their parent sessions have ended or expired, and it runs on a schedule independent of the refresh background job. However, it runs with the same frequency as the refresh background job, which is once every logicalSessionRefreshMinutes, or every 5 minutes by default.

Given those things, I propose the following set of metrics, which adds to Kal's original set but uses names that I think are clearer:
We've already added a section to serverStatus called "logicalSessionRecordCache," which currently only reports the number of active records in the cache. I'd like to add the new metrics to that existing section. |
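The "duration" interpretation above (time each background job from start to finish and report the result per job) can be sketched as follows. This is only an illustration in Python, not the server's implementation: the class names, the field names in `JobStats`, and the `transactionReaperJob` key are hypothetical, and publishing all of a run's stats together at the end of the run is an assumption that this thread does not confirm. Only the "sessionsCollectionBackgroundJob" group name comes from the comment above.

```python
import threading
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class JobStats:
    """Per-node stats for one periodic job (hypothetical field names)."""
    job_count: int = 0
    last_job_duration_millis: int = 0
    last_job_timestamp: Optional[float] = None
    last_job_entries_processed: int = 0


class TimedJob:
    """Wraps one background job and times each run from start to finish."""

    def __init__(self, work: Callable[[], int],
                 clock: Callable[[], float] = time.monotonic,
                 wall_clock: Callable[[], float] = time.time) -> None:
        self._work = work              # returns the number of entries it handled
        self._clock = clock            # monotonic clock, used for the duration
        self._wall_clock = wall_clock  # wall clock, used for the timestamp
        self._lock = threading.Lock()
        self._stats = JobStats()

    def run_once(self) -> None:
        started_at = self._wall_clock()
        start = self._clock()
        processed = self._work()
        elapsed_millis = int((self._clock() - start) * 1000)
        # Assumption: every field is published together once the run finishes,
        # so a reader never sees a half-updated set of values.
        with self._lock:
            self._stats = JobStats(
                job_count=self._stats.job_count + 1,
                last_job_duration_millis=elapsed_millis,
                last_job_timestamp=started_at,
                last_job_entries_processed=processed,
            )

    def snapshot(self) -> dict:
        with self._lock:
            return asdict(self._stats)


def build_section(refresh_job: TimedJob, reaper_job: TimedJob) -> dict:
    """Builds one stats section with a group per independently scheduled job."""
    return {
        "sessionsCollectionBackgroundJob": refresh_job.snapshot(),
        "transactionReaperJob": reaper_job.snapshot(),  # hypothetical key name
    }
```

Keeping one `TimedJob` per task reflects the point that the refresh job and the transaction reaper run on independent schedules, so each needs its own count, timestamp, and duration.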
| Comment by Bruce Lucas (Inactive) [ 13/Oct/17 ] |
|
OK. In the interest of specificity, when are the counts and other metrics updated - all at the same time, at the end of the corresponding activity (refresh, cleanup)? |
| Comment by Kaloian Manassiev [ 13/Oct/17 ] |
|
bruce.lucas: No, the frequency is much lower than that - on the order of once every 5 minutes, so the interval between runs should span many FTDC rounds. Being able to infer a rate of cleanup is therefore not really meaningful. |
| Comment by Bruce Lucas (Inactive) [ 13/Oct/17 ] |
|
How often does the cleanup happen? If it's very frequent (say once a second or more), then it might be better to make the numbers cumulative so they can be differentiated to produce a rate, i.e. entries cleaned up per second, entries refreshed per second, etc. If it is infrequent, the format you suggest looks OK. |
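To make the "cumulative numbers differentiated to produce a rate" idea concrete, here is a minimal Python sketch. The function name and the sample values are hypothetical, and treating a drop in the counter as a reset (for example after a restart) is an assumption of this sketch, not something stated in this thread.

```python
from typing import List, Tuple


def rates_per_second(samples: List[Tuple[float, int]]) -> List[float]:
    """Differentiates (timestamp_seconds, cumulative_count) samples into rates.

    Returns one rate per pair of consecutive samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Assumption: a counter that goes backwards was reset to zero,
        # so the new value itself is the increase over the interval.
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append(delta / dt)
    return rates


# Hypothetical samples taken one second apart while cleanup is running.
frequent = [(0.0, 0), (1.0, 100), (2.0, 250), (3.0, 250)]
print(rates_per_second(frequent))
```

With a job that runs only once every few minutes, the same calculation over one-second samples would yield zero for almost every interval and a single spike when the job completes, which is why per-run values are reasonable for an infrequent task.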