[DOCS-13583] Investigate changes in SERVER-45624: Pre-split and distribute chunks of sessions collection Created: 10/Apr/20  Updated: 13/Nov/23  Due: 15/May/20  Resolved: 18/May/20

Status: Closed
Project: Documentation
Component/s: manual
Affects Version/s: None
Fix Version/s: 4.4.0-rc1, 4.2.7, 4.7.0, 3.6.21, 4.0.22, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Kay Kim (Inactive)
Resolution: Fixed Votes: 0
Labels: docs-sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-45624 Pre-split and distribute chunks of se... Closed
Participants:
Days since reply: 3 years, 37 weeks, 1 day ago
Epic Link: DOCS: 4.4 Server Release Work

 Description   

Description

Downstream Change Summary

This ticket introduces a mongod startup+runtime server parameter called "minNumChunksForSessionsCollection" (default: 1024, min: 1, max: 1,000,000). On a config server's primary, this parameter sets the target minimum number of config.system.sessions chunks for the balancer. Based on the comments in SERVER-45624, Atlas users do not have the privileges to make administrative changes to the sessions collection, so it is up to the DOCS team to decide whether this should be documented.
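
As a rough illustration of the parameter described above, the following sketch builds the setParameter command document and checks the value against the bounds quoted in this ticket. The helper name buildSetParameterCommand and the value 2048 are illustrative assumptions, not part of the server; the sketch only constructs the document and does not contact a cluster.

```javascript
// Name and bounds as stated in this ticket.
const PARAM = "minNumChunksForSessionsCollection";
const MIN_CHUNKS = 1;
const MAX_CHUNKS = 1000000;

// Hypothetical helper (not a server API): validate the target and build
// the command document for it.
function buildSetParameterCommand(target) {
  if (!Number.isInteger(target) || target < MIN_CHUNKS || target > MAX_CHUNKS) {
    throw new RangeError(
      `${PARAM} must be an integer in [${MIN_CHUNKS}, ${MAX_CHUNKS}]`
    );
  }
  return { setParameter: 1, [PARAM]: target };
}

const cmd = buildSetParameterCommand(2048);
console.log(JSON.stringify(cmd));
// prints {"setParameter":1,"minNumChunksForSessionsCollection":2048}
```

In a shell connected to the config server's primary, a document of this shape would presumably be passed to db.adminCommand(); at startup the equivalent would be a --setParameter option on mongod.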

Description of Linked Ticket

Session records are small, so the auto-splitter usually does not split the sessions collection, and as a result the collection is not balanced automatically. This can cause a high load on the primary shard of the sessions collection.

Some context:
The sessions collection is sharded by default on the `_id` field, and a session record looks like this:

{
    "_id" : {
        "id" : UUID("48195d0c-aeb6-4271-8bda-1ff004ed3fda"),
        "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
    },
    "lastUse" : ISODate("2020-01-16T13:46:28.049Z")
}

As you can see, the `_id` field is itself an object, and the internal `id` is of type UUID. In particular we use UUIDv4, which is a 128-bit randomly generated number. Since some of the internal bits are used to store the version and the variant, it is not possible to use normal integer arithmetic on this type. Fortunately we can still compare two UUIDs and sort them, which is what is needed to use them as split points.
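
To make the point about version and variant bits concrete, here is a minimal Node.js sketch that builds version 4 UUIDs from random bytes and sorts them byte by byte. The function names are illustrative, and the sketch assumes the server orders equal-length UUID values byte by byte, which this ticket does not confirm.

```javascript
const crypto = require("crypto");

// Generate a random (version 4) UUID as a 16-byte Buffer.
function uuidV4Bytes() {
  const b = crypto.randomBytes(16);
  b[6] = (b[6] & 0x0f) | 0x40; // high nibble of byte 6 holds the version (4)
  b[8] = (b[8] & 0x3f) | 0x80; // top two bits of byte 8 hold the variant (binary 10)
  return b;
}

// Format 16 bytes in the canonical 8-4-4-4-12 hexadecimal form.
function formatUuid(b) {
  const h = b.toString("hex");
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

// Buffer.compare orders two buffers byte by byte and returns -1, 0, or 1,
// so it can be used directly as a sort comparator.
const ids = Array.from({ length: 5 }, uuidV4Bytes).sort(Buffer.compare);
ids.forEach((b) => console.log(formatUuid(b)));
```

Because only 6 of the 128 bits are fixed, the values still sort into a total order that a split point can divide, even though adding or averaging two of them would not generally yield a valid UUIDv4.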

I ran a synthetic benchmark to see how sessions get distributed on a partitioned collection. I partitioned the collection into 10 equally sized chunks and assigned each one to a different shard. I first simulated 1,000 sessions and then 100,000.


As you can see from the graphs, the truly random distribution of the generated UUIDs made it possible to distribute them evenly among the shards.
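
The benchmark itself is not included in this ticket. A comparable simulation could look like the sketch below, which draws random UUIDv4 values and counts how many fall into each of 10 equal-width ranges of the 128-bit space. The function names are illustrative, and reading the UUID as a big-endian integer is used here only to locate the range, since that order agrees with byte-by-byte comparison.

```javascript
const crypto = require("crypto");

const SPACE = 1n << 128n; // size of the 128-bit UUID value space

// Random 128-bit value with the UUIDv4 version and variant bits set.
function randomUuidValue() {
  const b = crypto.randomBytes(16);
  b[6] = (b[6] & 0x0f) | 0x40;
  b[8] = (b[8] & 0x3f) | 0x80;
  return BigInt("0x" + b.toString("hex"));
}

// Count how many of `sessions` random ids fall into each of `chunks`
// equal-width ranges of the value space (one range per shard).
function simulate(sessions, chunks) {
  const counts = new Array(chunks).fill(0);
  for (let i = 0; i < sessions; i++) {
    const idx = Number((randomUuidValue() * BigInt(chunks)) / SPACE);
    counts[idx]++;
  }
  return counts;
}

// Same sizes as in the description above: 1,000 and 100,000 sessions.
for (const n of [1000, 100000]) {
  console.log(n, simulate(n, 10).join(" "));
}
```

The counts vary from run to run; with 100,000 sessions each range should hold roughly 10,000 ids, while the 1,000-session run shows proportionally more spread.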

I've also created a JS script that, when executed on a mongos, partitions the sessions collection and distributes one chunk to every shard.
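
That script is not attached here. As a sketch of the general idea only, the following code computes evenly spaced split points for a list of shards and builds split and moveChunk command documents as plain objects. The shard names, the helper names, and the `{ _id: { id: ... } }` shape of the split key are assumptions; in a real shell the hexadecimal strings would also need to be wrapped as UUID values.

```javascript
// Hypothetical shard names; a real script would read them from the cluster.
const shards = ["shard0", "shard1", "shard2", "shard3"];
const NS = "config.system.sessions";

// i-th of n evenly spaced boundaries in the 128-bit space, formatted like a
// UUID. These are range boundaries, not valid version 4 UUIDs.
function boundary(i, n) {
  const v = ((1n << 128n) * BigInt(i)) / BigInt(n);
  const h = v.toString(16).padStart(32, "0");
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

// One split per interior boundary, then one move per resulting chunk.
function buildPlan(shardNames) {
  const n = shardNames.length;
  const points = [];
  for (let i = 1; i < n; i++) points.push(boundary(i, n));
  const splits = points.map((p) => ({ split: NS, middle: { _id: { id: p } } }));
  const moves = shardNames.map((to, i) => ({
    moveChunk: NS,
    // A key inside chunk i: its inclusive lower boundary
    // (the all-zero value stands in for the start of the first chunk).
    find: { _id: { id: i === 0 ? boundary(0, n) : points[i - 1] } },
    to,
  }));
  return { splits, moves };
}

console.log(JSON.stringify(buildPlan(shards), null, 2));
```

With four shards the plan contains three splits and four moves, so each shard ends up owning one chunk of the `_id` range.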

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Githook User [ 26/May/20 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-13583: 4.2.7 system.sessions autosplit
Branch: v4.2
https://github.com/mongodb/docs/commit/850d2f3f1daed8195fba4582ea5884e26106c138

Comment by Githook User [ 18/May/20 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-13583: 4.2.7 system.sessions autosplit
Branch: v4.2.7
https://github.com/mongodb/docs/commit/68c7e9dfa13b39d7cd5400769fa18140c46e4105

Comment by Githook User [ 18/May/20 ]

Author:

{'name': 'Kay Kim', 'email': 'kay.kim@10gen.com', 'username': 'kay-kim'}

Message: DOCS-13583: system.sessions autosplit
Branch: master
https://github.com/mongodb/docs/commit/50b79f02874fe80a031acba155a7afc646b428b8
