[SERVER-45624] Pre-split and distribute chunks of sessions collection Created: 17/Jan/20  Updated: 08/Jan/24  Resolved: 09/Apr/20

Status: Closed
Project: Core Server
Component/s: Performance, Sharding
Affects Version/s: None
Fix Version/s: 4.2.7, 4.4.0-rc1, 4.7.0, 3.6.21, 4.0.22

Type: Improvement Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Cheahuychou Mao
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Sessions distribution on shards_100k_sessions.jpeg, Sessions distribution on shards_1k_sessions.jpeg
Issue Links:
Backports
Depends
Documented
is documented by DOCS-13583 Investigate changes in SERVER-45624: ... Closed
Related
related to SERVER-66078 Adapt sessions collection balacing po... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v4.4, v4.2, v4.0, v3.6
Sprint: Sharding 2020-03-09, Sharding 2020-03-23, Sharding 2020-04-06, Sharding 2020-04-20

 Description   

Session records are small, so the auto-splitter usually does not split the sessions collection, and as a result the collection is not balanced automatically. This can put a high load on the primary shard of the sessions collection.

Some context:
The sessions collection is sharded by default on the `_id` field, and a session record looks like this:

{
    "_id" : {
        "id" : UUID("48195d0c-aeb6-4271-8bda-1ff004ed3fda"),
        "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
    },
    "lastUse" : ISODate("2020-01-16T13:46:28.049Z")
}

As you can see, the `_id` field is itself an object, and its inner `id` field is a UUID. Specifically, we use UUIDv4: a 128-bit value whose bits are randomly generated, except for the few bits that store the version and the variant. Because of those fixed bits, we cannot use plain integer arithmetic on this type. Fortunately, two UUIDs can still be compared and sorted, which is all we need to use them as split points.
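
To illustrate the layout, here is a small Node.js sketch. It is not MongoDB code; the function names are hypothetical and it relies only on the built-in `crypto` module and `Buffer`. It sets the version and variant bits the way RFC 4122 specifies for v4 and sorts UUIDs byte by byte. MongoDB orders BinData values by length, then subtype, then byte by byte, so for UUIDs (all 16 bytes, same subtype) `Buffer.compare` should reproduce that order. Note that the version nibble lives in byte 6, so the leading 48 bits are fully random.

```javascript
const crypto = require("crypto");

// Hypothetical helper: a random version 4 UUID as a 16-byte Buffer (RFC 4122 layout).
// 122 bits are random; 6 bits are fixed by the version and variant fields.
function randomUuidV4() {
  const bytes = crypto.randomBytes(16);
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version: high nibble of byte 6 is 0100
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant: top two bits of byte 8 are 10
  return bytes;
}

// Render the 16 bytes in the canonical 8-4-4-4-12 hex form.
function formatUuid(bytes) {
  const h = bytes.toString("hex");
  return [h.slice(0, 8), h.slice(8, 12), h.slice(12, 16), h.slice(16, 20), h.slice(20)].join("-");
}

// Unsigned byte-by-byte comparison: a total order on UUIDs, which is all a split point needs.
function compareUuid(a, b) {
  return Buffer.compare(a, b);
}

const uuids = Array.from({ length: 5 }, randomUuidV4);
uuids.sort(compareUuid);
for (const u of uuids) {
  console.log(formatUuid(u));
}
```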

I ran a synthetic benchmark to see how sessions are distributed across a partitioned collection. I split the collection into 10 equally sized chunks and assigned each one to a different shard, then simulated first 1,000 sessions and then 100,000.


As the attached graphs show, because the generated UUIDs are truly random, the sessions were distributed fairly evenly among the shards.
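
A minimal in-memory version of this experiment could look like the following Node.js sketch. It does not touch a cluster, and the helper names are hypothetical. As an assumption, it defines the 10 chunks as equal ranges of the leading 32 bits of the UUID (fully random in v4) and counts how many random session ids land in each chunk.

```javascript
const crypto = require("crypto");

// Hypothetical helper: random v4 UUID as 16 bytes (version/variant bits set per RFC 4122).
function randomUuidV4() {
  const bytes = crypto.randomBytes(16);
  bytes[6] = (bytes[6] & 0x0f) | 0x40;
  bytes[8] = (bytes[8] & 0x3f) | 0x80;
  return bytes;
}

// Assumed chunk layout: chunk i holds UUIDs whose leading 32 bits p satisfy
// floor(p * numChunks / 2^32) === i. The prefix is the most significant part of
// the UUID, so each chunk is a contiguous range in UUID sort order.
function chunkIndex(uuid, numChunks) {
  const prefix = uuid.readUInt32BE(0);
  return Math.floor((prefix * numChunks) / 2 ** 32);
}

// Count how many of numSessions random session ids fall into each chunk.
function simulate(numSessions, numChunks) {
  const counts = new Array(numChunks).fill(0);
  for (let i = 0; i < numSessions; i++) {
    counts[chunkIndex(randomUuidV4(), numChunks)]++;
  }
  return counts;
}

for (const n of [1000, 100000]) {
  console.log(`${n} sessions:`, simulate(n, 10).join(" "));
}
```

With one chunk per shard, each count is the number of sessions a shard would own.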

I've also written a JS script that, when run against a mongos, partitions the sessions collection and moves one chunk to each shard.
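
The script itself is attached to the ticket and is not reproduced here. As a hypothetical Node.js sketch of the planning step such a script needs, the following divides the leading-32-bit prefix space into one range per shard and pairs each range with a shard. The boundaries are UUIDs with zeroed trailing bits; they only need to sort correctly, so this sketch assumes they do not have to be valid v4 UUIDs. Applying the plan with the mongo shell helpers `sh.splitAt` and `sh.moveChunk` is only indicated in comments, and the split document shape used there is an assumption.

```javascript
// Hypothetical helper: format a 32-bit BigInt prefix as a UUID string with zeroed trailing bits.
function formatPrefix(prefix) {
  const hex = prefix.toString(16).padStart(8, "0") + "0".repeat(24);
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

// Hypothetical helper: one chunk per shard, with boundaries at floor(i * 2^32 / n).
// "MinKey"/"MaxKey" are placeholders for MongoDB's MinKey and MaxKey bounds.
function buildPlan(shardNames) {
  const n = BigInt(shardNames.length);
  const bounds = [];
  for (let i = 1n; i < n; i++) {
    bounds.push(formatPrefix((i << 32n) / n));
  }
  // Chunk i spans [bounds[i - 1], bounds[i]).
  // In a mongo shell script, each boundary might be applied with something like
  //   sh.splitAt("config.system.sessions", { _id: { id: UUID(boundary) } })
  // and each chunk then moved to its shard with sh.moveChunk(...).
  return shardNames.map((shard, i) => ({
    min: i === 0 ? "MinKey" : bounds[i - 1],
    max: i === shardNames.length - 1 ? "MaxKey" : bounds[i],
    shard,
  }));
}

const plan = buildPlan(["shard0", "shard1", "shard2", "shard3"]);
for (const chunk of plan) {
  console.log(chunk.min, "->", chunk.max, "on", chunk.shard);
}
```

For four shards, the boundaries fall at prefixes `40000000`, `80000000` and `c0000000`.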



 Comments   
Comment by Githook User [ 28/Oct/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-45624 Make the balancer split the sessions collection

(cherry picked from commit a8f80d013ee948e04671b1814d9f3989f6ea8314)
Branch: v4.0
https://github.com/mongodb/mongo/commit/cd24a7fdb304de9b8bd16a15fb24acddd614d545

Comment by Githook User [ 28/Oct/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'mao.cheahuychou@gmail.com', 'username': 'cheahuychou'}

Message: SERVER-45624 Make the balancer split the sessions collection

(cherry picked from commit a8f80d013ee948e04671b1814d9f3989f6ea8314)
Branch: v3.6
https://github.com/mongodb/mongo/commit/c6635ba1c55a508eeb35e82277c732d4caa9cea9

Comment by Githook User [ 16/Apr/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-45624 Make the balancer split the sessions collection

(cherry picked from commit a8f80d013ee948e04671b1814d9f3989f6ea8314)
Branch: v4.2
https://github.com/mongodb/mongo/commit/15d84c10b80fed9649cd1791594d26d8723de7e0

Comment by Githook User [ 09/Apr/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-45624 Make the balancer split the sessions collection

(cherry picked from commit a8f80d013ee948e04671b1814d9f3989f6ea8314)
Branch: v4.4
https://github.com/mongodb/mongo/commit/6ff5746bd4f8af784b3b9365635dd477837c3b1a

Comment by Githook User [ 09/Apr/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-45624 Make the balancer split the sessions collection
Branch: master
https://github.com/mongodb/mongo/commit/a8f80d013ee948e04671b1814d9f3989f6ea8314

Comment by Kaloian Manassiev [ 27/Feb/20 ]

Yes, this was the idea (it's in the linked script). This ticket is about implementing it as part of the config server's shardCollection task.

Comment by Esha Maharishi (Inactive) [ 27/Feb/20 ]

kaloian.manassiev, it seems to me that Tommaso's results show that UUIDs are roughly uniformly distributed across their range of possible values.

Since session UUIDs are generated automatically rather than chosen by a user, I thought it might be reasonable to pre-split the sessions collection using an algorithm similar to the one we use for hashed sharding. (I am not familiar with what that algorithm is or whether it applies directly, but conceptually it seems similar.)

What do you think of this possibility?

Comment by Kaloian Manassiev [ 26/Feb/20 ]

The sessions collection doesn't use hashed sharding, so it cannot be automatically pre-split or pre-balanced, regardless of whether it is empty. I believe we can't make it use hashed sharding because the `_id` is an object and because we need to be able to search efficiently given only the session id.

Comment by Esha Maharishi (Inactive) [ 26/Feb/20 ]

This ticket is to explore why the sessions collection does not get pre-split (that is, which InitialSplitPolicy does it follow today), and to add support for pre-splitting it if possible.

I think the sessions collection does not get pre-split if the collection already exists, e.g. because the sharded cluster was created from an existing replica set. However, I am curious whether it also does not get pre-split in a fresh sharded cluster.

Generated at Thu Feb 08 05:09:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.