Core Server / SERVER-2065

Sharding on arrays


    Details

      Description

      Consider the following:

      conversation
      {
          users: [id1, id2, id3, ...],
          ...other conversation data...
      }

      Displaying a list of a single user's conversations with no further restrictions is impossible without querying every shard. Even restructuring the collection doesn't fix the problem.

      This can be worked around at the application level by creating and maintaining a separate collection with one entry per user, but that isn't very elegant. If the user needs to be able to arbitrarily filter their list of conversations, this gets worse, because most or all of the data needs to be available at query time, and therefore a large amount of data needs to be duplicated per user.

      When duplicating at the application level, it is necessary to create one duplicate per array entry, regardless of whether the entries actually resolve to different shards, because they MAY get placed on different shards, or may get rebalanced to another shard later. Sharding based on array contents would still require duplication sometimes, but it could be greatly reduced, and may not require any duplication at all if all entries in the array resolve to the same shard.

      The logical implementation for this is actually fairly straightforward:

      Insert:
      1. Look at the elements in the array and determine which shards are within range of any of the elements
      2. Insert the record on each of those shards
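
      The two insert steps can be sketched in Python against a toy shard model. The shard ranges, the field name "users", and the helper names here are assumptions for illustration, not real mongos internals:

      ```python
      # Each shard owns a half-open range of shard-key values (assumed layout).
      SHARD_RANGES = {
          "shard0": (0, 100),
          "shard1": (100, 200),
          "shard2": (200, 300),
      }

      def shard_for(value):
          """Return the shard whose range contains a single key value."""
          for shard, (lo, hi) in SHARD_RANGES.items():
              if lo <= value < hi:
                  return shard
          raise ValueError(f"no shard owns key value {value}")

      def shards_for_array(values):
          """Step 1: every shard within range of any element of the array."""
          return {shard_for(v) for v in values}

      def insert(shards, doc):
          """Step 2: place one copy of the record on each matching shard."""
          for name in shards_for_array(doc["users"]):
              shards[name].append(doc)

      shards = {name: [] for name in SHARD_RANGES}
      insert(shards, {"_id": 1, "users": [5, 150, 160]})
      # Users 150 and 160 resolve to the same shard, so only two copies are
      # made (shard0 and shard1) rather than three.
      ```

      Note that the copy count is bounded by the number of distinct shards the elements resolve to, not by the array length.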

      Update:
      1. Look up the complete sharded array from any copy of the record using the provided shard key
      2. If the sharded array is being modified, determine whether the list of shards it resides on will change, and remove from or insert to those shards as needed
      3. Update the record on all shards it resides on
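
      The update steps can be sketched the same way. As before, the shard layout and all names here are illustrative assumptions:

      ```python
      # Assumed toy shard layout.
      SHARD_RANGES = {"shard0": (0, 100), "shard1": (100, 200), "shard2": (200, 300)}

      def shard_for(value):
          for shard, (lo, hi) in SHARD_RANGES.items():
              if lo <= value < hi:
                  return shard
          raise ValueError(f"no shard owns key value {value}")

      def shards_for_array(values):
          return {shard_for(v) for v in values}

      def update_users(shards, doc_id, new_users):
          # Step 1: read the current sharded array from any copy of the record.
          old_users = next(d["users"] for copies in shards.values()
                           for d in copies if d["_id"] == doc_id)
          old_set = shards_for_array(old_users)
          new_set = shards_for_array(new_users)
          # Step 2: drop copies from shards that fall out of range and create
          # copies on shards that newly come into range.
          for name in old_set - new_set:
              shards[name] = [d for d in shards[name] if d["_id"] != doc_id]
          for name in new_set - old_set:
              shards[name].append({"_id": doc_id})
          # Step 3: apply the update to every remaining copy.
          for name in new_set:
              for d in shards[name]:
                  if d["_id"] == doc_id:
                      d["users"] = list(new_users)

      shards = {
          "shard0": [{"_id": 1, "users": [5, 150]}],
          "shard1": [{"_id": 1, "users": [5, 150]}],
          "shard2": [],
      }
      update_users(shards, 1, [150, 250])
      # The copy on shard0 is removed; a new copy appears on shard2.
      ```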

      Delete:
      1. Look up the complete sharded array from any copy of the record using the provided shard key
      2. Remove the record from all shards it resides on
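
      And the delete steps, under the same assumed toy layout:

      ```python
      # Assumed toy shard layout.
      SHARD_RANGES = {"shard0": (0, 100), "shard1": (100, 200), "shard2": (200, 300)}

      def shard_for(value):
          for shard, (lo, hi) in SHARD_RANGES.items():
              if lo <= value < hi:
                  return shard
          raise ValueError(f"no shard owns key value {value}")

      def delete(shards, doc_id):
          # Step 1: recover the complete sharded array from any copy.
          users = next(d["users"] for copies in shards.values()
                       for d in copies if d["_id"] == doc_id)
          # Step 2: remove the record from every shard an element resolves to.
          for name in {shard_for(v) for v in users}:
              shards[name] = [d for d in shards[name] if d["_id"] != doc_id]

      shards = {
          "shard0": [{"_id": 7, "users": [5, 150]}],
          "shard1": [{"_id": 7, "users": [5, 150]}, {"_id": 8, "users": [110]}],
          "shard2": [],
      }
      delete(shards, 7)
      # Both copies of record 7 are gone; record 8 is untouched.
      ```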

      I don't know for certain whether this would complicate rebalancing, but I don't think so. Unless I've missed something, you SHOULD be able to treat each array value as effectively distinct for this purpose: when a chunk is split, just split the record copies as needed. The catch is that the actual space gains may be somewhat unpredictable, especially when the split was triggered by high disk usage, since a record whose in-range values straddle the split point must be duplicated into both halves. In any case, it couldn't be any worse than having to duplicate everything all the time, even when it isn't needed.
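
      The splitting idea above could be sketched like this; the chunk model and function name are assumptions, not a proposed implementation:

      ```python
      def split_chunk(docs, lo, hi, split, key="users"):
          """Split a chunk owning [lo, hi) at `split`, treating each array
          value as a distinct key. A copy lands in whichever half holds an
          in-range element; a record with in-range elements on both sides is
          duplicated into both halves, which is why the space gained by a
          split can be unpredictable."""
          lower, upper = [], []
          for d in docs:
              in_range = [v for v in d[key] if lo <= v < hi]
              if any(v < split for v in in_range):
                  lower.append(d)
              if any(v >= split for v in in_range):
                  upper.append(d)
          return lower, upper

      chunk = [
          {"_id": 1, "users": [110, 120]},   # stays in the lower half
          {"_id": 2, "users": [180]},        # moves to the upper half
          {"_id": 3, "users": [120, 180]},   # straddles the split: duplicated
      ]
      lower, upper = split_chunk(chunk, 100, 200, 150)
      ```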


              People

              Assignee:
              Backlog - Sharding Team (backlog-server-sharding)
              Reporter:
              John Crenshaw (bugslayer)
              Votes:
              2
              Watchers:
              3
