[SERVER-2001] option to hash shard key Created: 25/Oct/10  Updated: 23/Nov/17  Resolved: 17/Oct/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 2.3.0

Type: Improvement Priority: Major - P3
Reporter: Eliot Horowitz (Inactive) Assignee: Kevin Matulef
Resolution: Done Votes: 39
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-5858 Special index types including hashed ... Closed
is depended on by SERVER-7358 Pre-split new collections when using ... Closed
Related
related to SERVER-5877 Make hashed indexes work with multi-k... Backlog
related to SERVER-5878 Allow hashed indexes to be unique Backlog
is related to SERVER-511 "hash" index Closed
is related to DOCS-762 Document hashed shard key Closed
is related to SERVER-7674 MoveChunk should allow you to specify... Closed
is related to SERVER-8031 Allow dropping the _id index when you... Closed
Participants:

 Comments   
Comment by auto [ 20/Dec/12 ]

Author: Kevin Matulef <matulef@10gen.com> (2012-12-20T20:59:05Z)

Message: SERVER-2001 add startup test to validate hash function
Branch: master
https://github.com/mongodb/mongo/commit/3ec68156685a6119163cefd2b4a3c5184444fe53

Comment by auto [ 16/Oct/12 ]

Author: Kevin Matulef <matulef@10gen.com> (2012-10-16T09:01:41-07:00)

Message: SERVER-2001 declare FieldInterval as struct for consistency. avoids windows compile warnings
Branch: master
https://github.com/mongodb/mongo/commit/fac1f128e23e200092a53ea4f64d749145c31656

Comment by auto [ 16/Oct/12 ]

Author: Kevin Matulef <matulef@gmail.com> (2012-10-16T03:08:17-07:00)

Message: SERVER-2001 filter queries on mongod using new key extraction path

Queries on mongod must filter out documents that do not currently belong
to the shard. The belongsToMe function now uses the new key extraction
path to make this determination. An optimization is also added to the
clientcursor class so that it can make this determination using a
covered index if available.
Branch: master
https://github.com/mongodb/mongo/commit/7db66f228efe5fa604f94fcb06e850f09920a711

Comment by auto [ 16/Oct/12 ]

Author: Kevin Matulef <matulef@gmail.com> (2012-10-15T21:06:26-07:00)

Message: SERVER-2001 revert to previous method for extracting keys in hashed indexes

Hashed index keys are expected to have empty field names. A previous commit
accidentally changed this behavior.
Branch: master
https://github.com/mongodb/mongo/commit/58925e7afbf6f02bf604f80519922d1728ce152d

Comment by auto [ 16/Oct/12 ]

Author: Kevin Matulef <matulef@gmail.com> (2012-10-16T00:26:20-07:00)

Message: SERVER-2001 fix ownership determination of ops that occur during migrate

For ops that occur during a migration, mongod uses the isInRange
function to determine whether the ops apply to the chunk being
migrated. It now makes this determination using the extractKey
method which handles more general shard key patterns.
Branch: master
https://github.com/mongodb/mongo/commit/5992d31f7011b31517b62c2b8bd52141b56a77f3

Comment by auto [ 15/Oct/12 ]

Author: Kevin Matulef <matulef@10gen.com> (2012-10-15T13:12:18-07:00)

Message: SERVER-2001 calculate query bounds using more general key expressions

In sharding, given a key expression like {a : 1} or {a : -1}, we
must translate a query to a set of bounds to figure out which shards
are relevant. This patch amends the keyBounds calculation function
so that patterns which start with "hashed" fields calculate the
right bounds.
Branch: master
https://github.com/mongodb/mongo/commit/5a28905a8359f336aaa9c6acffe96448ab88e2a1

Comment by auto [ 15/Oct/12 ]

Author: Kevin Matulef <matulef@gmail.com> (2012-10-14T23:25:12-07:00)

Message: SERVER-2001 KeyPattern class; utilities for more general index & shard key specs

The KeyPattern class is an abstraction for defining more general
expression-based keys (both index keys and shard keys). This class
provides some utility functions for extracting keys based on an
expression, and computing range bounds based on an expression.
This patch lays the groundwork and begins to make use of KeyPatterns.
The idea is that to implement more general key expressions, we will only
need to enhance the functions in this class.
Branch: master
https://github.com/mongodb/mongo/commit/16a8e0484cfbe69c674ed6f3b7609fdc1518b50e

Comment by auto [ 15/Oct/12 ]

Author: Kevin Matulef <matulef@gmail.com> (2012-10-14T20:51:00-07:00)

Message: SERVER-2001 define const to use as default hash seed
Branch: master
https://github.com/mongodb/mongo/commit/02e0dc4015276b787dcfc448304d978a06bec51d

Comment by auto [ 10/Oct/12 ]

Author: Kevin Matulef <matulef@10gen.com> (2012-10-10T15:10:13-07:00)

Message: SERVER-2001 change shard key validation to allow hashed shard keys

This changes the top-level shard key validation to allow shard keys
such as {a : "hashed"}. It also adds some helper functions for
determining when a unique index is compatible with a given shard
key, and a variety of unit tests and a js test.
Branch: master
https://github.com/mongodb/mongo/commit/532a051330efa57bcf2f27ab597f0ab3130715a1

Comment by auto [ 10/Oct/12 ]

Author: Kevin Matulef <matulef@10gen.com> (2012-10-10T14:59:42-07:00)

Message: SERVER-2001 make modifiedRangeBound accept index patterns without 1/-1 vals

To allow more complex index types, modifiedRangeBound needs to make sense
of index patterns like {a : "hashed"} that don't have 1 or -1 for the
field values. With this change, it will treat bounds over a hashed
keyspace as ascending over the range of hashed values.
Branch: master
https://github.com/mongodb/mongo/commit/a50b5956fc17d1f6fdb49c1b2c718bd39ef1aaf8

Comment by auto [ 10/Oct/12 ]

Author: Kevin Matulef <matulef@10gen.com> (2012-10-10T12:50:46-07:00)

Message: SERVER-2001 shardChunkManager no longer modifies shard key with ascending fields

Validation of proper shard keys is now done at the top-level, so there
is no need to replace field values with "1.0". Moreover, hashed keys
like { a : "hashed" } will not have numeric field vals.
Branch: master
https://github.com/mongodb/mongo/commit/062b91bea8992f83c12278e742c22a701aabbd80

Comment by auto [ 04/Oct/12 ]

Author: Kevin Matulef <matulef@gmail.com> (2012-10-04T14:40:47-07:00)

Message: SERVER-2001 don't trigger 'split-at-top' heuristic with special shard keys
Branch: master
https://github.com/mongodb/mongo/commit/0d57a0eb93dc750e15e5c459d15aeb8f14296471

Comment by Remon van Vliet [ 04/Oct/12 ]

@Kevin Okay, makes sense. And yes, both are debatable (the second issue is flat-out broken, really), but SERVER-2001 isn't the place to discuss that. I look forward to this new index type, thanks!

Comment by Kevin Matulef [ 03/Oct/12 ]

remonvv, 3 and 3.0 currently compare as equal in our query language (try db.foo.insert({a:NumberLong(3)}) and db.foo.find({a:3.0})), so we need to respect that (one could argue about whether it should be this way, but that's the way it is). Floating point issues were discussed extensively internally. Indeed, there are some tricky issues there. Floating point values that are too large to be represented as 64-bit ints will not be supported.

{a:1, b:2} and {b:2, a:1} currently hash to different things. This is also consistent with our query language (try db.foo.insert({t : {a:1, b:2}}) and db.foo.find({t : {b:2, a:1}})). Again, debatable, but we're aiming for consistency.
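
Spelled out as a shell session (the collection name foo is just a throwaway example), the behavior described above is:

// 3 and 3.0 compare as equal, so they must hash to the same value:
db.foo.insert({a: NumberLong(3)})
db.foo.find({a: 3.0})                 // matches the document above
// Field order matters for whole-document equality, so these hash differently:
db.foo.insert({t: {a: 1, b: 2}})
db.foo.find({t: {b: 2, a: 1}})        // does not match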

Thanks for the pointer to the incremental version of MurmurHash2! Will check it out.

Comment by Remon van Vliet [ 03/Oct/12 ]

@Kevin. The most common incremental implementation is MurmurHash2A: http://code.google.com/p/pyfasthash/source/browse/trunk/src/MurmurHash/MurmurHash2A.cpp. Whether or not it's relevant depends on how many cycles are burned on hashing in your implementation.

I do not really understand why 3 and 3.0 should evaluate to the same hash. Wouldn't it be better to simply make sure the specification of BSON documents where that is relevant includes type? Apart from the binary incompatibility there are other issues. A lot of (higher) integer numbers cannot be represented exactly by IEEE 754 floating point values, so such equality comparisons would have to include some sort of arbitrary epsilon range check to get properly squashed.

On a related note, will {a:1, b:2} and {b:2, a:1} evaluate to the same hash as well? I suppose that issue is a bit broader than just SERVER-2001, but whereas I don't think 3 and 3.0 are actually value-equal, I do think those two sample documents are.

Comment by Kevin Matulef [ 02/Oct/12 ]

The reason for the incremental computation is that there are certain values, such as the integer 3 and the floating point 3.0, that need to compare as equal, even though the binary that describes them is not the same. These values can exist in sub-documents or arrays. Thus, to compute the hash of an arbitrary BSON element, we need to recursively apply the hash function to sub-documents, squashing certain types as we go (the non-incremental way to compute this would involve rewriting the whole document with the types squashed, and then applying the hash, but that's not optimal).
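
A rough, non-incremental sketch of the "squash types, then hash" idea described above (the function name and the string-based canonical form are purely illustrative; the server hashes the raw BSON incrementally rather than building a string):

// Illustrative only: rewrite a value with numeric types squashed to one canonical
// form, then hash the resulting string with any 64-bit hash (e.g. MurmurHash).
function canonicalize(v) {
    if (typeof v === "number")
        return "n:" + v;                                   // double
    if (typeof NumberLong !== "undefined" && v instanceof NumberLong)
        return "n:" + v.toNumber();                        // long squashes to the same form as a double
    if (Array.isArray(v))
        return "[" + v.map(canonicalize).join(",") + "]";  // arrays are walked recursively
    if (v !== null && typeof v === "object")               // sub-documents too; field order is kept,
        return "{" + Object.keys(v).map(function (k) {     // so {a:1, b:2} != {b:2, a:1}
            return k + "=" + canonicalize(v[k]);
        }).join(",") + "}";
    return typeof v + ":" + v;                             // other types keep a type tag
}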

If you want to give a pointer to a good, incremental implementation of Murmur, I'd be happy to take a look.

Comment by Remon van Vliet [ 02/Oct/12 ]

@Kevin. Sound reasoning, thanks for the explanation. Can you elaborate on why the hashing algorithm has to be incremental (there are Murmur variants that are, by the way)? I suppose the question is whether a switch would be a significant enough performance improvement (MD5 is about 3-4 times slower on average, at least in Java).

Comment by Kevin Matulef [ 02/Oct/12 ]

remonvv, yes, in the case of a collision, it will scan through the docs to find the exact match. With 64-bit hashes though, the number of collisions should be extremely small.

One could indeed use an index that stores the hash as well as the original value, but that has the drawback that the index is larger (it also adds more complexity to the code). We're working on a more general index language, so that you could eventually make the hashed index also a covered index. But for now I think just storing the hash is a reasonable default.

The hashed index type is primarily for sharding, but could also be used to index longer values, like long strings, with a smaller index (if you don't care about range queries). Potentially with a hashed index type we could also optimize the data structure under the hood to improve performance for random inserts.

Murmurhash3 was (and is being) considered, but one issue is that we need to be able to compute the hash incrementally. Last I checked, the main implementation of Murmur was not written that way. It should be possible to implement, but we need any hash library we use to be tested and bulletproof.

Comment by Remon van Vliet [ 01/Oct/12 ]

@Kevin, I understand, so it is assuming that the cardinality of the hashed field is so high that once you've done a lookup based on the hash value the amount of candidate documents matching that hash value is usually 1 or close to it? In other words, say I have documents A and B with a field F. If A.F and B.F are not equal but the hash value of A.F and B.F is equal will it then scan through A and B to find the exact match?

I was assuming the implementation would do something like (under the hood) decorate the index with the hashed field (e.g. {username:1} -> {_h:1, username:1}) and the documents with a _h value, and then proceed as usual. This, along with pre-splitting the chunks into ordered ranges of the 64-bit long hash, is what I currently do in our mapping library for our products, at least, and it works pretty well and doesn't involve new special index types. There are (significant) downsides to this of course, data and index size increases being the big one, so this approach sounds better from that perspective.
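
A minimal sketch of that app-side decoration approach, assuming a placeholder collection "records" and an application-provided hashUserName() helper (neither is part of any MongoDB API):

// App-side workaround: store a hash alongside the document and index on it.
var doc = { username: "alice", payload: "..." };
doc._h = hashUserName(doc.username);              // e.g. MurmurHash -> 64-bit integer, computed by the app
db.records.insert(doc);
db.records.ensureIndex({ _h: 1, username: 1 });
// Equality lookups recompute the hash so the query targets the right chunk/shard:
db.records.find({ _h: hashUserName("alice"), username: "alice" });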

Are there any use cases for this index type other than sharding and perhaps index size reduction? I'm assuming the typical query speed advantage between index {username:1} and {username:"hashed"} is not that significant and only really starts to take effect on larger field values.

Also, on a related note, please consider using a higher-throughput hash than MD5, such as Murmur. This would avoid needlessly burning CPU cycles on a heavyweight hash.

Thanks for the info!

Comment by Kevin Matulef [ 28/Sep/12 ]

remonvv, the feature will not involve storing an extra "_hash" field in the documents, but it will involve indexing documents by the hash of a given field. In order to shard on the hashed value of the "username" field, for instance, you will first define a hashed index on the username field (e.g. db.coll.ensureIndex({"username" : "hashed"})), which supports equality queries on username. You can then declare { "username" : "hashed" } as your shard key. Equality queries for a particular username will be routed to the appropriate shard, while queries for a range of usernames will be scatter-gather.

Under the hood, the hash values will be represented as 64-bit ints, and the range of possible 64-bit ints will be pre-split into chunks. However the user shouldn't need to worry about the details of the implementation, or the actual hash values themselves.
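
Put together as a shell session (the database and collection names here are placeholders, and the exact admin helpers may differ in the released version), the workflow described above would look roughly like:

// Create the hashed index, then use the same spec as the shard key.
db.users.ensureIndex({ username: "hashed" });
sh.enableSharding("mydb");
sh.shardCollection("mydb.users", { username: "hashed" });
// Equality queries are targeted to a single shard:
db.users.find({ username: "alice" });
// Range queries over usernames are scatter-gather:
db.users.find({ username: { $gte: "a", $lt: "b" } });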

Comment by Remon van Vliet [ 28/Sep/12 ]

@Kevin, I see you've done some early work on this. Can you give us an update on the intended implementation? The commits I'm seeing for this issue seem to focus on implementing a special index type that supports hashing. I'm wondering how this ties in with SERVER-2001. This seems to deviate somewhat from the expected approach of decorating indexes and documents with a hash field and value respectively.

Comment by auto [ 18/May/12 ]

Author: Kevin Matulef (matulef) <matulef@10gen.com>

Message: SERVER-2001 part 2: hashed index type
Branch: master
https://github.com/mongodb/mongo/commit/4a7d8710bac850052aa61b300730acd55217ba32

Comment by auto [ 09/May/12 ]

Author: Kevin Matulef (matulef) <matulef@gmail.com>

Message: SERVER-2001 part 1: hashing BSONElements
Branch: master
https://github.com/mongodb/mongo/commit/ea82f5b5be6f73634c73ce031434ae32c4c66c59

Comment by P Eger [ 13/Apr/12 ]

Is this definitely going to make 2.1.X?

Comment by Andrew Armstrong [ 15/Mar/12 ]

An interesting (off topic) feature from picking a shard key (such as Customer Id) would be the ability for MongoDB to later know that specific customer data is always on the same shard.

So in theory later on, even in sharded clusters, MongoDB could do projections (even join-like behaviour) when querying customer-specific data, such as joining the Customer (with a specific Id) to their Purchase records etc.

Comment by Kevin Matulef [ 15/Mar/12 ]

@Keaton, indeed, that's the idea, to allow users to specify a particular field, and have mongo shard automatically based on a hash of that field. It could be the _id field, which is unique to every document, and thus would distribute all the documents essentially randomly, or it could be another field like "username," which would spread out the users, but keep documents corresponding to the same user together. Depending on which field you hash, this could reduce or even eliminate the need for rebalancing (at a cost of losing some data locality). In your case, I think this is exactly what you want.

This feature is in the works, just taking a while since it touches a lot of the code.

Comment by Keaton Adams [ 14/Mar/12 ]

I have strong interest in using MongoDB for a highly active production data store, but am concerned about the rebalancing work of the current sharding process. What I use today is a hash distribution on a key value to auto load-balance the data upon the initial write, instead of having to regularly re-balance the cluster as data is loaded.

In this system data is copied in 24x7 in batches of 25K records at a time, with 50-60 concurrent writers. Total rows loaded is in the 200 to 250 million per day range, while users are running queries on the dataset. The current product is MVCC so writers don't block readers, and the table data is partitioned by a character string made up of a server name, timestamp and other identifiers to provide a high level of uniqueness with a very even distribution across the nodes in the cluster. The data is written to append-only compressed table partitions and is never updated after the initial write. To drop an entire "day" (200-250 million records), I simply drop the partition from the parent table, a maintenance operation that takes just a few seconds to complete.

The product I currently use supports both hash and random distribution, but the hash key is more useful because users can provide the key value on a search to very quickly pull back a targeted row.

Hash Distribution - With hash distribution, one or more table columns is used as the distribution key for the table. The distribution key is used by a hashing algorithm to assign each row to a particular segment. Keys of the same value will always hash to the same segment. Choosing a unique distribution key, such as a primary key, will ensure the most even data distribution. Hash distribution is the default distribution policy for a table. If a DISTRIBUTED clause is not supplied, then either the PRIMARY KEY (if the table has one) or the first column of the table will be used as the table distribution key.

Thanks.

Comment by Remon van Vliet [ 04/Nov/11 ]

Having uniqueness be a mandatory property of hash values for the purpose mentioned here seems very limiting to me. Hashing in this way is only really useful for scenarios where people have pre-split/moved their data across a fixed number of available shards. As such, the hash should only have requirements that help facilitate determining the appropriate shard. Uniqueness is not mandatory or even helpful for this (other than the fact that it can then be used as the _id value). Generating a stable, cluster-unique hash based on some other piece of data that may not be unique across that same dataset is challenging at best anyway.

My suggestion would be to make the feature more restrictive by limiting it to a fixed, reserved field name and type (a 32-bit signed integer seems perfectly fine to me; you need a predefined lower and upper bound for pre-splitting). For example, this is what we do:

{
_id: ObjectId(...),
_hash: 9872139812,
...
}

Here the _hash is generated by a hashing algorithm with one of the document fields as its input. I actually have a document with syntax and implementation suggestions if someone's interested. With this setup, "_hash" is always your shard key, so it simplifies things quite a bit on the shard administration end of things too.
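
For concreteness, driving the scheme described above from the shell might look roughly like the following (the namespace, chunk count, and 32-bit bounds are illustrative, and the _hash values are computed by the application, not by MongoDB):

// Shard on the application-maintained _hash field and pre-split its signed 32-bit range.
sh.enableSharding("mydb");
sh.shardCollection("mydb.events", { _hash: 1 });
var numChunks = 8;
var span = Math.floor(4294967296 / numChunks);    // 2^32 split into equal-width ranges
for (var i = 1; i < numChunks; i++) {
    sh.splitAt("mydb.events", { _hash: -2147483648 + i * span });
}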

By the way, "<1 day" seems like an overly optimistic estimate if you ask me.

Comment by Dwight Merriman [ 02/Oct/11 ]

If the hash is not guaranteed unique (probably), it may make sense to use just a 32-bit value field to store the hash value.

I suppose another approach is to make it an MD5 hash, put it in _id, and shard on _id. That may be most efficient, as an _id index will be around regardless, but it only works if the shard key is unique.

Comment by Grégoire Seux [ 22/Jun/11 ]

Seems like a good idea: we implemented a hashed shard key on the client side and the operation rate skyrocketed (x3, with less variability). Balancing is moreover quicker and is done during our very heavy insertion process: perfect!

Comment by Remon van Vliet [ 04/May/11 ]

Great, works for me. I'm assuming version 1 will guarantee near-perfect distribution for shard keys that are, or are partially composed of, ObjectId values? That seems the most common use case.

Comment by Eliot Horowitz (Inactive) [ 01/Apr/11 ]

All information would be posted here.

I don't think version 1 will have different hashing choices, just a good general one that produces well-distributed values.

Comment by Remon van Vliet [ 01/Apr/11 ]

Is there a place where I can find information regarding the planned implementation/discussion of this feature? The lack of hashing-based read/write distribution amongst available shards is a huge issue for us right now. We're actually considering implementing an app-side layer to do this, but that obviously has a number of serious drawbacks.

Also, will we be able to select various hashing strategies? Apart from the ones already mentioned (or at least linear hashing), I think libketama's continuum strategy might be a good fit as well in certain situations (http://amarok.kde.org/blog/archives/363-libketama-a-consistent-hashing-algo-for-memcache-clients.html).

Comment by Andrew Armstrong [ 07/Mar/11 ]

I think it's worth investigating an option to use linear hashing (http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html) or other consistent hashing (http://wiki.membase.org/display/membase/vBuckets) so that adding/removing nodes in the cluster does not necessarily impact how much rebalancing needs to take place.
