[SERVER-22926] Force a find query to target only a specific shard inside a cluster Created: 02/Mar/16  Updated: 04/Mar/16  Resolved: 03/Mar/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Tudor Aursulesei Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

I'm using mongodb as a distributed, query-able job queue. When a worker queries the database for a payload, it usually runs a query something like

job = find({"$and":[_myfilter, {"locked_by": null}]}).limit(1).

_myfilter usually encodes business requirements, and the locked_by part ensures that no job is taken by two workers at the same time.
After a worker gets a potential job, it tries to lock it:

update_one({"_id": job["_id"]}, {"$set":{"locked_by": workerX}})
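A minimal sketch of the lock step, assuming a pymongo-style collection handle. The helper name try_lock is hypothetical, and adding "locked_by": None to the update filter is an assumption that is not in the update shown above; it is what makes the claim conditional, so that a second worker's update matches nothing.

```python
def try_lock(collection, job_id, worker_id):
    """Try to claim one job; return True only if this worker got it."""
    result = collection.update_one(
        # Assumption: also match on locked_by being null, so the update
        # is a no-op if another worker has already claimed the job.
        {"_id": job_id, "locked_by": None},
        {"$set": {"locked_by": worker_id}},
    )
    # modified_count is 1 when this call changed the document, 0 otherwise.
    return result.modified_count == 1
```

Because the filter contains an equality match on _id, which is the shard key here, this update can be routed to a single shard.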

This works pretty well for our case, but whenever we have a big load, two things happen:

  • the workers compete for the same job: they all find it and try to lock it (only one succeeds) - this can be mitigated by inserting a random field in the document, {"seed": random(0, 256)}, which is then queried by {"seed": {"$gt": random(0, 256)}}.

  • the sharded DB queries ALL the available shards (we now have 8); they all respond, but only one result is used, so 7/8 of the effort is wasted.
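The seed idea from the first point can be sketched as follows. The helper names new_seed and seeded_filter and the constant SEED_RANGE are hypothetical, and the seed is assumed to be an integer in [0, 256). The sketch uses "$gte" rather than "$gt" so that documents with seed 0 stay reachable; that is a deliberate deviation from the query above.

```python
import random

SEED_RANGE = 256  # assumption: seeds are integers in [0, 256)

def new_seed(rng=random):
    """Seed value to store in a job document when it is inserted."""
    return rng.randrange(SEED_RANGE)

def seeded_filter(base_filter, rng=random):
    """Build a find() filter that starts the search at a random seed."""
    threshold = rng.randrange(SEED_RANGE)
    return {
        "$and": [
            base_filter,                      # business requirements
            {"locked_by": None},              # only unclaimed jobs
            {"seed": {"$gte": threshold}},    # spread workers over jobs
        ]
    }
```

A high threshold can match few or no documents even when unclaimed jobs exist, so a worker would probably need to retry without the seed clause when the seeded query comes back empty.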

The collection is sharded by _id: "hashed"
I'm trying to mitigate this by forcing the first find query to target only one of the shards, chosen at random. This should reduce my workload, because the subsequent update_one operation is already targeted at a specific shard. I see three solutions for this:

  • reshard the database by seed, not by hashed _id
  • figure out if I can force find to return results from only a single shard
  • create 8 more connections, one directly to each shard, use them at random for the first find, then use the mongos connection for the update.
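The third option can be sketched as below, assuming the caller has already opened one pymongo-style collection handle per shard through direct connections. The function name find_on_random_shard is hypothetical, and trying the remaining shards when the first one has no match is an addition not described above.

```python
import random

def find_on_random_shard(shard_collections, query, rng=random):
    """Query shards one at a time in random order; return the first match."""
    order = list(shard_collections)
    rng.shuffle(order)  # each worker visits the shards in its own order
    for coll in order:
        job = coll.find_one(query)
        if job is not None:
            return job
    return None  # no shard had a matching job
```

Reads that bypass mongos can return orphaned documents left over from chunk migrations, so the candidate found this way should still be claimed through the mongos connection, as the option already proposes.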


 Comments   
Comment by Tudor Aursulesei [ 04/Mar/16 ]

Most people nowadays use a hashed _id, so the only way I could accomplish that is to get the hashed-id-to-shard mapping and query by _id:

{"$gt": chunk start, "$lt": chunk finish}

... highly unlikely.

Comment by Ramon Fernandez Marina [ 04/Mar/16 ]

Please see the documentation for broadcast and targeted operations. If you want a find() query to be targeted it must include the shard key.

Comment by Tudor Aursulesei [ 03/Mar/16 ]

This isn't a question; I'm actually requesting the ability to limit a query to a single shard in a sharded cluster via a parameter.

Comment by Ramon Fernandez Marina [ 03/Mar/16 ]

thestick613, please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources.

Regards,
Ramón.
