[SERVER-13755] Support for tailable cursors over non capped collections Created: 27/Apr/14  Updated: 14/Aug/14  Resolved: 14/Aug/14

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Jose Luis Pedrosa Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-5042 Implement support for reliable change... Closed
duplicates SERVER-13932 Change Notification Stream API Closed
Related
is related to SERVER-124 triggers Backlog
Participants:

 Description   

There are some use cases in which tailable cursors over a non-capped collection would be useful. This would mitigate the inability to shard (and therefore scale) capped collections (tickets for that are already open).

The desired behaviour would be:
A tailable cursor opened over a collection would deliver data as each change to the collection happens (inserts, deletes, and updates). In other words, it would be something like making the oplog visible cluster-wide. Ordering would only need to be strict per shard.

An example use case would be a message queue that avoids polling the collection with constant queries. Another example would be a distributed probe system, where the measurements are sent to a DB and processed by multiple readers.

If there's already a way to achieve this behaviour in a sharded environment, let me know.

Thanks in advance.



 Comments   
Comment by Jose Luis Pedrosa [ 14/Aug/14 ]

Hi Scott,

A "sharded capped collections" feature would be really useful in many cases. If you agree, I'll leave this ticket closed and add a comment in SERVER-2654.

The motivation for pushing capped collections is to have "streams" you can connect to and receive documents from as they arrive, which reduces latency a lot. Some people report sub-millisecond latency between the insert and the read; if this feature were scalable, there would be plenty of use cases.

The utility you proposed can indeed be really useful, but I would feel safer if this were implemented in the core of mongo, as it would be failover-aware (maybe restarting the cursor) and available to any driver transparently. So instead of discovering the master of each shard/replica set and joining them in application code, mongos should do that, potentially even honoring read/write preferences. That is why I think mongos would need to do some work to merge the streams of the shards.
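
To illustrate the kind of merge I mean, here is a minimal sketch in plain Python. It is not MongoDB or mongos code: the `Event` type, its `ts` field, and `merge_shard_streams` are hypothetical names made up for this example, and it assumes each shard already delivers its own events in timestamp order.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    ts: int       # hypothetical per-event timestamp (assumption)
    shard: str    # name of the shard that produced the event
    op: str       # "insert", "update" or "delete"
    doc_id: str   # identifier of the affected document


def merge_shard_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge per-shard streams into a single stream.

    Each input stream must already be sorted by ts. heapq.merge then
    yields events in ts order and never reorders events that came from
    the same input, so the order stays strict per shard.
    """
    return heapq.merge(*streams, key=lambda event: event.ts)


# Two shards, each already ordered internally.
shard_a = [Event(1, "a", "insert", "x"), Event(4, "a", "update", "x")]
shard_b = [Event(2, "b", "insert", "y"), Event(3, "b", "delete", "y")]

merged = list(merge_shard_streams(shard_a, shard_b))
for event in merged:
    print(event)
```

A real implementation would also have to deal with failover, cursor restarts, and clock differences between shards, none of which this sketch attempts.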

Thanks!

Comment by Scott Hernandez (Inactive) [ 14/Aug/14 ]

The real requirements are in the first comment of that request, and the implementation would be hidden from the user (unlikely to use capped collections under the hood), but more like what is described here: SERVER-13932

Unless you specifically want tailable sharded cursors, this request is a duplicate of those.

Also, there is a tool which already provides this functionality using the oplog across all shards: https://github.com/10gen-labs/mongo-connector
It monitors the oplog across the whole sharded cluster and makes those events available to clients.

BTW, what you describe is not a "mongos" change: there is no "tailable" option for non-capped collections on a single instance, nor capped support for sharded collections, so a lot more of the system would need to change for what you are requesting.

Comment by Jose Luis Pedrosa [ 14/Aug/14 ]

Hi Scott,

Their suggestion is to put the changes in a capped collection, which takes us back to the scalability problem.

Their feature is indeed interesting, but it is complementary to this one. I would like you to reopen the ticket if you agree. I understand this is mainly a mongos change.

Jose Luis

Comment by Scott Hernandez (Inactive) [ 14/Aug/14 ]

I believe this request covers your needs and is more general/practical: SERVER-5042

If you really want "tailable cursors for all collection types" we can re-open the issue, but change notifications seem more like what you want.

Comment by Jose Luis Pedrosa [ 14/Aug/14 ]

Hi, one comment:

I understand this would deliver all the operations in the correct order for each shard key. In most businesses the order only needs to be strict per entity. E.g., if customer A signs up and then purchases a product, you need to receive the sign-up event and the purchase in strict order, but it is not relevant if a purchase by customer B arrives in between. So choosing the right shard key for each use case should give a low-latency distributed queue.
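
The per-entity ordering argument can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `shard_for` uses a CRC32 hash as a stand-in and is not MongoDB's actual shard key hashing, and `route` and `per_customer` are hypothetical helpers invented for this example.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

CustomerEvent = Tuple[str, str]  # (customer id used as shard key, action)


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index (CRC32 is an illustrative stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def route(events: List[CustomerEvent], num_shards: int) -> Dict[int, List[CustomerEvent]]:
    """Append each event to the stream of the shard that owns its key."""
    shards: Dict[int, List[CustomerEvent]] = defaultdict(list)
    for customer, action in events:
        shards[shard_for(customer, num_shards)].append((customer, action))
    return shards


def per_customer(stream: List[CustomerEvent], customer: str) -> List[str]:
    """Return the actions of one customer in the order the stream holds them."""
    return [action for owner, action in stream if owner == customer]


# Events from customers A and B arrive interleaved.
events = [("A", "sign_up"), ("B", "purchase"), ("A", "purchase"), ("B", "refund")]
shards = route(events, num_shards=2)

# All of A's events sit on one shard, in their original order.
a_actions = per_customer(shards[shard_for("A", 2)], "A")
b_actions = per_customer(shards[shard_for("B", 2)], "B")
print(a_actions, b_actions)
```

Because every event with the same key goes to the same shard, a reader that preserves per-shard order also sees per-customer order, whichever shard each customer happens to hash to.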

Generated at Thu Feb 08 03:32:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.