[SERVER-30252] Write oplog operations to kafka Created: 21/Jul/17  Updated: 21/Jul/17  Resolved: 21/Jul/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Jose Luis Pedrosa
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-124 triggers (Backlog)
duplicates SERVER-5042 Implement support for reliable change... (Closed)
duplicates SERVER-13932 Change Notification Stream API (Closed)
Participants:

 Description   

Hi

When an insert/update/delete operation is performed, write it not only to the oplog but also to a Kafka endpoint. The shard name and the collection could be used to determine the Kafka partition number.
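The ticket does not specify how the partition would be derived; as a minimal sketch, one deterministic choice is to hash the shard/collection pair. The function name and the CRC32 choice here are illustrative assumptions, not anything MongoDB or Kafka actually implements:

```python
import zlib

def kafka_partition(shard_name: str, collection: str, num_partitions: int) -> int:
    """Map a (shard, collection) pair to a stable Kafka partition.

    Hypothetical helper: the ticket only says shard name and collection
    "could be used" to pick the partition; CRC32 modulo the partition
    count is one simple deterministic realisation of that idea.
    """
    key = f"{shard_name}:{collection}".encode("utf-8")
    return zlib.crc32(key) % num_partitions

# The same key always lands on the same partition, so per-collection
# ordering within a shard is preserved across producer restarts.
p1 = kafka_partition("shard0000", "app.users", 12)
p2 = kafka_partition("shard0000", "app.users", 12)
```

Because the mapping is stateless, any number of producer processes compute the same partition without coordination.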

Benefits/Use cases:

  • Easily create sharded clusters keyed by a different shard key
  • Monitoring & auditing of data
  • Apply stream processing to the data in near real time
  • Provide stronger consistency in CAP terms (see the replication idea below)

Although this could be implemented via external scripts, they become complicated because they would also need to interact with the config servers (or via mongos commands; I am not sure all the information is available there). Detecting new nodes in the cluster and scaling the script/monitor software can also become difficult, whereas implementing this in the mongod process seems simple.
One open question is what to do if the Kafka cluster is unavailable; a configurable choice of behaviour would probably be best. Despite adding a dependency on an external system such as Kafka, I think this adds significant benefits.
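The external-script approach described above can be sketched as a small oplog forwarder. This is a hedged illustration only: the oplog field layout ("ns", "op", "o") follows MongoDB's replication format, the library names mentioned in the comments (pymongo, kafka-python) are one possible stack, and the producer is injected so the forwarding logic can be exercised without a live cluster:

```python
def forward_ops(oplog_entries, producer, topic="mongo.oplog"):
    """Send each replicated write (insert/update/delete) to Kafka.

    Illustrative sketch: `topic` and the key scheme are assumptions.
    """
    sent = 0
    for entry in oplog_entries:
        if entry.get("op") not in ("i", "u", "d"):
            continue  # skip no-ops ("n") and commands ("c")
        # Key by namespace so operations on one collection stay ordered
        # within a partition.
        producer.send(topic, key=entry["ns"].encode(), value=entry)
        sent += 1
    return sent

# Against a real deployment one would iterate a tailable cursor instead:
#   from pymongo import MongoClient, CursorType
#   cursor = MongoClient().local.oplog.rs.find(
#       cursor_type=CursorType.TAILABLE_AWAIT)
# and pass a kafka.KafkaProducer(bootstrap_servers=...) as the producer.

class _StubProducer:
    """In-memory stand-in for a Kafka producer, for demonstration."""
    def __init__(self):
        self.messages = []
    def send(self, topic, key=None, value=None):
        self.messages.append((topic, key, value))

stub = _StubProducer()
n = forward_ops(
    [{"op": "i", "ns": "app.users", "o": {"_id": 1}},
     {"op": "n", "ns": "", "o": {"msg": "periodic noop"}}],
    stub)
```

The no-op entry is filtered out, so only the insert reaches the stub producer; this is exactly the filtering an external tailer would need so that heartbeat no-ops do not flood the topic.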

Potentially it could even be used to replicate the DBs instead of the oplog: in the event of a primary failure, secondaries would be configured to process the pending oplog messages in Kafka before actually being marked online. As Kafka is replicated and highly available, if the operation made it to the primary we can be sure it will make it to the secondaries. In the end, Kafka is a write-ahead log, same as the oplog.

JL



 Comments   
Comment by Ramon Fernandez Marina [ 21/Jul/17 ]

There are previous tickets that would enable this:

Arguably the MongoDB server should not be in charge of feeding other systems, but rather should enable other tools to interface with it in a way that permits doing something like what you describe.

I'm therefore going to close this request as a duplicate. I'd encourage you to watch and vote for the tickets above so your particular use case can be taken into consideration during the design phase.

Regards,
Ramón.

Generated at Thu Feb 08 04:23:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.