Type: New Feature
Priority: Major - P3
Affects Version/s: None
Fix Version/s: None
When an insert/update/delete operation is performed, write it not only to the oplog but also to a Kafka endpoint. The shard name and the collection could be used to determine the Kafka partition number. This would make it possible to:
- Easily create sharded clusters with a different shard key
- Monitor and audit data changes
- Apply near-real-time stream processing to the data
- Provide more consistency in CAP terms (see the replication idea below)
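As a minimal sketch of the partition-selection idea above: the shard name and collection namespace can be hashed to a deterministic partition number. The hashing scheme (CRC32 over "shard:namespace") is an assumption for illustration, not something the ticket specifies; Python's built-in hash() is avoided because it is salted per process.

```python
import zlib

def kafka_partition(shard_name: str, collection_ns: str, num_partitions: int) -> int:
    """Map a (shard, collection) pair to a stable Kafka partition.

    Hypothetical scheme: CRC32 of "shard:namespace", modulo the
    partition count, so every mongod computes the same mapping.
    """
    key = f"{shard_name}:{collection_ns}".encode("utf-8")
    return zlib.crc32(key) % num_partitions

# Deterministic: the same shard/collection always lands on the same partition,
# which preserves per-collection ordering within a shard.
p = kafka_partition("shard0000", "mydb.orders", 12)
```

Any real implementation would more likely reuse Kafka's own key-based partitioner by producing with a message key, but the property needed is the same: a stable mapping from (shard, collection) to partition.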
Although this could be implemented via external scripts, those scripts become complicated because they would also need to interact with the config servers (or use mongos commands; it is not clear that all the required information is available there). They would also need to detect new nodes in the cluster, and scaling the script/monitoring software can become a difficult task, whereas implementing this inside the mongod process seems simple.
An open question is what to do if the Kafka cluster is unavailable; a configurable behaviour would probably be best. Despite adding a dependency on an external system such as Kafka, I think this adds significant benefits.
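The "configurable behaviour" on Kafka unavailability could be sketched roughly as below. The policy names and the handler are hypothetical, invented here to make the options concrete; they are not part of any existing MongoDB configuration.

```python
from collections import deque
from enum import Enum

class KafkaDownPolicy(Enum):
    # Hypothetical configurable behaviours when Kafka is unreachable.
    FAIL_WRITE = "fail"    # reject the client write entirely
    BUFFER = "buffer"      # queue locally and retry publishing later
    DROP = "drop"          # keep only the oplog entry, skip Kafka

def handle_op(op: dict, kafka_up: bool, policy: KafkaDownPolicy, buffer: deque) -> str:
    """Decide what to do with an oplog entry destined for Kafka."""
    if kafka_up:
        return "published"
    if policy is KafkaDownPolicy.FAIL_WRITE:
        raise ConnectionError("kafka unavailable, rejecting write")
    if policy is KafkaDownPolicy.BUFFER:
        buffer.append(op)  # drained by a retry loop once Kafka recovers
        return "buffered"
    return "dropped"
```

FAIL_WRITE trades availability for the guarantee that Kafka never misses an operation; BUFFER keeps accepting writes at the cost of local memory/disk; DROP is only acceptable for monitoring-style consumers.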
Potentially, Kafka could even be used to replicate the databases instead of the oplog: in the event of a primary failure, secondaries would be configured to process the pending oplog messages in Kafka before actually marking the mongod instance as online. Since Kafka is replicated and highly available, if the operation made it to the primary we can be sure it can make it to the secondaries. In the end, Kafka is a write-ahead log, the same as the oplog.
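The catch-up step for a promoted-or-recovering secondary could look roughly like this. Everything here is a sketch under assumed names: "offset" stands for the Kafka message offset, and apply_fn for whatever applies an oplog entry locally.

```python
from typing import Callable

def catch_up(pending_msgs: list, apply_fn: Callable[[dict], None]) -> int:
    """Apply pending oplog messages fetched from Kafka, in offset order,
    before the node reports itself online.

    pending_msgs: messages between the node's last applied offset and the
    partition's latest offset, e.g. {"offset": 7, "op": {...}}.
    Returns the number of operations applied; only after this returns
    should the mongod instance be marked online.
    """
    applied = 0
    for msg in sorted(pending_msgs, key=lambda m: m["offset"]):
        apply_fn(msg["op"])
        applied += 1
    return applied
```

Because Kafka preserves ordering within a partition, and the partitioning above keeps each collection's operations on one partition per shard, replaying in offset order reproduces the primary's write order for that collection.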