[SERVER-84902] POC $merge/$out running on secondaries Created: 22/Jan/20 Updated: 12/Jan/24 Resolved: 03/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Sprint: | Query 2020-02-10 |
| Participants: |
| Comments |
| Comment by David Storch [ 03/Feb/20 ] |
|
We've decided to move forward with this project, which is now tracked under PM-1770. Further planning for this project will follow the normal scope and design process.
| Comment by Tess Avitabile (Inactive) [ 28/Jan/20 ] |
I'm not sure if they do. And I'm not totally sure whether I think it would be better. On the one hand, it's good to have a generic way to target the primary. On the other hand, it's duplicating work already done by the ReplicationCoordinator.
| Comment by David Storch [ 28/Jan/20 ] |
|
Thanks tess.avitabile. Yeah, targeting ourselves should be fine, but it will have to be tested, of course. It's definitely possible that at the time of check we are not primary, but by the time we are actually running the inserts we are primary. But this shouldn't be a problem.
Do replica set nodes which are not shardsvrs have a replica set monitor? If so, where do I obtain it from?
| Comment by Tess Avitabile (Inactive) [ 28/Jan/20 ] |
|
Those changes to the ReplicationCoordinator look good, except that you will need to lock _mutex in order to access _topCoord and _rsConfig. Another option is to make TopologyCoordinator::_currentPrimaryMember() public and call it from ReplicationCoordinator. A risk to this approach is that we may be primary by the time we call getCurrentPrimaryHostAndPort(), so we may end up targeting ourselves. Does this work? If it does work, is there any benefit to checking if we're primary in MongoInterfaceStandalone::insert()? I also wanted to ask why we can't use the replica set monitor here.
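For illustration only, the locking point above can be sketched as follows. This is a minimal sketch, not code from the linked patch or from the server: HostPort, CoordinatorSketch, setCurrentPrimaryIndex and _currentPrimaryIndex are hypothetical names, and the real _rsConfig and _topCoord are richer types than the plain vector and index assumed here. It only shows the accessor taking the mutex before it reads shared state and returning a copy.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a host/port pair; not the server's HostAndPort.
struct HostPort {
    std::string host;
    int port;
};

// Hypothetical, simplified coordinator. The names _mutex and _rsConfig echo
// the thread, but the types and layout are assumptions made for this sketch.
class CoordinatorSketch {
public:
    explicit CoordinatorSketch(std::vector<HostPort> members)
        : _rsConfig(std::move(members)) {}

    // Records which member is currently primary; -1 means no known primary.
    void setCurrentPrimaryIndex(int index) {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(index >= -1 && index < static_cast<int>(_rsConfig.size()));
        _currentPrimaryIndex = index;
    }

    // Takes the lock before touching shared state and returns a copy, so the
    // caller never holds a reference into data guarded by _mutex.
    std::optional<HostPort> getCurrentPrimaryHostAndPort() const {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_currentPrimaryIndex < 0) {
            return std::nullopt;  // no primary known at this instant
        }
        return _rsConfig[_currentPrimaryIndex];
    }

private:
    mutable std::mutex _mutex;        // guards both members below
    std::vector<HostPort> _rsConfig;  // assumed: one entry per replica set member
    int _currentPrimaryIndex = -1;
};
```

The returned value is a snapshot: the primary can change as soon as the lock is released, which is the time-of-check concern raised in this thread, so a caller still has to tolerate a stale or self-targeting answer.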
| Comment by David Storch [ 28/Jan/20 ] |
|
https://mongodbcr.appspot.com/561800005/ captures my work so far on this topic. This patch seems to work for running $merge with whenMatched:"fail" and whenNotMatched:"insert" against a secondary of a single, unsharded replica set. The work is not straightforward because the existing infrastructure for $merge/$out targeting remote nodes is pretty tightly coupled with sharding. I have a few remaining questions about how to implement this which probably require input from folks from distributed systems.
tess.avitabile, the linked patch needs to find the HostAndPort for the current primary node. See the changes to replication_coordinator.h and the coupled changes to ReplicationCoordinatorImpl. Does this seem like a reasonable change from your point of view?
esha.maharishi, there are a few sharding-related questions remaining:
| Comment by David Storch [ 24/Jan/20 ] |
|
Bad news on question #2 above! The ClusterWriter appears to have sharding-specific pieces, and cannot be used out of the box:
| Comment by David Storch [ 24/Jan/20 ] |
|
Notes from looking into this: $merge on secondaries does appear to work already for sharded clusters. $merge on secondaries for a replica set fails with an error like this:
This happens because replica set nodes that are not shard servers are initialized with MongoInterfaceStandalone rather than MongoInterfaceShardSvr. The methods that $merge calls into in the "standalone" implementation blindly attempt local writes. Things to follow up on:
|
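As a rough illustration of the difference described in the notes above (an implementation that always writes locally versus one that is aware of the primary), here is a minimal sketch. Every name in it is hypothetical: WriteInterfaceSketch, LocalOnlySketch and PrimaryAwareSketch are not the server's MongoInterfaceStandalone or MongoInterfaceShardSvr, the exception text is a placeholder rather than the server's error, and the returned strings stand in for real write paths.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface standing in for the write methods $merge calls.
class WriteInterfaceSketch {
public:
    virtual ~WriteInterfaceSketch() = default;
    // Returns a description of where the write went (illustration only).
    virtual std::string insert(const std::vector<std::string>& docs) = 0;
};

// Always attempts a local write, so it fails on a node that is not primary.
class LocalOnlySketch : public WriteInterfaceSketch {
public:
    explicit LocalOnlySketch(bool isPrimary) : _isPrimary(isPrimary) {}

    std::string insert(const std::vector<std::string>&) override {
        if (!_isPrimary) {
            // Placeholder message; the real error text is not reproduced here.
            throw std::runtime_error("local write attempted on a non-primary node");
        }
        return "local";
    }

private:
    bool _isPrimary;
};

// Writes locally when this node is primary, otherwise targets the primary.
class PrimaryAwareSketch : public WriteInterfaceSketch {
public:
    PrimaryAwareSketch(bool isPrimary, std::string primaryHost)
        : _isPrimary(isPrimary), _primaryHost(std::move(primaryHost)) {
        assert(!_primaryHost.empty());
    }

    std::string insert(const std::vector<std::string>&) override {
        if (_isPrimary) {
            return "local";
        }
        return "forwarded to " + _primaryHost;
    }

private:
    bool _isPrimary;
    std::string _primaryHost;  // assumed to be known up front in this sketch
};
```

In this sketch the primary's address is fixed at construction; in the discussion above it would have to be looked up when the write is issued, which is where the question about obtaining the current primary's HostAndPort comes from.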