[SERVER-84902] POC $merge/$out running on secondaries Created: 22/Jan/20  Updated: 12/Jan/24  Resolved: 03/Feb/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Query 2020-02-10
Participants:

 Comments   
Comment by David Storch [ 03/Feb/20 ]

We've decided to move forward with this project, which is now tracked under PM-1770. Further planning for this project will follow the normal scope and design process.

Comment by Tess Avitabile (Inactive) [ 28/Jan/20 ]

Do replica set nodes which are not shardsvrs have a replica set monitor? If so, where do I obtain it from?

I'm not sure if they do. And I'm not totally sure whether I think it would be better. On the one hand, it's good to have a generic way to target the primary. On the other hand, it's duplicating work already done by the ReplicationCoordinator.

Comment by David Storch [ 28/Jan/20 ]

Thanks tess.avitabile. Yeah, targeting ourselves should be fine, but it will have to be tested of course. It's definitely possible that at the time of check we are not primary, but by the time we are actually running the inserts we are primary. But this shouldn't be a problem.

I also wanted to ask why we can't use the replica set monitor here.

Do replica set nodes which are not shardsvrs have a replica set monitor? If so, where do I obtain it from?

Comment by Tess Avitabile (Inactive) [ 28/Jan/20 ]

Those changes to the ReplicationCoordinator look good, except that you will need to lock _mutex in order to access _topCoord and _rsConfig. Another option is to make TopologyCoordinator::_currentPrimaryMember() public and call it from ReplicationCoordinator.
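
As a rough illustration of the locking point above, here is a minimal, self-contained sketch. It is not the actual ReplicationCoordinatorImpl code: `ReplCoordSketch`, `setPrimaryIndex`, the simplified `HostAndPort` struct, and the `_members` and `_primaryIndex` fields are hypothetical stand-ins for `_rsConfig` and the state owned by `_topCoord`. The only thing it is meant to show is that the accessor takes `_mutex` before reading the shared state and returns a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for mongo::HostAndPort.
struct HostAndPort {
    std::string host;
    int port;
};

// Hypothetical stand-in for ReplicationCoordinatorImpl; only the locking
// pattern is of interest here.
class ReplCoordSketch {
public:
    explicit ReplCoordSketch(std::vector<HostAndPort> members)
        : _members(std::move(members)) {}

    // Returns a copy of the current primary's address, or std::nullopt if no
    // primary is known. The copy stays valid after the lock is released, but
    // it can be stale by the time the caller uses it.
    std::optional<HostAndPort> getCurrentPrimaryHostAndPort() const {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_primaryIndex < 0) {
            return std::nullopt;
        }
        const auto idx = static_cast<std::size_t>(_primaryIndex);
        assert(idx < _members.size());
        return _members[idx];
    }

    // Simulates an election result being recorded.
    void setPrimaryIndex(int index) {
        std::lock_guard<std::mutex> lk(_mutex);
        _primaryIndex = index;
    }

private:
    mutable std::mutex _mutex;          // guards everything below
    std::vector<HostAndPort> _members;  // stands in for _rsConfig
    int _primaryIndex = -1;             // stands in for state held by _topCoord
};
```

Returning by value rather than by reference is deliberate in this sketch: a reference into the guarded state would outlive the lock.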

A risk to this approach is that we may be primary by the time we call getCurrentPrimaryHostAndPort(), so we may end up targeting ourselves. Does this work? If it does work, is there any benefit to checking if we're primary in MongoInterfaceStandalone::insert()?

I also wanted to ask why we can't use the replica set monitor here.

Comment by David Storch [ 28/Jan/20 ]

https://mongodbcr.appspot.com/561800005/ captures my work so far on this topic. This patch seems to work for running $merge with whenMatched:"fail", and whenNotMatched:"insert" against a secondary of a single replica set (unsharded) configuration. The work is not straightforward because the existing infrastructure for $merge/$out targeting remote nodes is pretty tightly coupled with sharding. I have a few remaining questions about how to implement this which probably require input from folks from distributed systems.

tess.avitabile, the linked patch needs to find the HostAndPort for the current primary node. See the changes to replication_coordinator.h and the coupled changes to ReplicationCoordinatorImpl. Does this seem like a reasonable change from your point of view?

esha.maharishi there are a few sharding-related questions remaining:

  1. The POC takes a slightly sketchy approach of using an empty ShardId in an AsyncRequestsSender::Request to mean "target the primary node of this replica set". See the changes to async_request_sender.cpp. Do you think something along these lines is acceptable?
    1. We could clean up the interface to be more explicit, or create an alternative interface where the caller passes a HostAndPort directly to the ARS.
    2. I wonder if there's a way to inject a different kind of host targeter? I don't think we can use the replica set monitor, but maybe we could have a thing which knows how to target based on read preference within its own replica set?
    3. Alternatively, I could circumvent the ARS and build my own ARS-like component which doesn't interact with sharding. I'd love to reuse the ARS if possible, though. It's more or less what I need.
  2. Right now, a mongod only has a TaskExecutorPool if it is a shardsvr, and it hangs off Grid. The POC changes mongod's startup sequence to decorate the ServiceContext with a TaskExecutorPool outside of Grid if the node is part of a replica set but not a shardsvr. See the changes in sharding_initialization.cpp and db.cpp. Does this seem like the right implementation direction? This non-sharding TaskExecutorPool doesn't have the various sharding-related hooks that the one on Grid does. Something like this seems necessary in order to prepare one node in a replica set to send query execution-related requests to other nodes in the replica set.
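
To make option 1.1 above more concrete, here is a minimal sketch of what a more explicit targeting interface could look like. Everything in it is an assumption for illustration: the `Target` variant, the `LocalReplicaSetPrimary` tag, the simplified `ShardId`, `HostAndPort`, and `Request` structs, and `describeTarget` are hypothetical and do not match the real AsyncRequestsSender types. The idea is only that the caller states the kind of target, instead of an empty ShardId carrying a hidden meaning.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical, simplified stand-ins; not the real mongo types.
struct ShardId {
    std::string name;
};
struct HostAndPort {
    std::string host;
    int port;
};
// Tag type meaning "the primary of the replica set this node belongs to".
struct LocalReplicaSetPrimary {};

// The request names its kind of target explicitly.
using Target = std::variant<ShardId, HostAndPort, LocalReplicaSetPrimary>;

struct Request {
    Target target;
    std::string cmd;
};

// Shows how a sender could dispatch on the target kind. A real sender would
// resolve the target to a connection; this sketch just returns a description.
std::string describeTarget(const Request& req) {
    assert(!req.cmd.empty());  // a request must carry a command
    struct Visitor {
        std::string operator()(const ShardId& s) const {
            return "shard:" + s.name;
        }
        std::string operator()(const HostAndPort& h) const {
            return "host:" + h.host + ":" + std::to_string(h.port);
        }
        std::string operator()(const LocalReplicaSetPrimary&) const {
            return "local-primary";
        }
    };
    return std::visit(Visitor{}, req.target);
}
```

With a variant, adding a new target kind forces every visitor to handle it at compile time, which an empty-ShardId convention cannot do.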

Comment by David Storch [ 24/Jan/20 ]

Bad news on question #2 above! The ClusterWriter appears to have sharding-specific pieces, and cannot be used out of the box:

[js_test:merge_on_secondaries] 2020-01-24T17:55:02.251-0500 d20021| 2020-01-24T17:55:02.251-0500 F  -        [conn1] Invalid access at address: 0x68
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.257-0500 d20021| 2020-01-24T17:55:02.256-0500 F  -        [conn1] Got signal: 11 (Segmentation fault).
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.257-0500 d20021|
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.257-0500 d20021| ----- BEGIN BACKTRACE -----
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.257-0500 d20021| {"backtrace":[{"b":"7F1DFE45B000","o":"19B719","s":"_ZN5mongo12rawBacktraceEPPvm"},{"b":"7F1DFE45B000","o":"19B9D4","s":"_ZN5mongo15printStackTraceERNS_14StackTraceSinkE"},{"b":"7F1DFE45B000","o":"19C6F3","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7F1DFE45B000","o":"19AB9C"},{"b":"7F1DFE45B000","o":"19AD97"},{"b":"7F1DFAAF9000","o":"12890"},{"b":"7F1DFAAF9000","o":"A244","s":"__pthread_mutex_trylock"},{"b":"7F1DFE45B000","o":"188BC0","s":"_ZN5mongo5Mutex4lockEv"},{"b":"7F1E01370000","o":"428B5","s":"_ZN5mongo12CatalogCache11getDatabaseEPNS_16OperationContextENS_10StringDataE"},{"b":"7F1E0188B000","o":"1D744","s":"_ZN5mongo19createShardDatabaseEPNS_16OperationContextENS_10StringDataE"},{"b":"7F1DFA322000","o":"27068","s":"_ZN5mongo20ChunkManagerTargeter4initEPNS_16OperationContextE"},{"b":"7F1DFA35E000","o":"4A63","s":"_ZN5mongo13ClusterWriter5writeEPNS_16OperationContextERKNS_21BatchedCommandRequestEPNS_19BatchWriteExecStatsEPNS_22BatchedCommandResponseEN5boost8optionalINS_3OIDEEE"},{"b":"7F1DFA21A000","o":"15C4F","s":"_ZN5mongo24MongoInterfaceStandalone19_insertOnRemoteNodeERKN5boost13intrusive_ptrINS_17ExpressionContextEEERKNS_15NamespaceStringEOSt6vectorINS_7BSONObjESaISB_EERKNS_19WriteConcernOptionsENS1_8optionalINS_3OIDEEE"},{"b":"7F1DFA21A000","o":"15FAB","s":"_ZN5mongo24MongoInterfaceStandalone6insertERKN5boost13intrusive_ptrINS_17ExpressionContextEEERKNS_15NamespaceStringEOSt6vectorINS_7BSONObjESaISB_EERKNS_19WriteConcernOptionsENS1_8optionalINS_3OIDEEE"},{"b":"7F1DFFD97000","o":"10BCA3"},{"b":"7F1DFFD97000","o":"112403","s":"_ZN5mongo19DocumentSourceMerge5spillEOSt6vectorISt5tupleIJNS_7BSONObjENS_9write_ops18UpdateModificationEN5boost8optionalIS3_EEEESaIS9_EE"},{"b":"7F1DFFD97000","o":"126F25","s":"_ZN5mongo20DocumentSourceWriterISt5tupleIJNS_7BSONObjENS_9write_ops18UpdateModificationEN5boost8optionalIS2_EEEEE9doGetNextEv"},{"b":"7F1DFFD97000","o":"AC051","s":"_ZN5mongo14DocumentSource7get
NextEv"},{"b":"7F1DFFD97000","o":"15780C","s":"_ZN5mongo8Pipeline7getNextEv"},{"b":"7F1E00AB6000","o":"B5597","s":"_ZN5mongo18PipelineProxyStage7getNextEv"},{"b":"7F1E00AB6000","o":"B560C","s":"_ZN5mongo18PipelineProxyStage6doWorkEPm"},{"b":"7F1E00AB6000","o":"B60E6","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"7F1E00AB6000","o":"10FD02","s":"_ZN5mongo16PlanExecutorImpl12_getNextImplEPNS_11SnapshottedINS_8DocumentEEEPNS_8RecordIdE"},{"b":"7F1E00AB6000","o":"11077B","s":"_ZN5mongo16PlanExecutorImpl7getNextEPNS_8DocumentEPNS_8RecordIdE"},{"b":"7F1DF9E4A000","o":"9C98E"},{"b":"7F1DF9E4A000","o":"A0D02","s":"_ZN5mongo12runAggregateEPNS_16OperationContextERKNS_15NamespaceStringERKNS_18AggregationRequestERKNS_18LiteParsedPipelineERKNS_7BSONObjERKSt6vectorINS_9PrivilegeESaISF_EEPNS_3rpc21ReplyBuilderInterfaceE"},{"b":"7F1DF9E4A000","o":"969EC"},{"b":"7F1DFA12F000","o":"17C3C"},{"b":"7F1DFA12F000","o":"1A615"},{"b":"7F1DFA12F000","o":"1B483","s":"_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE"},{"b":"7F1E01D06000","o":"8B31","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE"},{"b":"7F1E01CE3000","o":"1A218","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE"},{"b":"7F1E01CE3000","o":"15D8A","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"7F1E01CE3000","o":"175BC"},{"b":"7F1E01CBA000","o":"1D3AB","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE"},{"b":"7F1E01CE3000","o":"13167","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE"},{"b":"7F1E01CE3000","o":"14655","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE"},{"b":"7F1E01CE3000","o":"150E6","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11Thr
eadGuardE"},{"b":"7F1E01CE3000","o":"15D4B","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"7F1E01CE3000","o":"175BC"},{"b":"7F1E01CBA000","o":"1D776"},{"b":"7F1DFEA50000","o":"3EB6"},{"b":"7F1DFEA50000","o":"3F24"},{"b":"7F1DFAAF9000","o":"76DB"},{"b":"7F1DFA706000","o":"12188F","s":"clone"}],"processInfo":{"mongodbVersion":"0.0.0","gitVersion":"unknown","compiledModules":["enterprise","ninja"],"uname":{"sysname":"Linux","release":"5.0.0-37-generic","version":"#40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019","machine":"x86_64"},"somap":[{"b":"7F1E01D06000","path":"build/ninja/mongo/db/libservice_context_d.so","elfType":3,"buildId":"272757A90DCEF07BD3EB8B5959E50415D54BB730"},{"b":"7F1E01CE3000","path":"build/ninja/mongo/transport/libservice_entry_point.so","elfType":3,"buildId":"FE1EC70DE28A3A30A2F82DA4F2663D7EF7E875FB"},{"b":"7F1E01CBA000","path":"build/ninja/mongo/transport/libservice_executor.so","elfType":3,"buildId":"CA18F9EADCC12B4642D191A936C39AD5C2F0520B"},{"b":"7F1E0188B000","path":"build/ninja/mongo/s/libsharding_router_api.so","elfType":3,"buildId":"7A0AAF9C3D8B82F5EA01F12912E388E5BDDEDF0D"},{"b":"7F1E01370000","path":"build/ninja/mongo/s/libgrid.so","elfType":3,"buildId":"02F46A29A95A9F7976EEFCC429BFD5019BE2F7F8"},{"b":"7F1E00AB6000","path":"build/ninja/mongo/db/libquery_exec.so","elfType":3,"buildId":"6DC5806EF4E5C609C804FC7A7DE10D329C4867E3"},{"b":"7F1DFFD97000","path":"build/ninja/mongo/db/pipeline/libpipeline.so","elfType":3,"buildId":"0C45DECEFCE3052C9733F5B73C9B552E0F80ECCC"},{"b":"7F1DFEA50000","path":"build/ninja/mongo/transport/libtransport_layer_common.so","elfType":3,"buildId":"A79F7F4ADE65882392AAA385B7DB30D746DB7335"},{"b":"7F1DFE45B000","path":"build/ninja/mongo/libbase.so","elfType":3,"buildId":"85E65362B8E4A8C75A001FB155D67177EDF095AA"},{"b":"7F1DFAAF9000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"28C6AADE70B2D40D1F0F3D0A1A0CAD1AB816448F"},{"b":"7F1DFA706000","path":"/lib/x86
_64-linux-gnu/libc.so.6","elfType":3,"buildId":"B417C0BA7CC5CF06D1D1BED6652CEDB9253C60D0"},{"b":"7F1DFA35E000","path":"build/ninja/mongo/s/libsharding_api.so","elfType":3,"buildId":"5C46066026D21A207C15DB5B5D3F5D9781F14B24"},{"b":"7F1DFA322000","path":"build/ninja/mongo/s/write_ops/libcluster_write_op.so","elfType":3,"buildId":"669FFF73A061D923D8A58E76E462FE990A31B38F"},{"b":"7F1DFA21A000","path":"build/ninja/mongo/db/pipeline/libprocess_interface_standalone.so","elfType":3,"buildId":"36FB2772966D933ADEE5116B8269AEECC04CC5EB"},{"b":"7F1DFA12F000","path":"build/ninja/mongo/db/libservice_entry_point_common.so","elfType":3,"buildId":"8F3E00F88BA5C1794073CC0CC92D10BE11A940DB"},{"b":"7F1DF9E4A000","path":"build/ninja/mongo/db/commands/libstandalone.so","elfType":3,"buildId":"99876EA5616B56D507CC0391CD9D7CB4B892CA05"}]}}
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.257-0500 d20021|  libbase.so(_ZN5mongo12rawBacktraceEPPvm+0x9) [0x7F1DFE5F6719]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.257-0500 d20021|  libbase.so(_ZN5mongo15printStackTraceERNS_14StackTraceSinkE+0xB4) [0x7F1DFE5F69D4]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libbase.so(_ZN5mongo15printStackTraceERSo+0x33) [0x7F1DFE5F76F3]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libbase.so(+0x19AB9C) [0x7F1DFE5F5B9C]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libbase.so(+0x19AD97) [0x7F1DFE5F5D97]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libpthread.so.0(+0x12890) [0x7F1DFAB0B890]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libpthread.so.0(__pthread_mutex_trylock+0x14) [0x7F1DFAB03244]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libbase.so(_ZN5mongo5Mutex4lockEv+0x20) [0x7F1DFE5E3BC0]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libgrid.so(_ZN5mongo12CatalogCache11getDatabaseEPNS_16OperationContextENS_10StringDataE+0xA5) [0x7F1E013B28B5]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libsharding_router_api.so(_ZN5mongo19createShardDatabaseEPNS_16OperationContextENS_10StringDataE+0x54) [0x7F1E018A8744]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libcluster_write_op.so(_ZN5mongo20ChunkManagerTargeter4initEPNS_16OperationContextE+0x58) [0x7F1DFA349068]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libsharding_api.so(_ZN5mongo13ClusterWriter5writeEPNS_16OperationContextERKNS_21BatchedCommandRequestEPNS_19BatchWriteExecStatsEPNS_22BatchedCommandResponseEN5boost8optionalINS_3OIDEEE+0xE3) [0x7F1DFA362A63]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libprocess_interface_standalone.so(_ZN5mongo24MongoInterfaceStandalone19_insertOnRemoteNodeERKN5boost13intrusive_ptrINS_17ExpressionContextEEERKNS_15NamespaceStringEOSt6vectorINS_7BSONObjESaISB_EERKNS_19WriteConcernOptionsENS1_8optionalINS_3OIDEEE+0x35F) [0x7F1DFA22FC4F]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libprocess_interface_standalone.so(_ZN5mongo24MongoInterfaceStandalone6insertERKN5boost13intrusive_ptrINS_17ExpressionContextEEERKNS_15NamespaceStringEOSt6vectorINS_7BSONObjESaISB_EERKNS_19WriteConcernOptionsENS1_8optionalINS_3OIDEEE+0x10B) [0x7F1DFA22FFAB]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libpipeline.so(+0x10BCA3) [0x7F1DFFEA2CA3]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libpipeline.so(_ZN5mongo19DocumentSourceMerge5spillEOSt6vectorISt5tupleIJNS_7BSONObjENS_9write_ops18UpdateModificationEN5boost8optionalIS3_EEEESaIS9_EE+0x283) [0x7F1DFFEA9403]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libpipeline.so(_ZN5mongo20DocumentSourceWriterISt5tupleIJNS_7BSONObjENS_9write_ops18UpdateModificationEN5boost8optionalIS2_EEEEE9doGetNextEv+0x375) [0x7F1DFFEBDF25]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libpipeline.so(_ZN5mongo14DocumentSource7getNextEv+0x41) [0x7F1DFFE43051]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libpipeline.so(_ZN5mongo8Pipeline7getNextEv+0x3C) [0x7F1DFFEEE80C]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libquery_exec.so(_ZN5mongo18PipelineProxyStage7getNextEv+0x27) [0x7F1E00B6B597]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.258-0500 d20021|  libquery_exec.so(_ZN5mongo18PipelineProxyStage6doWorkEPm+0x4C) [0x7F1E00B6B60C]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libquery_exec.so(_ZN5mongo9PlanStage4workEPm+0x56) [0x7F1E00B6C0E6]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libquery_exec.so(_ZN5mongo16PlanExecutorImpl12_getNextImplEPNS_11SnapshottedINS_8DocumentEEEPNS_8RecordIdE+0x1B2) [0x7F1E00BC5D02]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libquery_exec.so(_ZN5mongo16PlanExecutorImpl7getNextEPNS_8DocumentEPNS_8RecordIdE+0x4B) [0x7F1E00BC677B]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libstandalone.so(+0x9C98E) [0x7F1DF9EE698E]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libstandalone.so(_ZN5mongo12runAggregateEPNS_16OperationContextERKNS_15NamespaceStringERKNS_18AggregationRequestERKNS_18LiteParsedPipelineERKNS_7BSONObjERKSt6vectorINS_9PrivilegeESaISF_EEPNS_3rpc21ReplyBuilderInterfaceE+0x1E32) [0x7F1DF9EEAD02]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libstandalone.so(+0x969EC) [0x7F1DF9EE09EC]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_entry_point_common.so(+0x17C3C) [0x7F1DFA146C3C]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_entry_point_common.so(+0x1A615) [0x7F1DFA149615]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_entry_point_common.so(_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE+0x4F3) [0x7F1DFA14A483]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_context_d.so(_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x41) [0x7F1E01D0EB31]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_entry_point.so(_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE+0x108) [0x7F1E01CFD218]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_entry_point.so(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x11A) [0x7F1E01CF8D8A]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_entry_point.so(+0x175BC) [0x7F1E01CFA5BC]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.259-0500 d20021|  libservice_executor.so(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x13B) [0x7F1E01CD73AB]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libservice_entry_point.so(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x117) [0x7F1E01CF6167]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libservice_entry_point.so(_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE+0x665) [0x7F1E01CF7655]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libservice_entry_point.so(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x316) [0x7F1E01CF80E6]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libservice_entry_point.so(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0xDB) [0x7F1E01CF8D4B]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libservice_entry_point.so(+0x175BC) [0x7F1E01CFA5BC]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libservice_executor.so(+0x1D776) [0x7F1E01CD7776]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libtransport_layer_common.so(+0x3EB6) [0x7F1DFEA53EB6]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libtransport_layer_common.so(+0x3F24) [0x7F1DFEA53F24]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libpthread.so.0(+0x76DB) [0x7F1DFAB006DB]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021|  libc.so.6(clone+0x3F) [0x7F1DFA82788F]
[js_test:merge_on_secondaries] 2020-01-24T17:55:02.260-0500 d20021| -----  END BACKTRACE  -----

Comment by David Storch [ 24/Jan/20 ]

Notes from looking into this: $merge on secondaries does appear to work already for sharded clusters. $merge on secondaries for a replica set fails with an error like this:

[js_test:merge_on_secondaries] 2020-01-24T17:06:20.856-0500 assert: command failed: {
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.856-0500 	"topologyVersion" : {
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.856-0500 		"processId" : ObjectId("5e2b6a59484b531afc9ce793"),
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.856-0500 		"counter" : NumberLong(3)
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.856-0500 	},
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.856-0500 	"operationTime" : Timestamp(1579903580, 10),
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.856-0500 	"ok" : 0,
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 	"errmsg" : "Not primary while writing to test.output",
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 	"code" : 189,
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 	"codeName" : "PrimarySteppedDown",
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 	"$clusterTime" : {
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 		"clusterTime" : Timestamp(1579903580, 10),
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 		"signature" : {
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 			"keyId" : NumberLong(0)
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 		}
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.857-0500 	}
[js_test:merge_on_secondaries] 2020-01-24T17:06:20.858-0500 } : aggregate failed

This happens because replica set nodes that are not shard servers are initialized with MongoInterfaceStandalone rather than MongoInterfaceShardSvr. The methods that $merge calls into in the "standalone" implementation blindly attempt local writes.

Things to follow up on:

  • To choose whether to write locally or target writes to another node, we need to consult the ReplicationCoordinator while holding the RSTL lock. Is it ok to acquire this lock, make the check, and then drop the lock?
    • If we conclude that we are a secondary, drop the lock, and then become primary, no harm is done. We should be able to target writes to ourselves. It won't be efficient, but it will work.
    • If we conclude that we are a primary but then we become secondary, then the operation will fail with a NotMaster error. I think this is ok? Like, can't that happen while inserts are taking place in general and applications need to be prepared to handle such errors?
  • Is the ClusterWriter available and fully-functional on replica set nodes?
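
The two check-then-act cases in the first bullet can be sketched as a toy model. This is not server code: `NodeSketch`, `Route`, `Outcome`, `chooseRoute`, `setRole`, and `write` are hypothetical names, a plain `std::mutex` stands in for the RSTL, and a remote write is assumed to always reach some primary. The sketch only shows that the role is read under the lock, the lock is dropped, and the role may differ by the time the write runs.

```cpp
#include <cassert>
#include <mutex>

enum class Role { Primary, Secondary };
enum class Route { Local, Remote };
enum class Outcome { Ok, NotMaster };

// Hypothetical toy model of one replica set node.
class NodeSketch {
public:
    explicit NodeSketch(Role role) : _role(role) {}

    // Time of check: read the role under the lock, then release the lock.
    Route chooseRoute() const {
        std::lock_guard<std::mutex> lk(_rstl);
        return _role == Role::Primary ? Route::Local : Route::Remote;
    }

    // Simulates a step-up or step-down between the check and the write.
    void setRole(Role role) {
        std::lock_guard<std::mutex> lk(_rstl);
        _role = role;
    }

    // Time of use: the role may have changed since chooseRoute().
    Outcome write(Route route) const {
        std::lock_guard<std::mutex> lk(_rstl);
        if (route == Route::Local) {
            // A local write fails if this node has stepped down.
            return _role == Role::Primary ? Outcome::Ok : Outcome::NotMaster;
        }
        // Assumption: a remote write is delivered to whichever node is
        // primary, which may be this node itself.
        return Outcome::Ok;
    }

private:
    mutable std::mutex _rstl;  // stands in for the RSTL
    Role _role;
};
```

In this model, a secondary that becomes primary after the check still succeeds through the remote route, while a primary that steps down after the check gets `NotMaster`, matching the two sub-bullets.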