One thought would be to have the donor shard handle the WouldChangeOwningShard exception locally without bubbling it up to mongos.
One idea I recently had would be to generate the multi-document transaction applyOps directly from OpObserverImpl::onUpdate() when the update would change the document's owning shard.
I like this approach over trying to add local WouldChangeOwningShard exception handling to the donor shard's service entry point. I suspect it would be difficult to relax the batch-of-size-1 and multi=true restrictions in the WouldChangeOwningShard exception handling approach because multiple updates may trigger the WouldChangeOwningShard exception multiple times for the same command.
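As a rough illustration of what that could look like (a sketch only: ShardKey, OplogOp, and owningShardFor() are simplified stand-ins, not the server's real OpObserverImpl or chunk-manager APIs):

    #include <string>
    #include <vector>

    // Hypothetical, heavily simplified stand-ins for BSONObj, the routing
    // table, and oplog entry types; the real OpObserverImpl::onUpdate()
    // signature is very different.
    struct ShardKey {
        long long value;  // pretend the shard key is a single numeric field
    };

    struct OplogOp {
        std::string op;  // "u", "d", or "i"
        std::string ns;
        long long docId;
    };

    // Placeholder routing function: maps a shard key value to its owning shard.
    std::string owningShardFor(const ShardKey& key) {
        return (key.value % 2 == 0) ? "shardA" : "shardB";
    }

    // Instead of letting the write fail with WouldChangeOwningShard and
    // bubbling it up to mongos, the op observer compares the owning shard of
    // the pre-image and post-image. If they differ, it emits the delete+insert
    // pair that would be recorded together as one multi-document transaction's
    // applyOps entry rather than an ordinary "u" oplog entry.
    std::vector<OplogOp> generateOpsForUpdate(const std::string& ns,
                                              long long docId,
                                              const ShardKey& preImageKey,
                                              const ShardKey& postImageKey) {
        if (owningShardFor(preImageKey) == owningShardFor(postImageKey)) {
            return {{"u", ns, docId}};  // document stays put: normal update
        }
        // Document changes owning shard: delete where it used to live, insert
        // where it now belongs, wrapped in a single applyOps.
        return {{"d", ns, docId}, {"i", ns, docId}};
    }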
Another part of the OpObserverImpl::onUpdate() idea would be to avoid generating a new logical session ID for each multi-document transaction applyOps we'd generate. I think it is possible to use a single logical session ID per donor shard. The challenge is that OpObserverImpl::onUpdate() is called concurrently for updates to different documents within the same collection, so the real-time thread scheduling order cannot be used as an indication of the optime order in which secondaries would end up seeing the oplog entries. My idea here would be to use the timestamp component of the generated optime as the txnNumber. This guarantees the txnNumber always appears to be increasing to secondaries. If we want to think slightly less hard about all members of a single donor replica set shard using the same logical session ID (because the timestamp component of an optime is only guaranteed to be unique once it has become majority-committed), then we could have the donor shard primary generate a new logical session ID in each new term.
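A minimal sketch of the txnNumber and per-term session idea, assuming a simplified Timestamp with the usual seconds/increment packing; the types and helpers below are illustrative stand-ins, not the server's real Timestamp, TxnNumber, or LogicalSessionId classes:

    #include <array>
    #include <cstdint>
    #include <random>

    // Stand-in for the server's Timestamp: seconds in the high 32 bits,
    // increment in the low 32 bits, so comparing the packed value orders
    // timestamps the same way the oplog does.
    struct Timestamp {
        std::uint32_t secs;
        std::uint32_t inc;

        std::uint64_t asULL() const {
            return (static_cast<std::uint64_t>(secs) << 32) | inc;
        }
    };

    // Derive the txnNumber for the generated applyOps entry from the
    // timestamp component of the optime reserved for it. Oplog timestamps
    // are strictly increasing on the primary, so every later applyOps written
    // under the same logical session ID carries a strictly larger txnNumber,
    // which is the "always increasing" property secondaries expect.
    std::int64_t txnNumberFor(const Timestamp& reservedOpTime) {
        return static_cast<std::int64_t>(reservedOpTime.asULL());
    }

    // One logical session ID per donor shard per term. Regenerating the ID on
    // each step-up sidesteps the concern that an optime's timestamp is only
    // guaranteed unique once majority-committed: primaries in different terms
    // never share an lsid, so they can never collide on (lsid, txnNumber).
    struct DonorSessionState {
        std::array<std::uint8_t, 16> lsid{};  // UUID bytes
        long long term = -1;

        const std::array<std::uint8_t, 16>& lsidForTerm(long long currentTerm) {
            if (currentTerm != term) {
                std::random_device rd;
                for (auto& b : lsid) {
                    b = static_cast<std::uint8_t>(rd());
                }
                term = currentTerm;
            }
            return lsid;
        }
    };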