[SERVER-44484] Changestream with updateLookup uasserts on updates from before collection was sharded Created: 07/Nov/19  Updated: 29/Oct/23  Resolved: 25/Jan/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.0.0, 4.2.0
Fix Version/s: 4.3.3, 4.0.28, 4.2.19

Type: Bug Priority: Major - P3
Reporter: Bernard Gorman Assignee: Bernard Gorman
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0
Sprint: Query 2019-12-02, Query 2019-12-30, Query 2020-01-13, Query 2020-01-27, Query 2020-02-10
Participants:

 Description   

An update operation writes the modified document's documentKey to the o2 field of its oplog entry; this is used by $changeStream to look up the document in the cluster if the updateLookup option is specified. An update on a sharded collection will write the shard key plus _id to the o2 field; on an unsharded collection, just the _id. But this means that if an unsharded collection is subsequently sharded on a key other than _id, the updateLookup for all pre-sharding update events will attempt to target the lookup by _id alone, will be unable to target a single shard, and will therefore always fail with an exception.
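The targeting problem described above can be illustrated with a small, self-contained sketch. This is a toy model and not server code: the helper names `document_key` and `can_target_single_shard`, the sample document, and the `region` shard key are all assumptions made for illustration.

```python
# Toy model (not MongoDB server code) of how the documentKey written to o2
# differs before and after a collection is sharded.

def document_key(doc, shard_key_fields):
    """Model of the o2 field: shard key fields (if any) plus _id."""
    key = {field: doc[field] for field in shard_key_fields}
    key["_id"] = doc["_id"]
    return key


def can_target_single_shard(doc_key, shard_key_fields):
    """A lookup is targetable only if the key supplies every shard key field."""
    return all(field in doc_key for field in shard_key_fields)


doc = {"_id": 1, "region": "eu", "qty": 5}

# Update logged while the collection was unsharded: o2 holds only _id.
pre_sharding_key = document_key(doc, shard_key_fields=[])

# Update logged after sharding on the hypothetical key {region: 1}:
# o2 holds region plus _id.
post_sharding_key = document_key(doc, shard_key_fields=["region"])

current_shard_key = ["region"]
print(can_target_single_shard(pre_sharding_key, current_shard_key))
print(can_target_single_shard(post_sharding_key, current_shard_key))
```

In this model the pre-sharding key lacks `region`, so the lookup cannot be routed to one shard, which corresponds to the failure this ticket reports.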

It is possible for this failure to occur in one of two different ways:
1. An assertion which fails just the change stream, which would look something like this:

{
  ok: 0,
  code: 1,
  codeName: "InternalError",
  msg: "Unable to target lookup query to a single shard: {query.toString()}"
}

2. An invariant failure on a mongos process which would look something like this in the logs:

Invariant failure shardResult.size() == 1u src/mongo/s/commands/pipeline_s.cpp



 Comments   
Comment by Githook User [ 23/Dec/21 ]

Author:

{'name': 'Rishab Joshi', 'email': 'rishab.joshi@mongodb.com', 'username': 'rishvin'}

Message: SERVER-44484 Changestream with updateLookup uasserts on updates from before collection was sharded.
Branch: v4.0
https://github.com/mongodb/mongo/commit/223c12d517f37a007e2520aeed2bfea7809d1e45

Comment by Githook User [ 13/Dec/21 ]

Author:

{'name': 'Rishab Joshi', 'email': 'rishab.joshi@mongodb.com', 'username': 'rishvin'}

Message: SERVER-44484 Changestream with updateLookup uasserts on updates from before collection was sharded.
Branch: v4.2
https://github.com/mongodb/mongo/commit/82be116f34ef0f6f4ead402f2e9225a76af44e73

Comment by Githook User [ 03/Dec/21 ]

Author:

{'name': 'Rishab Joshi', 'email': 'rishab.joshi@mongodb.com', 'username': 'rishvin'}

Message: SERVER-44484 Changestream with updateLookup uasserts on updates from before collection was sharded.
Branch: BACKPORT-10624-v4.2
https://github.com/mongodb/mongo/commit/04a653db3c1734b2e7d8cb612e56f303115be4e3

Comment by Githook User [ 25/Jan/20 ]

Author:

{'username': 'gormanb', 'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com'}

Message: SERVER-44484 Allow change stream update lookup to retrieve post-image by _id

create mode 100644 jstests/sharding/change_streams_unsharded_update_resume.js
Branch: master
https://github.com/mongodb/mongo/commit/6c45478fbdc994353541a0f05ff202cedf251d7a

Comment by Bernard Gorman [ 07/Dec/19 ]

Met with asya and charlie.swanson to discuss this today. We decided that the appropriate solution here is to simply perform the updateLookup by _id despite the fact that we cannot target it to a single shard, in the expectation that only a single valid document will be returned. Because the lack of a shard key in the documentKey indicates that the document was inserted while the collection was unsharded, the only way this can return more than one result is if the user has actively inserted another document with the same _id since the collection became sharded. We already have code which will uassert if more than one document is returned by the updateLookup, which would be the correct course of action in this case. If this does happen, the user can either remove the offending document and re-insert it with a different _id, or can resume the stream without updateLookup in order to bypass this entry in the oplog.
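The agreed approach can be sketched roughly as follows. This is a simplified simulation rather than the server's implementation: shards are modelled as plain lists of documents, and `lookup_post_image` and `TooManyMatchingDocuments` are hypothetical names. Returning `None` when no document matches is also an assumption of this sketch.

```python
# Simplified simulation (not server code) of an untargeted updateLookup by _id.

class TooManyMatchingDocuments(Exception):
    """Stands in for the uassert raised when the lookup is ambiguous."""


def lookup_post_image(shards, doc_key):
    """Query every shard with doc_key and expect at most one match."""
    matches = [
        doc
        for shard in shards
        for doc in shard
        if all(doc.get(field) == value for field, value in doc_key.items())
    ]
    if len(matches) > 1:
        raise TooManyMatchingDocuments(
            f"found {len(matches)} documents matching {doc_key}"
        )
    # Assumption: an absent document yields no post-image.
    return matches[0] if matches else None


# Pre-sharding update event: the documentKey holds only _id.
doc_key = {"_id": 1}

shard_a = [{"_id": 1, "region": "eu", "qty": 5}]
shard_b = [{"_id": 2, "region": "us", "qty": 7}]

# Normal case: exactly one document with this _id exists in the cluster.
post_image = lookup_post_image([shard_a, shard_b], doc_key)

# Problem case: a second document with the same _id was inserted after
# sharding and landed on another shard, so the lookup is ambiguous.
shard_b_with_duplicate = shard_b + [{"_id": 1, "region": "us", "qty": 9}]
try:
    lookup_post_image([shard_a, shard_b_with_duplicate], doc_key)
    ambiguous = False
except TooManyMatchingDocuments:
    ambiguous = True
```

The duplicate case is the one situation the comment above identifies as an error; the remedies it lists (re-inserting the offending document under a different _id, or resuming without updateLookup) apply there.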

Moving this back to Needs Scheduling for re-triage.

Comment by Craig Homa [ 12/Nov/19 ]

Moving this to 'investigating' as the next step here is to determine how this can be fixed.
