[SERVER-35379] com.mongodb.MongoQueryException: Query failed with error code 207 and error message 'Collection db.collection UUID differs from UUID on change stream operations' on server <host> Created: 03/Jun/18  Updated: 07/Apr/23  Resolved: 09/Jul/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Rajender Shanigaram Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

staging, linux.


Issue Links:
Documented
Duplicate
duplicates SERVER-35999 A $changeStream can trigger assertion... Closed
Assigned Teams:
Query
Participants:
Case:

 Description   

Hi team, 

We integrated the change streams and using mongo-java-driver, however we are facing below issue, interestingly it is working fine for one collection and failing for second collection, we opened change stream on two collections.

Server version: 3.6.5
mongo-java-driver: 3.7.1

com.mongodb.MongoQueryException: Query failed with error code 207 and error message 'Collection <db>.<collection> UUID differs from UUID on change stream operations' on server <mogos>:27017
at com.mongodb.operation.QueryHelper.translateCommandException(QueryHelper.java:29)
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:229)
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:115)
at com.mongodb.operation.ChangeStreamBatchCursor$1.apply(ChangeStreamBatchCursor.java:58)
at com.mongodb.operation.ChangeStreamBatchCursor$1.apply(ChangeStreamBatchCursor.java:55)
at com.mongodb.operation.ChangeStreamBatchCursor.resumeableOperation(ChangeStreamBatchCursor.java:132)
at com.mongodb.operation.ChangeStreamBatchCursor.hasNext(ChangeStreamBatchCursor.java:55)
at com.mongodb.client.internal.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:54)
at .......
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)



 Comments   
Comment by Esha Maharishi (Inactive) [ 13/Mar/19 ]

sjohnson540, take a look at the user summary box on SERVER-36154 for a workaround.

Comment by Sam Johnson [ 13/Jul/18 ]

I just wanted to comment to say that I have also run into this issue. Steps I have taken to mitigate which have no effect:

  • Downgrade from 3.6.5 to 3.6.4
  • Stop reads/writes, restart mongos, mongod
  • Stopped all other changestream operations

This is in a cluster that has been upgraded from 3.4, I started a changestream to watch a collection that was actively being used and the changestream failed with the UUID error mentioned above.

This is in our staging environment and I have access to debug anything needed. I would be happy to help out in anyway I can to speed in the fixing of this issue as it is a blocker to us upgrading to 3.6 in production.

Comment by Rajender Shanigaram [ 10/Jul/18 ]

Thank you Nik, I will follow SERVER-35999 ticket. 

Comment by Nicholas Zolnierz [ 09/Jul/18 ]

Closing this ticket and marking as a duplicate of SERVER-35999, which more accurately describes the underlying issue.

Thanks,
Nick

Comment by Nicholas Zolnierz [ 05/Jul/18 ]

rshanigaram,

We are unassigning this ticket and returning it to the backlog. However, this does not indicate that we are no longer working on this case. The issue is being investigated internally and has an associated ticket tracking the internal escalation. Unassigning this ticket simply avoids duplication in our internal work tracking. Feel free to respond with any questions or concerns.

Thanks,
Nick

Comment by Rajender Shanigaram [ 08/Jun/18 ]

Nick,
1) I'm sorry, as per our DBA team, there was a specific reason to use 3.6.5, not easy at this moment to downgrade to 3.6
2) No, it is a staging cluster as of now not deployed yet to production, not many writes/updates/deletes.

Thanks,
Rajender.S

Comment by Nicholas Zolnierz [ 07/Jun/18 ]

rshanigaram it would also help to reproduce if you could provide more details on the other operations on the collection before and after opening the change stream. Is it a live cluster with many writes/updates/deletes? If you can control it, I wouldn't expect the initial change stream to fail if it has nothing to return yet. Thanks in advance!

Comment by Nicholas Zolnierz [ 07/Jun/18 ]

Thanks for bearing with me, rshanigaram. There was one suspicious change stream commit that went into 3.6.5 that affects the commands that mongos sends to the shards. If you could try with an older version of 3.6 and let me know if it still repros, that would be great.

Also do you mind posting the logs (with logLevel >= 2) for the failure? Unfortunately I haven't had any luck reproducing on my end.

Regards,
Nick

Comment by Rajender Shanigaram [ 06/Jun/18 ]

Thank you, Nicholas Zolnierz.
1)Different threads monitor on separate collection, not multiple threads on single collection, however I tried only one collection and one thread, specially it is failing for this particular collection only.
2) Sure, I tried with single mongos and no luck.

Thank you so much for looking into it.

Thanks,
Rajender.S

Comment by Nicholas Zolnierz [ 05/Jun/18 ]

Interesting, the issue may be related to concurrent change streams on multiple collections. If you haven't already, could you try removing one of the monitoring threads and just watch "myCollection"?

Also would be worth running with a single mongos and multiple monitoring threads, since the driver has a somewhat random process for determining which mongos to route requests to. In the meantime I'll work on setting up a repro case that bypasses the driver.

Thanks!
Nick

Comment by Rajender Shanigaram [ 04/Jun/18 ]

1) Yes, "myCollection" is already created and sharded, this was an old collection.
2) Passing collection also go through example same code, I'm creating two java threads, each one monitor single collection, create monitor thread is common code for both collection
3) I need full document on watch event, hence added UPDATE_LOOKUP, however I tried without fullDocument like below and getting same exception
cursor = collection.watch().iterator();

Thanks,
Rajender.S

Comment by Nicholas Zolnierz [ 04/Jun/18 ]

Thanks rshanigaram, at a glance what you've described should work. And just to verify, "myCollection" is already created and sharded? Could you also provide more details on the passing case with the other collection? I don't think the UPDATE_LOOKUP option is relevant to the assertion, but it would be worth trying without it as well.

Comment by Rajender Shanigaram [ 04/Jun/18 ]

Thank you Nicholas Zolnierz, please find the details below, I'm pretty sure that not dropping and re-create the collection.

1) we got two datacenters: A, B each datacenter has 1 primary, 2 secondaries (one local and remote)
2) mongoclient connection opened by passing list of mongos host names -> mongos1-A, mongos2-A, mongos1-B, mongos2-B
3) MongoCollection<Document> collection = database.getCollection("myCollection");
4) MongoCursor<ChangeStreamDocument<Document>> cursor = collection.watch().fullDocument(FullDocument.UPDATE_LOOKUP).iterator();

while(cursor.hasNext())

{ --- ChangeStreamDocument<Document> changeEvent = cursor.next(); ---- }

Kindly let me know if you need any more details or if I miss any.

Thanks,
Rajender.S

Comment by Nicholas Zolnierz [ 04/Jun/18 ]

Hi rshanigaram,

The error that you're seeing implies that the collection being watched has been dropped and recreated as sharded, resulting in a new UUID for the collection. Attempting to iterate a change stream cursor on the original collection will trigger this assertion, with a sequence of events such as:

1. Create collection and open a change stream on it.
2. Perform a write/update/delete operation.
3. Drop and recreate the sharded collection.
4. Attempt to get the next batch from the change stream cursor.

Can you include more details on the exact steps that you've taken to hit the assertion? This will help us narrow down where to investigate next.

Regards,
Nick

Generated at Thu Feb 08 04:39:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.