[SERVER-36551] Collection UUID differs from UUID on change stream operations Created: 09/Aug/18  Updated: 08/Mar/19  Resolved: 31/Jan/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.1, 3.6.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Artem Assignee: Kelsey Schubert
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I tried to migrate from oplog fetching to change streams and got this error:

mongos> db.users.watch()
Error: getMore command failed: {
	"ok" : 0,
	"errmsg" : "Collection user.users UUID differs from UUID on change stream operations",
	"code" : 207,
	"codeName" : "InvalidUUID",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1533826850, 154),
		"signature" : {
			"hash" : BinData(0,"fFr/jmk/ZqH6KFM6adIfYcCptw0="),
			"keyId" : NumberLong("6567806354877055160")
		}
	},
	"operationTime" : Timestamp(1533826850, 154)
} 

After some searching I found issue SERVER-31691 and tried to fix the problem by running the forceRoutingTableRefresh command on each shard's PRIMARY:

MongoDB server version: 3.6.6
user:PRIMARY> db.getSiblingDB('admin').createUser({user: 'root2', pwd: 'root2', roles: ['root', '__system']})
user:PRIMARY> quit()
 
// relogin as root2
 
MongoDB server version: 3.6.6
user:PRIMARY> db.adminCommand({forceRoutingTableRefresh: "user.users", syncFromConfig: true})
user:PRIMARY> db.getSiblingDB('config').cache.collections.find({_id: "user.users"})
{ "_id" : "user.users", "epoch" : ObjectId("5a6a38d2999e40c3345af3f8"), "key" : { "_id" : "hashed" }, "unique" : false, "refreshing" : false, "lastRefreshedCollectionVersion" : Timestamp(193, 1), "enterCriticalSectionCounter" : 192 }
user:PRIMARY> db.getSiblingDB('user').getCollectionInfos({name: 'users'})
[
	{
		"name" : "users",
		"type" : "collection",
		"options" : {
			
		},
		"info" : {
			"readOnly" : false,
			"uuid" : UUID("a6eb2f4c-0c57-4f79-8d68-ef597b1bd5a6")
		},
		"idIndex" : {
			"v" : 2,
			"key" : {
				"_id" : 1
			},
			"name" : "_id_",
			"ns" : "user.users"
		}
	}
]

But this did not help:

  • I get the same error when creating a change stream (both on mongos and on the PRIMARY);
  • I don't see a collection UUID in config.cache.collections on the shard nodes;
  • I see the same UUID in the getCollectionInfos result and in config.collections on the config server replica set.
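
The three observations above amount to comparing one UUID across three places. A minimal sketch of that comparison follows; diagnoseUuids is a hypothetical helper, not a MongoDB command, and it assumes the three documents were read by hand from the relevant nodes (the suggested shell calls in the comments are likewise assumptions about how one might fetch them).

```javascript
// Hypothetical helper (not part of MongoDB): compares the UUID recorded for one
// namespace in three places and reports which are missing or disagree.
//
// Assumed way to obtain the inputs in the mongo shell:
//   on the config server replica set:
//     var configDoc = db.getSiblingDB("config").collections.findOne({_id: "user.users"});
//   on a shard PRIMARY:
//     var cacheDoc = db.getSiblingDB("config").getCollection("cache.collections").findOne({_id: "user.users"});
//     var collInfo = db.getSiblingDB("user").getCollectionInfos({name: "users"})[0];
function diagnoseUuids(configDoc, cacheDoc, collInfo) {
  // Normalise to a string so shell UUID objects and plain strings compare alike.
  function asText(u) {
    return u === undefined || u === null ? null : String(u);
  }
  var seen = {
    "config.collections (config servers)": asText(configDoc && configDoc.uuid),
    "config.cache.collections (shard)": asText(cacheDoc && cacheDoc.uuid),
    "getCollectionInfos (shard)": asText(collInfo && collInfo.info && collInfo.info.uuid)
  };
  var names = Object.keys(seen);
  var missing = names.filter(function (k) { return seen[k] === null; });
  var present = names
    .filter(function (k) { return seen[k] !== null; })
    .map(function (k) { return seen[k]; });
  var allEqual = present.every(function (v) { return v === present[0]; });
  return {
    seen: seen,
    missing: missing,
    consistent: missing.length === 0 && allEqual
  };
}
```

For the state described in this ticket, the result would list only the shard's config.cache.collections entry under missing, while the other two values agree.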

The user.users collection was created half a year ago, and the servers have been rebooted and upgraded several times since then.



 Comments   
Comment by Kelsey Schubert [ 31/Jan/19 ]

Hi bozaro,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Kelsey

Comment by Kelsey Schubert [ 30/Nov/18 ]

Hi bozaro,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please answer my questions above?

Thank you,
Kelsey

Comment by Kelsey Schubert [ 12/Oct/18 ]

Hi bozaro,

Thanks for the additional information and sorry for the delay getting back to you. Would you please clarify whether you are able to reproduce this issue with fresh data directly inserted into MongoDB 3.6?

If not, I have a few follow-up questions:

  • I assume the production snapshot is a file system snapshot of each individual host/mongod, is this correct?
  • Was the production cluster previously running MongoDB 3.4 or earlier?
    • If so, could you walk me through the upgrade process?

Thank you,
Kelsey

Comment by Artem [ 16/Aug/18 ]

I can reproduce and fix this issue.

Reproducing:

  1. I installed mongo servers (3.6.6) with data from a production snapshot (sharded cluster with 3 nodes in each replica set);
  2. After starting the cluster, the error did not reproduce (on mongos or on any replica set node);
  3. I stepped down the rs0 PRIMARY (rs0-1: PRIMARY -> SECONDARY, rs0-2: SECONDARY -> PRIMARY);
  4. After the stepdown, I got the UUID differs error on the new PRIMARY node (rs0-2), but none of the other nodes reproduced the issue;
  5. I stepped down the rs0 PRIMARY again (rs0-2: PRIMARY -> SECONDARY, rs0-3: SECONDARY -> PRIMARY);
  6. After the stepdown, I got the UUID differs error on every node that had become PRIMARY through re-election (rs0-3 and rs0-2), but rs0-1 did not reproduce the issue;
  7. I stepped down the rs0 PRIMARY again (rs0-3: PRIMARY -> SECONDARY, rs0-1: SECONDARY -> PRIMARY);
  8. After the stepdown, I got the UUID differs error on all nodes.
  9. After a restart, a node does not reproduce the issue until it becomes PRIMARY through re-election.

Solution:

  1. Find the collection UUID value on the config server replica set:

 db.getSiblingDB("config").collections.find({_id: "user.users"}, {uuid: 1})

  2. Set the UUID value on the shard replica sets one by one:

 db.getSiblingDB("config").cache.collections.update({_id: "user.users"}, {$set:{"uuid" : UUID("a6eb2f4c-0c57-4f79-8d68-ef597b1bd5a6")}})

  3. Restart the nodes of each shard replica set one by one.

If I remove the uuid value from cache.collections again, I can reproduce this issue again.
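
The manual workaround above can be sketched as a small helper that derives the update from the document read on the config servers instead of a hand-typed UUID. This is only an illustration of the reporter's steps: buildCacheUuidPatch is a hypothetical name, the uuid $exists guard in the filter is an added assumption so that an existing value is never overwritten, and nothing in this ticket confirms that writing to config.cache.collections by hand is a supported procedure.

```javascript
// Hypothetical helper (not part of MongoDB): builds the filter and update
// documents for the manual workaround, from the config.collections document.
function buildCacheUuidPatch(configDoc) {
  if (!configDoc || configDoc.uuid === undefined || configDoc.uuid === null) {
    throw new Error("config.collections document has no uuid; nothing to copy");
  }
  return {
    // Assumption: only touch the cache entry when it has no uuid yet.
    filter: { _id: configDoc._id, uuid: { $exists: false } },
    update: { $set: { uuid: configDoc.uuid } }
  };
}

// Assumed usage in the mongo shell:
//   on the config server replica set:
//     var configDoc = db.getSiblingDB("config").collections.findOne({_id: "user.users"});
//   on each shard PRIMARY, with configDoc copied over by hand:
//     var patch = buildCacheUuidPatch(configDoc);
//     db.getSiblingDB("config").getCollection("cache.collections").update(patch.filter, patch.update);
```

As in the steps above, the shard nodes would still need to be restarted one by one afterwards.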

Comment by Nick Brewer [ 15/Aug/18 ]

I did some testing (a 3.4 --> 3.6 upgrade) on this as well and I wasn't able to reproduce it either.

I'll close this ticket for now, but if you run into this again feel free to comment here and we can reopen it.

-Nick

Comment by Artem [ 13/Aug/18 ]

I installed mongo servers (3.6.6) with data from a production snapshot: the error did not reproduce.

I have no idea how to fix the problem (in production) or reproduce it (in a test environment).

Comment by Artem [ 12/Aug/18 ]

I already did it.

I ran forceRoutingTableRefresh with and without the syncFromConfig flag on versions 3.6.1 and 3.6.6.

Next week I plan to create a copy of the production cluster and try adding the UUID to config.cache.collections manually.

Comment by Nick Brewer [ 10/Aug/18 ]

bozaro Please try running the forceRoutingTableRefresh command without the syncFromConfig: true option specified.

Thanks,
Nick

Comment by Artem [ 10/Aug/18 ]

Yes. All servers were rebooted after running the forceRoutingTableRefresh command on the primary.

I also see the "Collection user.users UUID differs from UUID on change stream operations" error even when I try to create the change stream directly on the primary (without using mongos).

Comment by Nick Brewer [ 09/Aug/18 ]

bozaro Per the instructions in the linked ticket, did you reboot the secondaries after running the forceRoutingTableRefresh command on the primary?

-Nick
