[SERVER-45844] UUID shard key values cause failed chunk migrations Created: 29/Jan/20  Updated: 20/May/20  Resolved: 29/Apr/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dilip Kolasani Assignee: Cheahuychou Mao
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File sh-status    
Issue Links:
Duplicate
duplicates SERVER-47745 Make chunk query in ShardingCatalogMa... Closed
Related
Operating System: ALL
Steps To Reproduce:

sh-status

Sprint: Sharding 2020-05-04
Participants:
Case:

 Description   

With UUID shard key values, the chunk _id (in the chunks collection) is not inferred correctly when a chunk is moved. The stored chunk _id represents the key as BinData, so the _id computed for the move does not match it and the chunk cannot be found.
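
To illustrate the mismatch described above, here is a minimal sketch in plain JavaScript (runnable in Node.js, not the mongo shell). It assumes the stored _id was written with a BinData-style rendering of the UUID, while the lookup derives the _id using the UUID("...") rendering that appears in the error message in this ticket. The array `chunks`, the helpers `legacyKeyString`, `currentKeyString`, `chunkId`, `findById` and `findByBounds`, and the exact BinData format are all hypothetical. This is not the server's code, and matching on namespace and min bound is shown only as one way a lookup could avoid depending on the string form.

```javascript
// Illustrative only: the exact legacy rendering of a UUID key is an assumption.
function legacyKeyString(uuid) {
  const base64 = Buffer.from(uuid.replace(/-/g, ""), "hex").toString("base64");
  return `BinData(4, "${base64}")`;
}

// Mirrors the rendering visible in the error message quoted in this ticket.
function currentKeyString(uuid) {
  return `UUID("${uuid}")`;
}

// Builds a string _id of the form "<ns>-<field>_<rendered min key>".
function chunkId(ns, field, keyString) {
  return `${ns}-${field}_${keyString}`;
}

const ns = "keychain.keyring";
const minUuid = "fff68145-c9f1-4915-a2da-5f66d02820ad";

// Hypothetical in-memory stand-in for one chunk document; not real cluster data.
const chunks = [
  {
    _id: chunkId(ns, "keyringId", legacyKeyString(minUuid)),
    ns: ns,
    min: { keyringId: minUuid },
  },
];

// Lookup that depends on the string form of the min key.
function findById(id) {
  return chunks.filter((c) => c._id === id);
}

// Lookup that compares the namespace and min bound directly.
function findByBounds(namespace, field, min) {
  return chunks.filter((c) => c.ns === namespace && c.min[field] === min);
}

const byId = findById(chunkId(ns, "keyringId", currentKeyString(minUuid)));
const byBounds = findByBounds(ns, "keyringId", minUuid);

console.log(byId.length, byBounds.length); // prints: 0 1
```

In this sketch the _id-based lookup finds nothing even though the chunk is present, which is the same shape as the "but found no chunks" error reported below.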

original description

Hi, we are using a MongoDB sharded cluster running 4.0.5. (This cluster was upgraded from 3.4.9 -> 3.6.9 -> 4.0.5 a couple of months back.)

Architecture:
      3 mongos
      config server running as a replica set (1 primary + 2 secondaries)
      1 shard with 3 nodes running as a replica set (1 primary + 2 secondaries)

Since shard1 is running out of disk space, we added shard2. After adding shard2, we see that the Balancer is not moving chunks and is logging the following messages:

2020-01-29T17:35:21.256+0000 I SHARDING [Balancer] Balancer move keychain.keyring: [{ keyringId: MinKey }, { keyringId: UUID("00000000-2da7-4f75-826f-fb8939b25f3f") }), from prod-mongodb-digitalplatform-02-shard1, to prod-mongodb-digitalplatform-02-shard2 failed :: caused by :: IncompatibleShardingMetadata: Chunk move was not successful :: caused by :: Tried to find the chunk for 'keychain.keyring-keyringId_UUID("06597831-055b-425f-8f83-63d6935bc55b"), but found no chunks
2020-01-29T17:35:21.256+0000 I SHARDING [Balancer] about to log metadata event into actionlog: { _id: "ip-10-120-122-158-2020-01-29T17:35:21.256+0000-5e31c259018fac481bd1ee62", server: "ip-10-120-122-158", clientAddr: "", time: new Date(1580319321256), what: "balancer.round", ns: "", details: { executionTimeMillis: 559, errorOccured: false, candidateChunks: 1, chunksMoved: 0 } }

We even tried moving some of the chunks manually, and they failed for the same reason.

The sh.status() output is attached.

We issued the following command to move one chunk, using chunk bounds taken from the sh.status() output above.

Command:

db.adminCommand({
    moveChunk: "keychain.keyring",
    bounds: [
        { "keyringId": UUID("fff68145-c9f1-4915-a2da-5f66d02820ad") },
        { "keyringId": UUID("fffb99c8-3726-47cb-94b5-6637a36788c0") }
    ],
    to: "prod-mongodb-digitalplatform-02-shard2"
})

Output:

mongos> db.adminCommand( { moveChunk : "keychain.keyring" ,
...                  bounds : [{ "keyringId" : UUID("fff68145-c9f1-4915-a2da-5f66d02820ad") }, { "keyringId" : UUID("fffb99c8-3726-47cb-94b5-6637a36788c0") }] ,
...                  to : "prod-mongodb-digitalplatform-02-shard2"
...                   } )
{
	"ok" : 0,
	"errmsg" : "Chunk move was not successful :: caused by :: Tried to find the chunk for 'keychain.keyring-keyringId_UUID(\"fff68145-c9f1-4915-a2da-5f66d02820ad\"), but found no chunks",
	"code" : 105,
	"codeName" : "IncompatibleShardingMetadata",
	"operationTime" : Timestamp(1580320429, 23),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1580320429, 39),
		"signature" : {
			"hash" : BinData(0,"3e0AlrK695L04KiXhy+axkBMaNk="),
			"keyId" : NumberLong("6762645768143634453")
		}
	}
}

Apart from this, we also issued flushRouterConfig multiple times, restarted all mongos instances, and even replaced all config servers with new servers, but the same issue still exists.

Note: featureCompatibilityVersion is set to 4.0 on all shards and config server.

Please let me know if there is any known bug around this or any configuration that we need to tweak on our side.



 Comments   
Comment by Cheahuychou Mao [ 29/Apr/20 ]

Closing this ticket as the fix was completed in SERVER-47745.

Comment by Githook User [ 29/Apr/20 ]

The commit for SERVER-47745 Make chunk query in ShardingCatalogManager compatible with chunks created in 3.4.

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-45844 UUID shard key values cause failed chunk migrations
Branch: v4.2
https://github.com/mongodb/mongo/commit/55ea26e1ad7c01038b73551ba483194967567311

Comment by Dilip Kolasani [ 21/Feb/20 ]

Thanks Carl. Appreciate all your help on this.

Comment by Carl Champain (Inactive) [ 20/Feb/20 ]

haidilip83@gmail.com,

Sorry for the delayed answer!
The stated issue appears to be a bug related to the string representation of the chunk's UUID min key, which we used to put in the _id of the chunk's document in 4.0 and 4.2. Because of this, we can't match the chunk being moved, even though it probably exists.
We are passing this ticket along to the appropriate team for further investigation.

Kind regards,
Carl

 

Comment by Dilip Kolasani [ 17/Feb/20 ]

Hi Carl,

Did you get a chance to look into this? I am very sorry for chasing you on this. If there is no known bug or workaround for this issue, we will continue to add more storage to the existing shard itself.

Regards
Dilip K

Comment by Carl Champain (Inactive) [ 07/Feb/20 ]

haidilip83@gmail.com,

We are still investigating the issue and will let you know as soon as we have a conclusive answer. 

Comment by Dilip Kolasani [ 05/Feb/20 ]

Carl Champain, did you get a chance to look at the logs and conclude what the reason might be?

Comment by Carl Champain (Inactive) [ 03/Feb/20 ]

Thanks haidilip83@gmail.com!
We are investigating the issue and will let you know.

Comment by Dilip Kolasani [ 30/Jan/20 ]

Hi Carl Champain,
I have uploaded log files from the mongos, config, and shard servers.

Comment by Carl Champain (Inactive) [ 30/Jan/20 ]

haidilip83@gmail.com,

In addition to the requested logs, could you also provide a mongodump of your config server?
The command should look like this:

mongodump --db=config --host=<hostname:port_of_the_mongos>

We've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thank you,
Carl

Comment by Carl Champain (Inactive) [ 29/Jan/20 ]

Hi haidilip83@gmail.com,

Thank you for the report.
To help us understand what is happening, can you please provide the logs for:

  • Each of the mongos
  • The primary of shard1 and shard2
  • The primary of the config servers

We want to determine whether or not the metadata is stale somewhere in your cluster.

Kind regards,
Carl
