[SERVER-45795] moveChunk Issue after mongorestore (continuation of SERVER-44143) Created: 27/Jan/20  Updated: 27/Oct/23  Resolved: 07/Jul/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Max Isaev Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Follow "Restore a Sharded Cluster" https://docs.mongodb.com/v4.0/tutorial/restore-sharded-cluster/#d-restore-each-shard-replica-set (with mongodump/mongorestore)

Participants:

 Description   

Hello, this is a continuation of SERVER-44143 (the same environment).

We are getting the following error on our PROD sharded cluster, which was migrated from Docker to other servers (without using Docker) by following the "Restore a Sharded Cluster" procedure https://docs.mongodb.com/v4.0/tutorial/restore-sharded-cluster/#d-restore-each-shard-replica-set (with mongodump/mongorestore):

===========================================
2020-01-25T14:13:31.655+0100 I SHARDING [migrateThread] migrate failed: InvalidUUID: Cannot create collection productrepository.products because we already have an identically named collection with UUID 55ab81fa-7d21-4742-8d71-f4ef8f741ec2, which differs from the donor's UUID 3db9aaae-c037-4162-b0a8-9eec312df936. Manually drop the collection on this shard if it contains data from a previous incarnation of productrepository.products '
===========================================

Here are the collection UUIDs we get when we connect to each shard:

PROD:

 

shard1:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("b8dd9615-f861-4535-a434-9638f5e5c452")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    },

shard2:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("55ab81fa-7d21-4742-8d71-f4ef8f741ec2")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    },

shard3:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("3db9aaae-c037-4162-b0a8-9eec312df936")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    },

 

Here we can see that the UUID of the products collection in the config replica set (config.collections) is the same as the one cached on the shards in config.cache.collections, while neither matches the shards' local catalog UUIDs shown above:

configserver:PRIMARY> db.collections.find()
{ "_id" : "config.system.sessions", "lastmodEpoch" : ObjectId("5bb4b070aec28d86b2174284"), "lastmod" : ISODate("1970-02-19T17:02:47.296Z"), "dropped" : false, "key" : { "_id" : 1 }, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00") }
{ "_id" : "productrepository.products", "lastmodEpoch" : ObjectId("5bb4b060aec28d86b2174007"), "lastmod" : ISODate("1970-02-19T17:02:47.298Z"), "dropped" : false, "key" : { "productId" : "hashed" }, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d") }
 
shard1:PRIMARY> db.cache.collections.find()
{ "_id" : "config.system.sessions", "epoch" : ObjectId("5bb4b070aec28d86b2174284"), "key" : { "_id" : 1 }, "refreshing" : false, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00"), "lastRefreshedCollectionVersion" : Timestamp(1, 0) }
{ "_id" : "productrepository.products", "epoch" : ObjectId("5bb4b060aec28d86b2174007"), "key" : { "productId" : "hashed" }, "refreshing" : false, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d"), "lastRefreshedCollectionVersion" : Timestamp(42, 91), "enterCriticalSectionCounter" : 13 }
shard2:PRIMARY> db.cache.collections.find()
{ "_id" : "config.system.sessions", "epoch" : ObjectId("5bb4b070aec28d86b2174284"), "key" : { "_id" : 1 }, "refreshing" : false, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00"), "lastRefreshedCollectionVersion" : Timestamp(1, 0) }
{ "_id" : "productrepository.products", "epoch" : ObjectId("5bb4b060aec28d86b2174007"), "key" : { "productId" : "hashed" }, "refreshing" : false, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d"), "lastRefreshedCollectionVersion" : Timestamp(42, 91), "enterCriticalSectionCounter" : 17 }
shard3:PRIMARY> db.cache.collections.find()
{ "_id" : "config.system.sessions", "epoch" : ObjectId("5bb4b070aec28d86b2174284"), "key" : { "_id" : 1 }, "refreshing" : false, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00"), "lastRefreshedCollectionVersion" : Timestamp(1, 0) }
{ "_id" : "productrepository.products", "epoch" : ObjectId("5bb4b060aec28d86b2174007"), "key" : { "productId" : "hashed" }, "refreshing" : false, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d"), "lastRefreshedCollectionVersion" : Timestamp(42, 91), "enterCriticalSectionCounter" : 11 }

The thing is that we have two TEST environments (sharded clusters) that we clone our PROD to every week. (We restore PROD's backup to the two TEST clusters, following the same procedure stated above.)

And I see that on those two (cloned every week) environments the UUID of the products collection (productrepository.products) is unique every time and differs between shards, as if mongorestore, when we restore the shards sequentially, assigns a new UUID to the sharded collection on each shard.

 

TEST cluster 1:

 

shard1:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("aeffed7c-f1b0-453c-9614-1b42a70991ef")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }

shard2:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("e0cc3f43-ffec-4dd7-a67c-6e38b9635c7a")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    },

shard3:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("95694fff-8918-4c23-900e-99a110476b0c")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    },

 

TEST cluster 2:

 

shard1:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("37823188-b8f6-4a8d-9a22-dd609d54302e")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    }

shard2:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("b4de5c76-90f2-4075-9aa1-f6a99cef5608")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    }
]

shard3:PRIMARY> db.getCollectionInfos()
[
    {
        "name" : "products",
        "type" : "collection",
        "options" : { },
        "info" : {
            "readOnly" : false,
            "uuid" : UUID("f43c6927-b6b4-4bd9-8e81-6e69478c5823")
        },
        "idIndex" : {
            "v" : 2,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "productrepository.products"
        }
    }
]

 

I have tried manually moving chunks in the TEST environment using the moveChunk command and, as expected, I get the same error about differing UUIDs.
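
For completeness, the manual migration was attempted with a command along these lines (a sketch run through mongos; the chunk bounds and destination shard name below are illustrative placeholders, not the actual values):

mongos> db.adminCommand({
    moveChunk : "productrepository.products",
    bounds : [ { productId : NumberLong("-4611686018427387902") },   // illustrative bounds for the hashed
               { productId : NumberLong("0") } ],                    //   key, taken from config.chunks
    to : "shard2"                                                    // illustrative destination shard name
})
// The command fails with the same InvalidUUID error quoted in the description above.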

Is mongorestore supposed to assign new UUIDs to restored sharded collections?

As I understand, to rectify the issue with chunk migration, the only way is to drop the collection through mongos (following the procedure described in SERVER-17397), then restore the collection through mongos and shard it.
But if, in case of a disaster, we are forced to restore the PROD cluster from backup, would we have to recreate all sharded collections this way (so far we have only one, but that will change in the future), and also rework our clone procedure to recreate the sharded collections this way?
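
For reference, the recovery we understand this would require looks roughly as follows (a sketch only, based on the procedure described in SERVER-17397; the mongos host and dump path are placeholders, and additional cleanup on individual shards may be needed):

// 1. Through mongos: drop the problematic sharded collection.
mongos> use productrepository
mongos> db.products.drop()

// 2. Restore only this collection through mongos, e.g.:
//      mongorestore --host <mongos host> --nsInclude=productrepository.products <dump directory>

// 3. Re-create the hashed shard key index (a no-op if the restore brought it back) and shard the collection again:
mongos> db.products.createIndex({ productId : "hashed" })
mongos> sh.shardCollection("productrepository.products", { productId : "hashed" })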

P.S. After another clone, which took place a few hours after I wrote the information above, I checked the UUID of the sharded collection once again; it is again different between shards and does not correspond to the UUIDs from PROD.

Thank you.



 Comments   
Comment by Dmitry Agranat [ 07/Jul/20 ]

Hi whispers2035@gmail.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Carl Champain (Inactive) [ 09/Mar/20 ]

Hi whispers2035@gmail.com,

Sorry for the late response!
If you want to see a documentation change, our DOCS project is open source, so feel free to open a new ticket describing the changes you'd like to see and why. 

Thank you,
Carl
 

Comment by Max Isaev [ 08/Feb/20 ]

Thank you for your response!

Well, you see, in the procedure we followed, https://docs.mongodb.com/v4.0/tutorial/restore-sharded-cluster (mongodump and mongorestore), we are not exactly encountering the issue described in SERVER-17397, as we don't drop any collections from mongos or elsewhere. In the case of the migration, the backup of our initial PROD was restored to new servers that had never held any data; in the case of a disaster (or, let's say, if we stop all the services and then purge the data directories on all servers from the shell with rm -rf /mongo/data/*), our backup still holds config metadata saying the collections were never dropped and are intact, yet the behaviour of mongorestore leads to the issue of different, unique UUIDs across the shards.

In my opinion, there are only two ways to prevent further confusion for anyone else who backs up and restores their sharded cluster with the mongodump and mongorestore utilities:

  1. Update the documentation to state that after a restore from backup via mongorestore, all sharded collections must be dropped according to the procedure stated in SERVER-17397, restored through mongos, and sharded again.
  2. Address the behaviour of mongorestore assigning new UUIDs when restoring sharded collections.

Since mongodump and mongorestore are no longer the recommended backup tools for sharded clusters as of 4.2, I think the first option is the better one.

Please let me know your thoughts.

Best regards,

Max

 

Comment by Carl Champain (Inactive) [ 07/Feb/20 ]

Hi whispers2035@gmail.com,

Is mongorestore supposed to assign new UUIDs to restored sharded collections?

mongorestore will intentionally result in a new UUID for a collection; a new UUID indicates that a namespace has been reused.

We really appreciate you writing this detailed ticket. I was able to recreate the migration error, and as you mentioned, this issue can be solved with the workaround in SERVER-17397: drop the collection, then restore it through mongos and shard it.
Your last question appears to address a situation related to your topology, and unless it reveals a bug in MongoDB, it is outside of our scope to help you manage it. However, do you think you are encountering a bug that is not addressed in SERVER-17397 and that requires our attention?

Kind regards,
Carl

Comment by Max Isaev [ 30/Jan/20 ]

That was unintentional; I meant to link the whole procedure. Yes, we restore the config replica set first, then the shards.

Comment by Danny Hatcher (Inactive) [ 27/Jan/20 ]

You specifically linked to the section describing restoring the shards. Are you performing the procedure of restoring the config servers beforehand?
