[SERVER-65340] Operations hang when re-using dropped unsharded collections Created: 07/Apr/22  Updated: 06/Dec/22  Resolved: 20/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Wernfried Domscheit Assignee: Backlog - Triage Team
Resolution: Duplicate Votes: 0
Labels: replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-65930 DDL coordinators and rename participa... Closed
Related
related to SERVER-60266 Retry WriteConcernError exceptions in... Closed
related to SERVER-61416 Indefinitely retry errors in rename c... Closed
is related to DOCS-15321 Document that writeConcern.wtimeout i... Backlog
Assigned Teams:
Server Triage
Operating System: ALL
Participants:

 Description   

I run a sharded cluster with 4 shards; each shard is deployed as a PSA replica set (primary + secondary + arbiter). The config server (CSRS) is deployed with 3 members (PSS). The application has a fairly high load: up to 100'000 documents are inserted every second, resulting in approx. 5 billion documents inserted every day.

The documents are inserted into (non-sharded) raw-data collections. Once per hour I run a bucketing job which performs the following steps:

  1. Check health status of the replica sets. If a secondary member is not available then reconfigure the replica set member to { votes: 0, priority: 0 }
  2. Rename the (non-sharded) raw-data collection.
    Due to continuous inserts, a new raw-data collection is automatically created instantly.
  3. Create some indexes on the new raw-data collection
  4. Run some aggregation pipelines with $count, $group, etc. on the "old" raw-data collection. Aggregated data is merged into a sharded final collection.
  5. Drop the "old" raw-data collection

When you run renameCollection on mongos, the write concern is set to "majority". That's the reason for doing the health check at the beginning and reconfiguring the replica set in case of an outage.
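For illustration, a minimal mongosh sketch of such a pre-rename health check (an assumption about how the job could look, not the actual production code), run directly against the shard's replica set rather than via mongos; it uses replSetGetStatus and the same replSetReconfig pattern shown in the reproduction steps further down:

// Hedged sketch: connect directly to the shard's replica set (not mongos).
// Take any unhealthy secondary out of the majority calculation before renaming.
const status = db.adminCommand({ replSetGetStatus: 1 });
const down = status.members.filter(m => m.health === 0);
if (down.length > 0) {
   const cfg = db.adminCommand({ replSetGetConfig: 1 }).config;
   for (const m of down) {
      const idx = cfg.members.findIndex(x => x.host === m.name);
      cfg.members[idx].votes = 0;
      cfg.members[idx].priority = 0;
   }
   cfg.version++;
   db.adminCommand({ replSetReconfig: cfg });
}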

Now the problem: if a secondary member goes down, everything works as intended. The member is reconfigured as passive and the application continues to work. But when the member recovers again, sometimes the database gets corrupted. I can drop any collection or even the entire database (i.e. db.dropDatabase()), but a new "raw-data" collection cannot be created. I can create any new collection with a different name, but the "raw-data" collection name is blocked.

Restarting all nodes in the sharded cluster does not help.

Dropping all collections from the database and even db.dropDatabase() does not help.

So far, the only solution was to drop the entire sharded cluster and deploy it from scratch.

Currently the application runs as a proof of concept; of course, this would be no solution for a production environment.

It seems to be a bit difficult to reproduce this problem, however it has already appeared two times. The first time I faced the issue was while upgrading a sharded cluster.

Perhaps based on this information you already get an idea where the problem could be. I will try to hit this issue again with a proper collection of all log files. In general, replication seems to have a problem when you drop and re-create a collection with the same name. I also run into problems when a replica set is running an initial synchronization. My current workaround is to suspend the bucketing job whenever I need to run any kind of recovery/restore.

Kind Regards
Wernfried
 



 Comments   
Comment by Pierlauro Sciarelli [ 01/Aug/22 ]

Just writing to let you know that v5.0.10 has finally been released with all the fixes mentioned above.

Comment by Pierlauro Sciarelli [ 31/May/22 ]

Hi wernfried.domscheit@sunrise.net , just adding that SERVER-65930 had not been backported in time for 5.0.9 and will be released in 5.0.10. For the moment, the only workaround to "unstick" shards is to step them down and up again (where a majority is guaranteed). Please note that even if you hit a DuplicateKey error, the operation will be resumed on step-up.
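For reference, a hedged sketch of that step-down/step-up workaround, run against the stuck shard's primary (not mongos); the 60-second step-down window is illustrative:

// Hedged sketch: connect to the PRIMARY of the stuck shard (not to mongos).
// Once a majority is available again, the pending DDL coordinator resumes on step-up.
rs.stepDown(60);   // 60-second window is illustrative

// Afterwards, check which member has become primary:
rs.status().members.filter(m => m.stateStr === "PRIMARY").map(m => m.name);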

Comment by Eric Sedor [ 20/May/22 ]

wernfried.domscheit@sunrise.net it sounds like you are still reporting a difficult-to-reproduce case where you:

  • Stop one or more Non-Primary Shard members (so that a majority of nodes in one shard are not live)
  • Attempt renameCollection (with writeConcern majority)
  • Wait more than 60 Seconds
  • Restart the node(s)

And the result is that the rename command is still hanging and the old namespace never becomes reusable.

EDIT: The sharding team believes this has been fixed as part of SERVER-65930 and your report in this ticket was very helpful in working the issue. Thank you! The fix will be available in 5.0.9.

To clarify though, when this happens on earlier versions:

  • The namespaces should not be inaccessible indefinitely
  • Another workaround is to step down the primaries of the shards, and then step them back up to make them primary again. The rename should resume.
  • It's definitely our desire that rename block indefinitely if it cannot complete, specifically to avoid corruption.
Comment by Wernfried Domscheit [ 05/May/22 ]

Hi @Tommaso Tocci,

Indeed, due to Sharding Administrative Commands Override Write Concern Settings

The commands use a minimum wtimeout of 60000. The commands only use the global default write concern wtimeout if it is greater than 60000.

I was always waiting longer than 60 seconds when I provided wtimeout, so I assumed it would be relevant.

It is not trivial to run reliable and reproducible tests that are really focused on a specific problem.

 

Comment by Tommaso Tocci [ 05/May/22 ]

Hi wernfried.domscheit@sunrise.net, first of all I want to thank you for all the extensive tests you are performing.

Regarding your comment:

It looks really essential to run renameCollection without providing writeConcern.wtimeout. Perhaps it would be an option for MongoDB to ignore writeConcern.wtimeout settings for the renameCollection command.

I wanted to clarify that in sharded clusters all DDL operations already ignore the write concern timeout if specified.

So it really doesn't matter which write concern timeout you specify when you invoke renameCollection in a sharded cluster.

Comment by Wernfried Domscheit [ 05/May/22 ]

Hello

Let me also highlight the major problem (applies to version 5.0.8).

When you rename an unsharded collection in a sharded cluster while a member is not available, I personally would prefer an error. Anyway, if the design is to wait forever (or wait until the member is available again), then I am also fine with it.

 

This morning I repeated some tests. It looks really essential to run renameCollection without providing writeConcern.wtimeout. Perhaps it would be an option for MongoDB to ignore writeConcern.wtimeout settings for the renameCollection command.

 

However, the problem still exists when running this procedure:

  • Stop a Non-Primary Shard member
  • Attempt renameCollection (with writeConcern: { w: "majority" })
  • Wait more than 60 Seconds
  • Result ->
    • Command is still waiting/hanging
    • Collection is renamed
    • Old namespace is corrupt, i.e. not accessible anymore
    • The only solution to make the old namespace accessible again is to drop the entire deployment and re-build the sharded cluster from scratch - at least I did not find any other solution.

So, it works fine if the unavailable member comes back (or has been re-configured with {votes: 0, priority: 0}) within 60 seconds; otherwise you have a problem.

Is there any possibility to increase the 60 seconds to a higher value? I tried changing the parameter transactionLifetimeLimitSeconds but it does not change the behavior.
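For completeness, this is roughly how such a parameter can be inspected and changed (a hedged sketch; the 120-second value is illustrative, and as noted above it did not affect the 60-second renameCollection behavior):

// Hedged sketch: run against a mongod; the value 120 is illustrative.
db.adminCommand({ getParameter: 1, transactionLifetimeLimitSeconds: 1 });
db.adminCommand({ setParameter: 1, transactionLifetimeLimitSeconds: 120 });
// Neither call changes the 60-second minimum wtimeout enforced for the DDL coordinator.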

My current workaround is to check replica set status right before I run renameCollection.

 

When I stop a Primary Shard member instead of a Non-Primary Shard member, I get an "E11000 duplicate key error collection: config.system.sharding_ddl_coordinators...". This would be fine for me, because getting an error is much better than getting a corrupt database.
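The duplicate key points at the persisted DDL coordinator document. A read-only sketch for inspecting it (run on the primary of the database's primary shard, not on mongos; the filter is illustrative):

// Hedged sketch: run on the shard PRIMARY, not on mongos. Read-only.
// Lists any persisted renameCollection DDL coordinator documents.
db.getSiblingDB("config")
  .getCollection("system.sharding_ddl_coordinators")
  .find({ "_id.operationType": "renameCollection" })   // filter is illustrative
  .pretty();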

 

Kind Regards
Wernfried

 

 

Comment by Eric Sedor [ 04/May/22 ]

Hello again,

I want to focus on three aspects of this:

  • You are pointing to a bug ("duplicate key error...") that's described in SERVER-65930. That's definitely something we want to fix.
  • We want access to a collection to be blocked until a DDL operation like rename can complete, and (if the operation reaches a point after which it cannot be rolled back) for rename to be indefinitely retried until it can be majority committed on all shards. We don't think this qualifies as corruption. Rather, this is about preventing corruption. This is documented behavior here
    • Resolving a hanging DDL operation that stems from an availability issue should involve working the availability issue.
    • Depending on machine reliability and availability requirements, sharding use-cases that make heavy use of DDL operations may want additional replica set members to ensure DDL operations can complete.
    • For most use-cases, we recommend against the use of arbiters, specifically because of the majority availability concerns that a single downed node can raise. I should have mentioned this before!
  • If a sharded cluster performing a rename operation reaches a point where it cannot roll back a rename operation, it's going to continue retrying indefinitely, so wtimeout should not be provided. I've opened DOCS-15321 to clarify this.
Comment by Wernfried Domscheit [ 04/May/22 ]

Hi Eric

 

Upgrading to 5.0.8 does not solve the issue. It works to a certain extent if a member of the Primary Shard stops; however, the collection gets corrupted and becomes unusable if a member of a Non-Primary Shard is not reachable while the renameCollection command is executed.

 

 

Environment:

  • Windows
  • MongoDB version 5.0.3
  • Sharded Cluster
    • Config Server: PSS (PRIMARY + SECONDARY + SECONDARY)
    • Shard Servers: PSA (PRIMARY + SECONDARY + ARBITER)
  • Database / collection tdube04mipmed1.sessions is not sharded
  • Stop one mongod service at Shard server
  • Run Command:
db.adminCommand({
   renameCollection: "tdube04mipmed1.sessions",
   to: "tdube04mipmed1.sessions.124-0700",
   dropTarget: false,
   writeConcern: { w: "majority", wtimeout: 120 * 1000 }
})

Result depends on which service was stopped:

  • Shutdown SECONDARY member of Primary Shard: OK -> Error after writeConcern.wtimeout

{
    "ok" : 0,
    "errmsg" : "Failed to release sharding DDL coordinator :: caused by :: waiting for replication timed out",
    "code" : 64,
    "codeName" : "WriteConcernFailed",
}

  • Shutdown PRIMARY member of Primary Shard: Not OK -> hangs forever / hangs till member is running again
    • If writeConcern.wtimeout is specified then replSetReconfig does not work
  • Shutdown SECONDARY member of a Non-Primary Shard: Not OK -> hangs forever, collection tdube04mipmed1.sessions gets corrupt, not usable anymore
  • Shutdown PRIMARY member of a Non-Primary Shard: Not OK -> hangs forever, collection tdube04mipmed1.sessions gets corrupt, not usable anymore

 

When I run the same with MongoDB version 5.0.8

  • Shutdown SECONDARY member of Primary Shard: Not OK (but acceptable) -> hangs forever / hangs till member is running again
  • Shutdown PRIMARY member of Primary Shard: Not OK (but acceptable) -> hangs forever / hangs till member is running again
    • When member is running again (or successful replSetReconfig with votes:0, priority: 0) before writeConcern.wtimeout expired: 

{ "ok" : 1 } 

    • When member is running again (or successful replSetReconfig with votes:0, priority: 0) after writeConcern.wtimeout expired:

{
    "ok" : 0,
    "errmsg" : "E11000 duplicate key error collection: config.system.sharding_ddl_coordinators index: _id_ dup key: { _id: { namespace: \"tdube04mipmed0.sessions\", operationType: \"renameCollection\" } }",
    "code" : 11000,
    "codeName" : "DuplicateKey",
}

  • Shutdown SECONDARY member of a Non-Primary Shard: Not OK -> hangs forever, collection tdube04mipmed1.sessions gets corrupt, not usable anymore
  • Shutdown PRIMARY member of a Non-Primary Shard: Not OK -> hangs forever, collection tdube04mipmed1.sessions gets corrupt, not usable anymore

 

Kind Regards
Wernfried

 

 

 

 

Comment by Wernfried Domscheit [ 27/Apr/22 ]

Hi Eric
Yes, I already removed writeConcern.wtimeout from my production code. However, I am a bit confused: no matter whether I specify writeConcern.wtimeout or not, the timeout was 120 seconds (I tested only with wtimeout: 30000).

To me this behavior is fine, but I have no idea if it is intentional or not.

Kind Regards
Wernfried

Comment by Eric Sedor [ 25/Apr/22 ]

Hi wernfried.domscheit@sunrise.net, thanks for all the additional information, and your patience. Based on this, my colleagues have pointed me at both SERVER-60266 and SERVER-61416. In short, we recommend moving to 5.0.7 to obtain these two improvements, which we believe will address what you're seeing.

On this topic, we also wanted to recommend against specifying a writeConcern.wtimeout for DDL operations in sharded clusters, to ensure that your code doesn't move on before a DDL operation has either completed or failed. Does this make sense?

Comment by Wernfried Domscheit [ 22/Apr/22 ]

Hi

I found a solution! You have to run the rename with w: 'majority' write concern:

db.adminCommand({ 
   "renameCollection": "tdube04mipmed1.sessions", 
   "to": "tdube04mipmed1.sessions.111-1255", 
   "writeConcern": { w: "majority" } 
})
 
{
    "ok": 0,
    "errmsg": "Failed to release sharding DDL coordinator :: caused by :: waiting for replication timed out",
    "code": 64,
    "codeName": "WriteConcernFailed",
    "$clusterTime": {
        "clusterTime": Timestamp(1650609613, 4),
        "signature": {
            "hash": BinData(0,"2PKlfbqhV49cuwKnNihScgthiis="),
            "keyId": NumberLong("7089111244539428886")
        }
    },
    "operationTime": Timestamp(1650609606, 1)
}

It returns a "proper" WriteConcernFailed error. After this error you can reconfigure the replica set with { votes: 0, priority: 0 } and retry the rename command.

 

I don't know the MongoDB internals, perhaps you need to review the Implicit Default Write Concern algorithm.

Or the problem is at renameCollection

writeConcern:
A document that expresses the write concern for the operation. Omit to use the default write concern.
When issued on a sharded cluster, mongos converts the write concern of the renameCollection command and its helper db.collection.renameCollection() to "majority".

 

Kind Regards
Wernfried

Comment by Wernfried Domscheit [ 21/Apr/22 ]

Hi

OK, I think we can scratch most of the above. I found an easy way to reproduce the problem, even in a lab environment:

 

Deploy a sharded cluster with 3 shards (shard_01, shard_02, shard_03). Deploy each shard as a PSA (PRIMARY, SECONDARY, ARBITER).

 

 

db.adminCommand("getShardMap").map
{
    "shard_01" : "shard_01/localhost:27028,localhost:27128",
    "shard_02" : "shard_02/localhost:27038,localhost:27138",
    "shard_03" : "shard_03/localhost:27048,localhost:27148",
    "config" : "configRepSet/localhost:27029,localhost:27039,localhost:27049"
}
 
 
 
for (let d of ["tdube04mipmed1", "tdube04mipmed0", "pzur01mipmed0", "pzur01mipmed1", "pgen01mipmed0", "pgen01mipmed1"]) {
   db.getSiblingDB(d).runCommand({
      "createIndexes": "sessions",
      "commitQuorum": 1,
      "indexes": [
         { "name": "n_t_dp", "key": { "n": 1, "t": 1, "dp": 1 } },
         { "name": "si_tsi", "key": { "si": 1, "tsi": 1 } },
         { "name": "n_t0", "key": { "t0": 1, "n": 1 }, "partialFilterExpression": { "t0": { "$lt": ISODate("2022-04-17T22:00:00Z") } } }
      ]
   });
   sh.enableSharding(d);
}
 
 
db.adminCommand({ movePrimary: "tdube04mipmed0", to: "shard_01" })
db.adminCommand({ movePrimary: "tdube04mipmed1", to: "shard_01" })
db.adminCommand({ movePrimary: "pzur01mipmed0", to: "shard_02" })
db.adminCommand({ movePrimary: "pzur01mipmed1", to: "shard_02" })
db.adminCommand({ movePrimary: "pgen01mipmed0", to: "shard_03" })
db.adminCommand({ movePrimary: "pgen01mipmed1", to: "shard_03" })
 
 
db.getSiblingDB('tdube04mipmed1').sessions.insertMany([
   { "_id": ObjectId("62612fd54538aebab5a19bf9"), "sid": NumberLong("2545922668"), "di": "194.230.47.34", "tsp": 54913, "rb": NumberLong(4238), "tsi": "194.230.159.201", "dp": 443, "si": "10.133.216.49", "a": "client-rst", "f": "WALB01MFW01", "sb": NumberLong(1529), "t0": ISODate("2022-04-21T10:17:40Z"), "n": "snat", "p": "https", "t": ISODate("2022-04-21T10:19:03Z"), "sp": 54913 },
   { "_id": ObjectId("62612fd54538aebab5a19bfa"), "sid": NumberLong("2543830321"), "di": "35.186.224.25", "tsp": 64335, "rb": NumberLong(153309), "tsi": "194.230.159.177", "dp": 443, "si": "10.128.63.85", "a": "accept", "f": "WALB01MFW01", "sb": NumberLong(116), "t0": ISODate("2022-04-21T10:11:05Z"), "n": "snat", "p": "https", "t": ISODate("2022-04-21T10:19:03Z"), "sp": 64335 },
   { "_id": ObjectId("62612fd54538aebab5a19bfb"), "sid": NumberLong("2541614925"), "di": "194.230.47.17", "tsp": 49185, "rb": NumberLong(306330), "tsi": "194.230.159.181", "dp": 443, "si": "10.128.26.161", "a": "accept", "f": "WALB01MFW01", "sb": NumberLong(10003), "t0": ISODate("2022-04-21T10:04:11Z"), "n": "snat", "p": "udp", "t": ISODate("2022-04-21T10:19:03Z"), "sp": 49185 },
   { "_id": ObjectId("62612fd54538aebab5a19bfc"), "sid": NumberLong("2546180742"), "di": "216.58.215.246", "tsp": 43324, "rb": NumberLong(454914), "tsi": "194.230.159.193", "dp": 443, "si": "10.128.14.237", "a": "close", "f": "WALB01MFW01", "sb": NumberLong(22994), "t0": ISODate("2022-04-21T10:18:30Z"), "n": "snat", "p": "https", "t": ISODate("2022-04-21T10:19:03Z"), "sp": 43324 },
   { "_id": ObjectId("62612fd54538aebab5a19bfd"), "sid": NumberLong("2541615618"), "di": "35.244.237.205", "tsp": 65421, "rb": NumberLong(1991), "tsi": "194.230.159.149", "dp": 443, "si": "10.134.46.165", "a": "accept", "f": "WALB01MFW01", "sb": NumberLong(1532), "t0": ISODate("2022-04-21T10:04:12Z"), "n": "snat", "p": "udp", "t": ISODate("2022-04-21T10:19:04Z"), "sp": 65421 },
   { "_id": ObjectId("62612fd54538aebab5a19bfe"), "sid": NumberLong("2541621962"), "di": "216.58.215.234", "tsp": 40214, "rb": NumberLong(3765), "tsi": "194.230.159.221", "dp": 443, "si": "10.133.145.9", "a": "accept", "f": "WALB01MFW01", "sb": NumberLong(10587), "t0": ISODate("2022-04-21T10:04:12Z"), "n": "snat", "p": "udp", "t": ISODate("2022-04-21T10:19:03Z"), "sp": 40214 }
])
 
 
// Connect to SECONDARY of shard_01, shard_02, shard_03 -> db.shutdownServer()
// -> All SECONDARY shards services are down
 
db.adminCommand({ "renameCollection": "tdube04mipmed1.sessions", "to": "tdube04mipmed1.sessions.111-1255", "writeConcern": { "wtimeout": 30000 } })
// "renameCollection" Command is hanging. While it is waiting, open a new connection to ReplicaSet shard_01
 
cfg = db.adminCommand({ replSetGetConfig: 1 }).config;
idx = cfg.members.findIndex(x => x.host == "localhost:27128");
cfg.members[idx].votes = 0;
cfg.members[idx].priority = 0;
cfg.version++;
db.adminCommand({replSetReconfig: cfg});
 
 
// After some time (I guess after 30 seconds) you receive the error:
{
   "ok" : 0,
   "errmsg" : "Failed command { _shardsvrRenameCollectionParticipant: \"sessions\", to: \"tdube04mipmed1.sessions.111-1255\", dropTarget: false, stayTemp: false, sourceUUID: UUID(\"4a092739-5599-488e-957d-1f4f34ff8e63\"), writeConcern: { w: \"majority\", wtimeout: 60000 }, lsid: { id: UUID(\"50e68494-2c74-4929-a327-8e84b3b2ee6e\"), uid: BinData(0, BB89D3175FB0981C86826C27759082A37160471F60504B4612415186C630AB70) }, txnNumber: 0, $audit: { $impersonatedUsers: [ { user: \"admin\", db: \"admin\" } ], $impersonatedRoles: [ { role: \"root\", db: \"admin\" } ] } } for database 'tdube04mipmed1' on shard 'shard_03' :: caused by :: waiting for replication timed out",
   "code" : 64,
   "codeName" : "WriteConcernFailed",
   "$clusterTime" : {
      "clusterTime" : Timestamp(1650540422, 7),
      "signature" : {
         "hash" : BinData(0, "EQrdPCc8WYOCz7IKyHMdeI7Xwaw="),
         "keyId" : NumberLong("7089007576913805333")
      }
   },
   "operationTime" : Timestamp(1650540422, 1)
}
 
 

 

Now you have a situation where db.getSiblingDB('tdube04mipmed1').sessions cannot be used anymore. Any command you try to run on db.getSiblingDB('tdube04mipmed1').sessions just hangs.

 

Surprisingly db.getSiblingDB('tdube04mipmed1').getCollectionNames() returns sessions.111-1255, which should not be the case, because "renameCollection" actually failed.

Also it says "Failed command ... for database 'tdube04mipmed1' on shard 'shard_03'"; however, the primary shard for database tdube04mipmed1 is "shard_01", not "shard_03". So I would expect the error message "Failed command ... for database 'tdube04mipmed1' on shard 'shard_01'".

 

  • stop / start of all nodes in sharded cluster does not help
  • db.getSiblingDB('tdube04mipmed1').dropDatabase() does not help either

 

I don't find any solution to make "tdube04mipmed1.sessions" usable again; I have to drop the entire sharded cluster and deploy it from scratch!

 

The workaround is to set { votes: 0, priority: 0 } before you run "renameCollection". However, typically you don't know when a node in your cluster will fail; it may happen at an arbitrary time. A network or power outage does not care whether "renameCollection" is currently executing or not.

 

Kind Regards
Wernfried

 

 

Comment by Wernfried Domscheit [ 20/Apr/22 ]

I ran into this problem again. This time it was not intentional: I stopped/started "shard_03" and a rename failed. Note, the primary shard for database "pzur01mipmed0" is not "shard_03", which was stopped:

Now, collection pzur01mipmed0.sessions#3 is no longer created, i.e. I cannot insert any data into that collection:

 

db.adminCommand({  "renameCollection" : "pzur01mipmed0.sessions#3",  "to" : "pzur01mipmed0.sessions#3.110-1155",  "writeConcern" : {  "wtimeout" : 30000 } })
 
Failed command 
{
    _shardsvrRenameCollectionParticipant: "sessions#3",
    to: "pzur01mipmed0.sessions#3.110-1155",
    dropTarget: false,
    stayTemp: false,
    sourceUUID: UUID("ea976782-923e-4e14-9688-2d1317203230"),
    writeConcern: {
        w: "majority",
        wtimeout: 60000
    },
    lsid: {
        id: UUID("2d0ee188-7a78-459a-b1d4-a9fa95012162"),
        uid: BinData(0, BB89D3175FB0981C86826C27759082A37160471F60504B4612415186C630AB70)
    },
    txnNumber: 0,
    $audit: {
        $impersonatedUsers: [
            {
                user: "mipsys",
                db: "admin"
            }
        ],
        $impersonatedRoles: [
            {
                role: "dbOwner",
                db: "pgen01mipmed0"
            },
            {
                role: "dbOwner",
                db: "data"
            },
            {
                role: "dbOwner",
                db: "p-mipmed-as-02"
            },
            {
                role: "dbOwner",
                db: "plau01mipmed1"
            },
            {
                role: "dbOwner",
                db: "tdube04mipmed1"
            },
            {
                role: "dbOwner",
                db: "pgen01mipmed1"
            },
            {
                role: "clusterMonitor",
                db: "admin"
            },
            {
                role: "dbOwner",
                db: "p-mipmed-as-01"
            },
            {
                role: "dbOwner",
                db: "ignored"
            },
            {
                role: "dbOwner",
                db: "tdube04mipmed0"
            },
            {
                role: "dbOwner",
                db: "polt01mipmed1"
            },
            {
                role: "mip.admin",
                db: "admin"
            },
            {
                role: "dbOwner",
                db: "pzur01mipmed1"
            },
            {
                role: "dbOwner",
                db: "t-mipmed-as-01"
            },
            {
                role: "dbOwner",
                db: "mip"
            },
            {
                role: "dbOwner",
                db: "pmado01mipmed0"
            },
            {
                role: "dbOwner",
                db: "polt01mipmed0"
            },
            {
                role: "dbOwner",
                db: "pzur01mipmed0"
            },
            {
                role: "dbOwner",
                db: "pmado01mipmed1"
            },
            {
                role: "dbOwner",
                db: "pber08mipmed1"
            },
            {
                role: "clusterManager",
                db: "admin"
            },
            {
                role: "dbOwner",
                db: "plau01mipmed0"
            },
            {
                role: "dbOwner",
                db: "pber08mipmed0"
            }
        ]
    }
}
 
for database 'pzur01mipmed0' on shard 'shard_03' :: caused by :: waiting for replication timed out", 
 

Here are some logs related to this operation. I can also provide the full log files:

 

Logging on mongos:
 
{"t":{"$date":"2022-04-20T11:56:04.023+02:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn4059","msg":"Slow query","attr":{"type":"command","ns":"pzur01mipmed0.sessions#3","appName":"MongoDB Shell","command":{"renameCollection":"pzur01mipmed0.sessions#3","to":"pzur01mipmed0.sessions#3.110-1155","writeConcern":{"wtimeout":30000},"lsid":{"id":{"$uuid":"6d981744-2bba-4841-abb7-8b755eefa491"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1650448502,"i":34026}},"signature":{"hash":{"$binary":{"base64":"Va6o9Oz80dDKrfn/tFR3EBPf8jU=","subType":"0"}},"keyId":7081698715102085126}},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Failed command { _shardsvrRenameCollectionParticipant: \"sessions#3\", to: \"pzur01mipmed0.sessions#3.110-1155\", dropTarget: false, stayTemp: false, sourceUUID: UUID(\"ea976782-923e-4e14-9688-2d1317203230\"), writeConcern: { w: \"majority\", wtimeout: 60000 }, lsid: { id: UUID(\"2d0ee188-7a78-459a-b1d4-a9fa95012162\"), uid: BinData(0, BB89D3175FB0981C86826C27759082A37160471F60504B4612415186C630AB70) }, txnNumber: 0, $audit: { $impersonatedUsers: [ { user: \"mipsys\", db: \"admin\" } ], $impersonatedRoles: [ { role: \"dbOwner\", db: \"pgen01mipmed0\" }, { role: \"dbOwner\", db: \"data\" }, { role: \"dbOwner\", db: \"p-mipmed-as-02\" }, { role: \"dbOwner\", db: \"plau01mipmed1\" }, { role: \"dbOwner\", db: \"tdube04mipmed1\" }, { role: \"dbOwner\", db: \"pgen01mipmed1\" }, { role: \"clusterMonitor\", db: \"admin\" }, { role: \"dbOwner\", db: \"p-mipmed-as-01\" }, { role: \"dbOwner\", db: \"ignored\" }, { role: \"dbOwner\", db: \"tdube04mipmed0\" }, { role: \"dbOwner\", db: \"polt01mipmed1\" }, { role: \"mip.admin\", db: \"admin\" }, { role: \"dbOwner\", db: \"pzur01mipmed1\" }, { role: \"dbOwner\", db: \"t-mipmed-as-01\" }, { role: \"dbOwner\", db: \"mip\" }, { role: \"dbOwner\", db: \"pmado01mipmed0\" }, { role: \"dbOwner\", db: \"polt01mipmed0\" }, { role: \"dbOwner\", db: \"pzur01mipmed0\" }, { role: \"dbOwner\", db: \"pmado01mipmed1\" }, { role: \"dbOwner\", db: \"pber08mipmed1\" }, { role: \"clusterManager\", db: \"admin\" }, { role: \"dbOwner\", db: \"plau01mipmed0\" }, { role: \"dbOwner\", db: \"pber08mipmed0\" } ] } } for database 'pzur01mipmed0' on shard 'shard_03' :: caused by :: waiting for replication timed out","errName":"WriteConcernFailed","errCode":64,"reslen":1762,"readConcern":{"level":"local","provenance":"implicitDefault"},"remote":"10.229.152.20:38584","protocol":"op_msg","durationMillis":61119}}
 
 
Logging from SECONDARY Shard_04:
 
{"t":{"$date":"2022-04-20T11:55:03.813+02:00"},"s":"I",  "c":"COMMAND",  "id":20401,   "ctx":"ReplWriterWorker-8","msg":"renameCollectionForApplyOps","attr":{"sourceNamespace":"pzur01mipmed0.sessions#3","uuid":"ea976782-923e-4e14-9688-2d1317203230","targetNamespace":"pzur01mipmed0.sessions#3.110-1155","uuidToDrop":"<none>"}}
{"t":{"$date":"2022-04-20T11:55:03.813+02:00"},"s":"I",  "c":"STORAGE",  "id":20319,   "ctx":"ReplWriterWorker-8","msg":"renameCollection","attr":{"uuid":{"uuid":{"$uuid":"ea976782-923e-4e14-9688-2d1317203230"}},"fromName":"pzur01mipmed0.sessions#3","toName":"pzur01mipmed0.sessions#3.110-1155"}}
 
 
Logging from PRIMARY Shard_04:
 
{"t":{"$date":"2022-04-20T11:55:02.926+02:00"},"s":"I",  "c":"SHARDING", "id":22655,   "ctx":"ShardingDDLCoordinator-938","msg":"Acquired distributed lock","attr":{"lockName":"pzur01mipmed0.sessions#3.110-1155","lockSessionId":{"$oid":"020000000000000000000000"},"reason":"renameCollection"}}
{"t":{"$date":"2022-04-20T11:55:03.144+02:00"},"s":"I",  "c":"SHARDING", "id":22080,   "ctx":"ShardingDDLCoordinator-938","msg":"About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"d-mipmdb-sh1-04.swi.srse.net:27018-2022-04-20T11:55:03.144+02:00-625fd877e594713dc7bacd9a","server":"d-mipmdb-sh1-04.swi.srse.net:27018","shard":"shard_04","clientAddr":"","time":{"$date":"2022-04-20T09:55:03.144Z"},"what":"renameCollection.start","ns":"pzur01mipmed0.sessions#3","details":{"source":"pzur01mipmed0.sessions#3","destination":"pzur01mipmed0.sessions#3.110-1155"}}}}
{"t":{"$date":"2022-04-20T11:55:03.797+02:00"},"s":"I",  "c":"COMMAND",  "id":20400,   "ctx":"RenameCollectionParticipantService-467","msg":"renameCollectionForCommand","attr":{"sourceNamespace":"pzur01mipmed0.sessions#3","targetNamespace":"pzur01mipmed0.sessions#3.110-1155","dropTarget":"no"}}
{"t":{"$date":"2022-04-20T11:55:03.797+02:00"},"s":"I",  "c":"STORAGE",  "id":20319,   "ctx":"RenameCollectionParticipantService-467","msg":"renameCollection","attr":{"uuid":{"uuid":{"$uuid":"ea976782-923e-4e14-9688-2d1317203230"}},"fromName":"pzur01mipmed0.sessions#3","toName":"pzur01mipmed0.sessions#3.110-1155"}}
{"t":{"$date":"2022-04-20T11:55:03.901+02:00"},"s":"I",  "c":"SHARDING", "id":5515106, "ctx":"RenameCollectionParticipantService-467","msg":"Collection locally renamed, waiting for CRUD to be unblocked","attr":{"fromNs":"pzur01mipmed0.sessions#3","toNs":"pzur01mipmed0.sessions#3.110-1155"}}
{"t":{"$date":"2022-04-20T11:55:03.901+02:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn8262","msg":"Slow query","attr":{"type":"command","ns":"pzur01mipmed0.$cmd","command":{"_shardsvrRenameCollectionParticipant":"sessions#3","to":"pzur01mipmed0.sessions#3.110-1155","dropTarget":false,"stayTemp":false,"sourceUUID":{"$uuid":"ea976782-923e-4e14-9688-2d1317203230"},"writeConcern":{"w":"majority","wtimeout":60000},"lsid":{"id":{"$uuid":"2d0ee188-7a78-459a-b1d4-a9fa95012162"},"uid":{"$binary":{"base64":"u4nTF1+wmByGgmwndZCCo3FgRx9gUEtGEkFRhsYwq3A=","subType":"0"}}},"txnNumber":0,"$audit":{"$impersonatedUsers":[{"user":"mipsys","db":"admin"}],"$impersonatedRoles":[{"role":"dbOwner","db":"pgen01mipmed0"},{"role":"dbOwner","db":"data"},{"role":"dbOwner","db":"p-mipmed-as-02"},{"role":"dbOwner","db":"plau01mipmed1"},{"role":"dbOwner","db":"tdube04mipmed1"},{"role":"dbOwner","db":"pgen01mipmed1"},{"role":"clusterMonitor","db":"admin"},{"role":"dbOwner","db":"p-mipmed-as-01"},{"role":"dbOwner","db":"ignored"},{"role":"dbOwner","db":"tdube04mipmed0"},{"role":"dbOwner","db":"polt01mipmed1"},{"role":"mip.admin","db":"admin"},{"role":"dbOwner","db":"pzur01mipmed1"},{"role":"dbOwner","db":"t-mipmed-as-01"},{"role":"dbOwner","db":"mip"},{"role":"dbOwner","db":"pmado01mipmed0"},{"role":"dbOwner","db":"polt01mipmed0"},{"role":"dbOwner","db":"pzur01mipmed0"},{"role":"dbOwner","db":"pmado01mipmed1"},{"role":"dbOwner","db":"pber08mipmed1"},{"role":"clusterManager","db":"admin"},{"role":"dbOwner","db":"plau01mipmed0"},{"role":"dbOwner","db":"pber08mipmed0"}]},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1650448503,"i":16004}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$configTime":{"$timestamp":{"t":1650448503,"i":5504}},"$topologyTime":{"$timestamp":{"t":1648836518,"i":2}},"$db":"pzur01mipmed0"},"numYields":0,"reslen":368,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":2}},"Global":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":1}}},"readConcern":{"provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":60000,"provenance":"clientSupplied"},"storage":{},"remote":"10.192.241.232:49430","protocol":"op_msg","durationMillis":431}}
{"t":{"$date":"2022-04-20T11:56:03.982+02:00"},"s":"E",  "c":"SHARDING", "id":5460505, "ctx":"ShardingDDLCoordinator-938","msg":"Error running rename collection","attr":{"namespace":"pzur01mipmed0.sessions#3","error":"WriteConcernFailed: Failed command { _shardsvrRenameCollectionParticipant: \"sessions#3\", to: \"pzur01mipmed0.sessions#3.110-1155\", dropTarget: false, stayTemp: false, sourceUUID: UUID(\"ea976782-923e-4e14-9688-2d1317203230\"), writeConcern: { w: \"majority\", wtimeout: 60000 }, lsid: { id: UUID(\"2d0ee188-7a78-459a-b1d4-a9fa95012162\"), uid: BinData(0, BB89D3175FB0981C86826C27759082A37160471F60504B4612415186C630AB70) }, txnNumber: 0, $audit: { $impersonatedUsers: [ { user: \"mipsys\", db: \"admin\" } ], $impersonatedRoles: [ { role: \"dbOwner\", db: \"pgen01mipmed0\" }, { role: \"dbOwner\", db: \"data\" }, { role: \"dbOwner\", db: \"p-mipmed-as-02\" }, { role: \"dbOwner\", db: \"plau01mipmed1\" }, { role: \"dbOwner\", db: \"tdube04mipmed1\" }, { role: \"dbOwner\", db: \"pgen01mipmed1\" }, { role: \"clusterMonitor\", db: \"admin\" }, { role: \"dbOwner\", db: \"p-mipmed-as-01\" }, { role: \"dbOwner\", db: \"ignored\" }, { role: \"dbOwner\", db: \"tdube04mipmed0\" }, { role: \"dbOwner\", db: \"polt01mipmed1\" }, { role: \"mip.admin\", db: \"admin\" }, { role: \"dbOwner\", db: \"pzur01mipmed1\" }, { role: \"dbOwner\", db: \"t-mipmed-as-01\" }, { role: \"dbOwner\", db: \"mip\" }, { role: \"dbOwner\", db: \"pmado01mipmed0\" }, { role: \"dbOwner\", db: \"polt01mipmed0\" }, { role: \"dbOwner\", db: \"pzur01mipmed0\" }, { role: \"dbOwner\", db: \"pmado01mipmed1\" }, { role: \"dbOwner\", db: \"pber08mipmed1\" }, { role: \"clusterManager\", db: \"admin\" }, { role: \"dbOwner\", db: \"plau01mipmed0\" }, { role: \"dbOwner\", db: \"pber08mipmed0\" } ] } } for database 'pzur01mipmed0' on shard 'shard_03' :: caused by :: waiting for replication timed out"}}
{"t":{"$date":"2022-04-20T11:56:04.000+02:00"},"s":"I",  "c":"SHARDING", "id":22650,   "ctx":"replSetDistLockPinger","msg":"Unlocked distributed lock","attr":{"lockSessionId":{"$oid":"020000000000000000000000"},"lockName":"pzur01mipmed0.sessions#3.110-1155"}}
{"t":{"$date":"2022-04-20T11:56:04.022+02:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn8052","msg":"Slow query","attr":{"type":"command","ns":"pzur01mipmed0.$cmd","appName":"MongoDB Shell","command":{"_shardsvrRenameCollection":"sessions#3","to":"pzur01mipmed0.sessions#3.110-1155","dropTarget":false,"stayTemp":false,"databaseVersion":{"uuid":{"$uuid":"610ffe78-553e-4222-a903-679800bbc898"},"timestamp":{"$timestamp":{"t":1648837899,"i":2}},"lastMod":1},"writeConcern":{"w":"majority","wtimeout":60000},"lsid":{"id":{"$uuid":"6d981744-2bba-4841-abb7-8b755eefa491"},"uid":{"$binary":{"base64":"0+wVVtwDGKHvHj3sWREUt3H6wqvq/NkgyxQgk/ZyCQU=","subType":"0"}}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1650448502,"i":34026}},"signature":{"hash":{"$binary":{"base64":"Va6o9Oz80dDKrfn/tFR3EBPf8jU=","subType":"0"}},"keyId":7081698715102085126}},"$configTime":{"$timestamp":{"t":1650448502,"i":25023}},"$topologyTime":{"$timestamp":{"t":1648836518,"i":2}},"$audit":{"$impersonatedUsers":[{"user":"mipsys","db":"admin"}],"$impersonatedRoles":[{"role":"dbOwner","db":"pgen01mipmed0"},{"role":"dbOwner","db":"data"},{"role":"dbOwner","db":"p-mipmed-as-02"},{"role":"dbOwner","db":"plau01mipmed1"},{"role":"dbOwner","db":"tdube04mipmed1"},{"role":"dbOwner","db":"pgen01mipmed1"},{"role":"clusterMonitor","db":"admin"},{"role":"dbOwner","db":"p-mipmed-as-01"},{"role":"dbOwner","db":"ignored"},{"role":"dbOwner","db":"tdube04mipmed0"},{"role":"dbOwner","db":"polt01mipmed1"},{"role":"mip.admin","db":"admin"},{"role":"dbOwner","db":"pzur01mipmed1"},{"role":"dbOwner","db":"t-mipmed-as-01"},{"role":"dbOwner","db":"mip"},{"role":"dbOwner","db":"pmado01mipmed0"},{"role":"dbOwner","db":"polt01mipmed0"},{"role":"dbOwner","db":"pzur01mipmed0"},{"role":"dbOwner","db":"pmado01mipmed1"},{"role":"dbOwner","db":"pber08mipmed1"},{"role":"clusterManager","db":"admin"},{"role":"dbOwner","db":"plau01mipmed0"},{"role":"dbOwner","db":"pber08mipmed0"}]},"$client":{"application":{"name":"MongoDB Shell"},"driver":{"name":"MongoDB Internal Client","version":"5.0.3"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux Server release 7.9 (Maipo)","architecture":"x86_64","version":"Kernel 3.10.0-1160.2.2.el7.x86_64"},"mongos":{"host":"pzur01mipmed0.swi.srse.net:27017","client":"10.229.152.20:38584","version":"5.0.3"}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1650448502,"i":25023}},"t":-1}},"$db":"pzur01mipmed0"},"numYields":0,"ok":0,"errMsg":"Failed command { _shardsvrRenameCollectionParticipant: \"sessions#3\", to: \"pzur01mipmed0.sessions#3.110-1155\", dropTarget: false, stayTemp: false, sourceUUID: UUID(\"ea976782-923e-4e14-9688-2d1317203230\"), writeConcern: { w: \"majority\", wtimeout: 60000 }, lsid: { id: UUID(\"2d0ee188-7a78-459a-b1d4-a9fa95012162\"), uid: BinData(0, BB89D3175FB0981C86826C27759082A37160471F60504B4612415186C630AB70) }, txnNumber: 0, $audit: { $impersonatedUsers: [ { user: \"mipsys\", db: \"admin\" } ], $impersonatedRoles: [ { role: \"dbOwner\", db: \"pgen01mipmed0\" }, { role: \"dbOwner\", db: \"data\" }, { role: \"dbOwner\", db: \"p-mipmed-as-02\" }, { role: \"dbOwner\", db: \"plau01mipmed1\" }, { role: \"dbOwner\", db: \"tdube04mipmed1\" }, { role: \"dbOwner\", db: \"pgen01mipmed1\" }, { role: \"clusterMonitor\", db: \"admin\" }, { role: \"dbOwner\", db: \"p-mipmed-as-01\" }, { role: \"dbOwner\", db: \"ignored\" }, { role: \"dbOwner\", db: \"tdube04mipmed0\" }, { role: \"dbOwner\", db: \"polt01mipmed1\" }, { role: \"mip.admin\", db: \"admin\" }, { role: \"dbOwner\", db: \"pzur01mipmed1\" }, 
{ role: \"dbOwner\", db: \"t-mipmed-as-01\" }, { role: \"dbOwner\", db: \"mip\" }, { role: \"dbOwner\", db: \"pmado01mipmed0\" }, { role: \"dbOwner\", db: \"polt01mipmed0\" }, { role: \"dbOwner\", db: \"pzur01mipmed0\" }, { role: \"dbOwner\", db: \"pmado01mipmed1\" }, { role: \"dbOwner\", db: \"pber08mipmed1\" }, { role: \"clusterManager\", db: \"admin\" }, { role: \"dbOwner\", db: \"plau01mipmed0\" }, { role: \"dbOwner\", db: \"pber08mipmed0\" } ] } } for database 'pzur01mipmed0' on shard 'shard_03' :: caused by :: waiting for replication timed out","errName":"WriteConcernFailed","errCode":64,"reslen":1956,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"r":1}},"Database":{"acquireCount":{"r":1}},"Mutex":{"acquireCount":{"r":2}}},"writeConcern":{"w":"majority","wtimeout":60000,"provenance":"clientSupplied"},"remote":"10.229.152.20:53874","protocol":"op_msg","durationMillis":61116}}
 
 
Logging from PRIMARY Shard_03:
 
{"t":{"$date":"2022-04-20T11:56:03.981+02:00"},"s":"E",  "c":"SHARDING", "id":5515109, "ctx":"RenameCollectionParticipantService-470","msg":"Error executing rename collection participant","attr":{"fromNs":"pzur01mipmed0.sessions#3","toNs":"pzur01mipmed0.sessions#3.110-1155","error":"WriteConcernFailed: waiting for replication timed out"}}
{"t":{"$date":"2022-04-20T11:56:03.981+02:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn5503","msg":"Slow query","attr":{"type":"command","ns":"pzur01mipmed0.$cmd","command":{"_shardsvrRenameCollectionParticipant":"sessions#3","to":"pzur01mipmed0.sessions#3.110-1155","dropTarget":false,"stayTemp":false,"sourceUUID":{"$uuid":"ea976782-923e-4e14-9688-2d1317203230"},"writeConcern":{"w":"majority","wtimeout":60000},"lsid":{"id":{"$uuid":"2d0ee188-7a78-459a-b1d4-a9fa95012162"},"uid":{"$binary":{"base64":"u4nTF1+wmByGgmwndZCCo3FgRx9gUEtGEkFRhsYwq3A=","subType":"0"}}},"txnNumber":0,"$audit":{"$impersonatedUsers":[{"user":"mipsys","db":"admin"}],"$impersonatedRoles":[{"role":"dbOwner","db":"pgen01mipmed0"},{"role":"dbOwner","db":"data"},{"role":"dbOwner","db":"p-mipmed-as-02"},{"role":"dbOwner","db":"plau01mipmed1"},{"role":"dbOwner","db":"tdube04mipmed1"},{"role":"dbOwner","db":"pgen01mipmed1"},{"role":"clusterMonitor","db":"admin"},{"role":"dbOwner","db":"p-mipmed-as-01"},{"role":"dbOwner","db":"ignored"},{"role":"dbOwner","db":"tdube04mipmed0"},{"role":"dbOwner","db":"polt01mipmed1"},{"role":"mip.admin","db":"admin"},{"role":"dbOwner","db":"pzur01mipmed1"},{"role":"dbOwner","db":"t-mipmed-as-01"},{"role":"dbOwner","db":"mip"},{"role":"dbOwner","db":"pmado01mipmed0"},{"role":"dbOwner","db":"polt01mipmed0"},{"role":"dbOwner","db":"pzur01mipmed0"},{"role":"dbOwner","db":"pmado01mipmed1"},{"role":"dbOwner","db":"pber08mipmed1"},{"role":"clusterManager","db":"admin"},{"role":"dbOwner","db":"plau01mipmed0"},{"role":"dbOwner","db":"pber08mipmed0"}]},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1650448503,"i":16004}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$configTime":{"$timestamp":{"t":1650448503,"i":5504}},"$topologyTime":{"$timestamp":{"t":1648836518,"i":2}},"$db":"pzur01mipmed0"},"numYields":0,"ok":0,"errMsg":"waiting for replication timed out","errName":"WriteConcernFailed","errCode":64,"reslen":466,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":2}},"Global":{"acquireCount":{"r":2}},"Mutex":{"acquireCount":{"r":1}}},"writeConcern":{"w":"majority","wtimeout":60000,"provenance":"clientSupplied"},"storage":{},"remote":"10.192.241.232:56702","protocol":"op_msg","durationMillis":60621}}
 
 

Kind Regards
Wernfried

Comment by Wernfried Domscheit [ 12/Apr/22 ]

Hi

Please see my comment:

 

  1. Begin inserting into multiple non-sharded collections
  2. Stop a secondary of a 3-node replica set
  3. Multiple times in parallel every 5 minutes:
    1. If a secondary is unavailable, run replSetReconfig to assign the unavailable member {votes: 0, priority: 0}
    2. If a secondary with {votes: 0, priority: 0} is available, run replSetReconfig to assign it {votes: 1, priority: 1}

Meanwhile I removed point 3.2 from the job. My plan is to run this manually as part of the recovery job, i.e. run it after the entire cluster is running normally again.

Maybe it is better if I provide the relevant commands directly:

db.adminCommand({  "renameCollection" : "pmado01mipmed1.sessions#1",  "to" : "pmado01mipmed1.sessions#1.102-1820",  "writeConcern" : {  "wtimeout" : 30000 } })
 
// New collection 'pmado01mipmed1.sessions#1' may exist already, because application constantly inserts data into it
db.getSiblingDB('pmado01mipmed1').runCommand({  "createIndexes" : "sessions#1",
   "indexes" : [ 
      { "name" : "n_t_dp", "key" : { "n" : 1, "t" : 1, "dp" : 1 } }, 
      { "name" : "si_tsi", "key" : { "si" : 1, "tsi" : 1 } }, 
      { "name" : "n_t0", "key" : { "t0" : 1, "n" : 1 }, "partialFilterExpression" : { "t0" : { "$lt" : ISODate("2022-04-08T22:00:00Z") } } } 
   ] 
})
 
db.getSiblingDB('pmado01mipmed1').getCollection('sessions#1.102-1820').aggregate([ 
   { <some stuff to get a clear transition from 'sessions#1.102-1820' to 'sessions#1' > }, 
   { "$merge" : {  "into" : {  "db" : "pmado01mipmed1",  "coll" : "sessions#1" } } }
], { "allowDiskUse" : true })
 
db.getSiblingDB('pmado01mipmed1').getCollection('sessions#1.102-1820').aggregate([ 
   { <many stuff, mainly $group > }, 
   { "$merge" : {  "into" : {  "db" : "data",  "coll" : "sessions.20220412" } } }
], { "allowDiskUse" : true })
 
db.getSiblingDB('pmado01mipmed1').getCollection('sessions#1.102-1820').aggregate([ 
   { <some more stuff, mainly $group> }, 
   { "$merge" : {  "into" : {  "db" : "data",  "coll" : "sessions.20220412" } } }
], { "allowDiskUse" : true })
 
db.getSiblingDB('pmado01mipmed1').getCollection('sessions#1.102-1820').drop({  "writeConcern" : {  "w" : 0,  "wtimeout" : 10000 } })
 

"pmado01mipmed1.sessions#1" is not sharded
"data.sessions.20220412" is sharded

 

I have 17 databases for raw data and 4 shards. So on average each shard covers 4-5 databases:

    "_id" : "mip", 
    "primary" : "shard_01", 
    "partitioned" : true
}
    "_id" : "data", 
    "primary" : "shard_04", 
    "partitioned" : true
}
    "_id" : "ignored", 
    "primary" : "shard_04", 
    "partitioned" : true
}
    "_id" : "tdube04mipmed0", 
    "primary" : "shard_02", 
    "partitioned" : true
}
    "_id" : "tdube04mipmed1", 
    "primary" : "shard_02", 
    "partitioned" : true
}
    "_id" : "t-mipmed-as-01", 
    "primary" : "shard_02", 
    "partitioned" : true
}
    "_id" : "pmado01mipmed0", 
    "primary" : "shard_04", 
    "partitioned" : true
}
    "_id" : "pmado01mipmed1", 
    "primary" : "shard_04", 
    "partitioned" : true
}
    "_id" : "pber08mipmed0", 
    "primary" : "shard_02", 
    "partitioned" : true
}
    "_id" : "pber08mipmed1", 
    "primary" : "shard_02", 
    "partitioned" : true
}
    "_id" : "polt01mipmed1", 
    "primary" : "shard_01", 
    "partitioned" : true
}
    "_id" : "polt01mipmed0", 
    "primary" : "shard_01", 
    "partitioned" : true
}
    "_id" : "pgen01mipmed0", 
    "primary" : "shard_02", 
    "partitioned" : true
}
    "_id" : "pgen01mipmed1", 
    "primary" : "shard_02", 
    "partitioned" : true
}
    "_id" : "pzur01mipmed1", 
    "primary" : "shard_04", 
    "partitioned" : true
}
    "_id" : "pzur01mipmed0", 
    "primary" : "shard_04", 
    "partitioned" : true
}
    "_id" : "p-mipmed-as-01", 
    "primary" : "shard_01", 
    "partitioned" : true
}
    "_id" : "p-mipmed-as-02", 
    "primary" : "shard_01", 
    "partitioned" : true
}
    "_id" : "plau01mipmed0", 
    "primary" : "shard_03", 
    "partitioned" : true
}
    "_id" : "plau01mipmed1", 
    "primary" : "shard_03", 
    "partitioned" : true
}

In the next few days I will try to reproduce the error; then I can also provide logs.

 

Best Regards
Wernfried

 

Comment by Eric Sedor [ 12/Apr/22 ]

Thanks Wernfried. So this is 5.0.3 and for unsharded collections only. So this would not be a form of SERVER-17397, sorry.

It sounds like a full set of reproduction steps would be:

  1. Begin inserting into multiple non-sharded collections
  2. Stop a secondary of a 3-node replica set
  3. multiple times in parallel every 5 minutes:
    1. If a secondary is unavailable, run replSetReconfig to assign the unavailable member {votes: 0, priority: 0}
    2. If a secondary with {votes: 0, priority: 0} is available, run replSetReconfig to assign it {votes: 1, priority: 1}
    3. Rename the collection
    4. Create new indexes on the original collection
    5. Perform some reads on the original collection
    6. Perform inserts to a sharded collection
    7. Drop the original collection
  4. Later, for some collections that have been operated on by these jobs:
    1. Attempt to create a new collection with the same name as the original collection. This apparently succeeds?
    2. Attempt to read or write to the new collection. This will hang indefinitely in the mongo shell.

I suspect some of these steps are unnecessary, but because this is difficult to reproduce I assume it could relate to race conditions between jobs. So delays or parallel operation may be needed at various points. A few more questions:

  • Can you confirm the above steps and provide any detail I missed?
  • Does each parallel job perform its own replica set reconfigs?
  • Can you confirm this is only within one database?
Comment by Wernfried Domscheit [ 12/Apr/22 ]

Hi

Maybe I mixed up the version:

$ mongod --version
db version v5.0.3
Build Info: {
    "version": "5.0.3",
    "gitVersion": "657fea5a61a74d7a79df7aff8e4bcf0bc742b748",
    "openSSLVersion": "OpenSSL 1.0.1e-fips 11 Feb 2013",
    "modules": [],
    "allocator": "tcmalloc",
    "environment": {
        "distmod": "rhel70",
        "distarch": "x86_64",
        "target_arch": "x86_64"
    }
}

I don't get any error message. If I attempt to re-use a collection, the mongo shell just hangs.
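A hedged way to probe such a hang without blocking the shell indefinitely is to bound the operation with maxTimeMS; the namespace and timeout below are illustrative:

// Hedged sketch: bound the probe so the shell returns instead of hanging forever.
// Namespace and timeout are illustrative.
db.getSiblingDB("pmado01mipmed1").runCommand({
   insert: "sessions#1",
   documents: [{ probe: true }],
   maxTimeMS: 5000   // expect a MaxTimeMSExpired error if the namespace is blocked
});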

Basically I simulate the outage:

  • stop mongod process on secondary
  • the next time a bucketing job starts (at the latest after 5 minutes), the job sets {votes: 0, priority: 0} for the stopped secondary
  • typically I wait 1-2 hours
  • start mongod process again
  • the next time a bucketing job starts (at the latest after 5 minutes), the job reverts the secondary to {votes: 1, priority: 10}

Note, each bucketing job runs every hour; however, in total there are around 60 parallel bucketing jobs. They don't all run at the same time; every 5 minutes, 3-5 bucketing jobs start in order to have a better load distribution.

Kind Regards
Wernfried

 

 

 

 

 

Comment by Eric Sedor [ 12/Apr/22 ]

Hi Wernfried,

This seems similar to SERVER-17397 but I see you've reported 5.3.0-rc1. Can you:

  • Clarify on which versions you've seen this issue and when (that is, was the first of these two times on upgrading to 5.3.1-rc1?, etc...)
  • Provide the specific errors occurring when you attempt to "re-use" the dropped collection name
  • Provide a more detailed timeline of what you do when a secondary member goes down. When you say "The member is reconfigured as passive and application continues to work. But when the member recovers again, then sometimes the database get corrupted" it makes it sound like you have some form of automated response here?

Thanks,
Eric
