[SERVER-45119] CollectionShardingState::getCurrentShardVersionIfKnown returns collection version instead of shard version Created: 12/Dec/19  Updated: 29/Oct/23  Resolved: 06/Apr/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.5, 4.0.17
Fix Version/s: 4.2.6, 4.0.18

Type: Bug Priority: Critical - P2
Reporter: Matthew Saltz (Inactive) Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 1
Labels: KP42, regression, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-45119 - Repro.js    
Issue Links:
Duplicate
is duplicated by SERVER-47432 Mongo Server error (MongoQueryExcept... Closed
Related
is related to SERVER-53338 The best method resolving BUG SERVER... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2019-12-30, Sharding 2020-04-20

 Description   
Issue Status as of Jul 31, 2020

ISSUE DESCRIPTION AND IMPACT

A bug in shard version checking causes a race condition between parallel chunk migrations and auto-split activity.

If the race condition occurs, an affected shard becomes unable to update its sharding metadata, and operations that require data from that shard will fail.

While it is possible for the issue to clear on its own, it is likely to persist until action is taken.

DIAGNOSIS AND AFFECTED VERSIONS

Sharded clusters with 2 or more shards running MongoDB 4.2 versions up to and including 4.2.5, or version 4.0.17, are impacted. However, the bug is much more likely to be triggered on 4.0.17 than on the other affected versions.

If the bug is triggered, client operations will begin failing with "version mismatch detected" (StaleConfig) errors, and the logs of the affected shard will include "requested shard version differs from config shard version" error messages.

REMEDIATION AND WORKAROUNDS

If running MongoDB version 4.0.17, downgrade to 4.0.16 or upgrade to 4.0.18 when it becomes available.

If running MongoDB version 4.2.5, upgrade to version 4.2.6 when it becomes available.

In the event a version change is not possible, this issue can be partially mitigated by:

  • Disabling the balancer.
  • Waiting for the balancer to stop running.
  • Running the following command on the primary replica set member of each shard:

db.adminCommand({_flushRoutingTableCacheUpdates: "<database>.<collection>", syncFromConfig: true})

If you re-enable the balancer, the bug can be triggered again.
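
For concreteness, here is a minimal sketch of the full mitigation sequence, assuming an affected namespace of test.c (substitute your own database and collection):

// On a mongos: disable the balancer, then wait until no balancing round is active.
sh.stopBalancer()
while (sh.isBalancerRunning()) {
    sleep(1000)  // poll once per second
}

// On the PRIMARY of each shard (connected to directly, not through mongos):
// force a refresh of the shard's routing table from the config servers.
db.adminCommand({_flushRoutingTableCacheUpdates: "test.c", syncFromConfig: true})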

Note: If the sharded cluster is running with authentication enabled, you must grant the internal action on the cluster resource in order to run the _flushRoutingTableCacheUpdates command:

You can create a new role with the internal privilege on the cluster resource and then grant this role to the admin user, as below. Replace ADMIN_USER with the admin's username.

use admin;
db.createRole({
  role: "flush_routing_table_cache_updates",
  privileges: [
     { resource: { cluster: true }, actions: [ "internal" ] },
  ],
  roles: [  ]
});
 
db.grantRolesToUser("ADMIN_USER", ["flush_routing_table_cache_updates"])
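
Once the role has been granted, the flush can be run as that user against each shard's primary. A sketch, using a hypothetical host name and the example namespace test.c:

// Connect directly to a shard primary as the admin user
// (the host below is hypothetical; replace with your own):
//   mongo --host shard01-primary.example.net:27018 -u ADMIN_USER -p --authenticationDatabase admin
db.adminCommand({_flushRoutingTableCacheUpdates: "test.c", syncFromConfig: true})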

FIX VERSIONS

4.2.6 and 4.0.18

original description

This should call getShardVersion() instead of getCollVersion(). Its only usage is here. Fortunately, the check here is still valid even though we were returning the collection version: if a shard knows about collection version X and shard version Y, it is not possible for the actual shard version to be between X and Y, because otherwise the shard would already know about it.
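
For illustration, both versions can be read off config.chunks from a mongos: the shard version of a shard is the highest chunk version among the chunks it owns for the collection, while the collection version is the highest chunk version across all shards. A sketch, assuming the pre-5.0 chunk schema (chunk documents carry an ns field, with the chunk version in lastmod) and an example namespace test.c:

use config
// Shard version per shard: the highest chunk version ("lastmod")
// among that shard's chunks for the collection.
db.chunks.aggregate([
    { $match: { ns: "test.c" } },
    { $group: { _id: "$shard", shardVersion: { $max: "$lastmod" } } }
])
// Collection version: the single highest chunk version overall.
db.chunks.find({ ns: "test.c" }, { _id: 0, shard: 1, lastmod: 1 }).sort({ lastmod: -1 }).limit(1)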



 Comments   
Comment by Linda Qin [ 03/Jul/20 ]

Updating the user summary after the discussion with kelsey.schubert. There are two updates:

  1. Added the privilege required to run _flushRoutingTableCacheUpdates if the sharded cluster is running with authentication enabled:

    Note: If authentication is enabled, you must grant the internal action on the cluster resource in order to run the _flushRoutingTableCacheUpdates command:

    You can create a new role with the internal privilege on the cluster resource and then grant this role to the admin user, as below. Replace ADMIN_USER with the admin's username.

    use admin;
    db.createRole({
      role: "flush_routing_table_cache_updates",
      privileges: [
         { resource: { cluster: true }, actions: [ "internal" ] },
      ],
      roles: [  ]
    });
     
    db.grantRolesToUser("ADMIN_USER", ["flush_routing_table_cache_updates"])
    

  2. A sharded cluster with only 2 or 3 shards could also hit this issue in some situations. As such, changed "4 or more" to "2 or more", and removed "Sharded clusters with fewer than 4 shards are not affected because parallel chunk migrations do not occur in those clusters."

Here are some scenarios in which a sharded cluster with 2 or 3 shards might also be affected:

  • Scenario 1:
    • There are 2+ shards in the cluster.
    • The cluster is upgraded from 3.6 to 4.0.17.
    • Then the FCV is upgraded to 4.0. After this, all chunk versions are increased: for example, if there are only two chunks with chunk versions (4|0) and (4|1), after the FCV upgrade their versions become (5|0) and (6|0).
    • A scatter-gather query is run on the sharded collection, so that the metadata is refreshed on the mongos and the shards.
    • Then a split happens on this sharded collection, on the shard with the smaller shard version (shardX).
    • After this, a query on this sharded collection that targets shardX, run from the same mongos (the one that performed the split), fails with "version mismatch detected".
  • Scenario 2:
    • There are 3+ shards in the cluster.
    • The cluster is on 4.0.17.
    • Migrate a chunk in this sharded collection. After that, two shards will have the same highest major version.
    • Run a scatter-gather query on this sharded collection, so that the metadata is refreshed on the mongos and the shards.
    • Then a split happens on this sharded collection, on the shard with the smaller shard version (shardX).
    • After this, a query on this sharded collection that targets shardX, run from the same mongos (the one that performed the split), fails with "version mismatch detected".

Repro:

Scenario 1

  1. Start a sharded cluster with 2 shards on 3.6.18, and shard a collection with shard key {x:1}.
  2. Split a chunk in this collection and wait for the balancer to move it (a sketch of these setup commands follows the chunk listing below).
  3. After that, the chunk versions are:

            {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true }
                    test.c
                            shard key: { "x" : 1 }
                            unique: false
                            balancing: true
                            chunks:
                                    shard01	1
                                    shard02	1
                            { "x" : { "$minKey" : 1 } } -->> { "x" : 100 } on : shard02 Timestamp(3, 0)
                            { "x" : 100 } -->> { "x" : { "$maxKey" : 1 } } on : shard01 Timestamp(3, 1)
    
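A sketch of the setup commands for steps 1 and 2, run from a 3.6.18 mongos (the split point {x: 100} matches the chunk listing above):

    mongos> sh.enableSharding("test")
    mongos> sh.shardCollection("test.c", { x: 1 })
    mongos> sh.splitAt("test.c", { x: 100 })
    mongos> sh.startBalancer()  // only needed if the balancer was previously disabled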

  • Upgrade the cluster to 4.0.17 and upgrade FCV to 4.0.

    mongos> db.adminCommand( { setFeatureCompatibilityVersion: "4.0" } )
    {
    	"ok" : 1,
    	"operationTime" : Timestamp(1593742199, 9),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1593742199, 9),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	}
    }
    

  • The chunk versions after the FCV upgrade are as below. The shard version for shard01 is (5, 0), for shard02 is (4, 0).

            {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true,  "version" : {  "uuid" : UUID("2a7a0e85-079a-4f16-92b2-0ecebcb4a701"),  "lastMod" : 1 } }
                    test.c
                            shard key: { "x" : 1 }
                            unique: false
                            balancing: true
                            chunks:
                                    shard01	1
                                    shard02	1
                            { "x" : { "$minKey" : 1 } } -->> { "x" : 100 } on : shard02 Timestamp(4, 0)
                            { "x" : 100 } -->> { "x" : { "$maxKey" : 1 } } on : shard01 Timestamp(5, 0)
    

  • Run a scatter-gather query to refresh the metadata.

    mongos> db.c.find()
    

  • Split a chunk on the same mongos, on a shard with the smaller shard version (shard02, chunk [MinKey, 100))

    mongos> sh.splitAt("test.c", {x:50})
    {
    	"ok" : 1,
    	"operationTime" : Timestamp(1593742216, 8),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1593742234, 4),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	}
    }
    

  • After the split, the chunk versions are:

            {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true,  "version" : {  "uuid" : UUID("2a7a0e85-079a-4f16-92b2-0ecebcb4a701"),  "lastMod" : 1 } }
                    test.c
                            shard key: { "x" : 1 }
                            unique: false
                            balancing: true
                            chunks:
                                    shard01	1
                                    shard02	2
                            { "x" : { "$minKey" : 1 } } -->> { "x" : 50 } on : shard02 Timestamp(5, 1)
                            { "x" : 50 } -->> { "x" : 100 } on : shard02 Timestamp(5, 2)
                            { "x" : 100 } -->> { "x" : { "$maxKey" : 1 } } on : shard01 Timestamp(5, 0)
    

  • Run a query on the same mongos, and it fails with "version mismatch detected"

    mongos> db.c.find()
    Error: stale config in runCommand :: caused by :: Failed to run query after 10 retries :: caused by :: version mismatch detected for test.c
    

  • The logs for the shard that split the chunk (shard02) have the following error message:

    2020-07-03T12:10:37.437+1000 W SHARDING [conn21] requested shard version differs from config shard version for test.c, requested version is 5|2||5efe9344764fe6daba20940f but found version 4|0||5efe9344764fe6daba20940f
    

Scenario 2

  • Start a sharded cluster with 3 shards on 4.0.17, and shard a collection.
  • Split two chunks in this collection and wait for the balancer to move them (a setup sketch follows the chunk listing below).
  • After that, the chunk versions are:

            {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true,  "version" : {  "uuid" : UUID("1e4a2002-01f0-4277-812a-dda2cff6c703"),  "lastMod" : 1 } }
                    test.c
                            shard key: { "x" : 1 }
                            unique: false
                            balancing: true
                            chunks:
                                    shard01	1
                                    shard02	1
                                    shard03	1
                            { "x" : { "$minKey" : 1 } } -->> { "x" : 100 } on : shard02 Timestamp(4, 0)
                            { "x" : 100 } -->> { "x" : 200 } on : shard03 Timestamp(5, 0)
                            { "x" : 200 } -->> { "x" : { "$maxKey" : 1 } } on : shard01 Timestamp(5, 1)
    
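As in Scenario 1, a sketch of the setup commands, run from a 4.0.17 mongos (split points taken from the chunk listing above):

    mongos> sh.enableSharding("test")
    mongos> sh.shardCollection("test.c", { x: 1 })
    mongos> sh.splitAt("test.c", { x: 100 })
    mongos> sh.splitAt("test.c", { x: 200 })
    mongos> sh.startBalancer()  // only needed if the balancer was previously disabled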

  • Run a scatter-gather query to refresh the metadata.

    mongos> db.c.find()
    

  • Split a chunk on the same mongos, on a shard with the smaller shard version (shard02, chunk [MinKey, 100))

    mongos> sh.splitAt("test.c", {x:50})
    {
    	"ok" : 1,
    	"operationTime" : Timestamp(1593748149, 8),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1593748155, 4),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	}
    }
    

  • After the split, the chunk versions are:

            {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true,  "version" : {  "uuid" : UUID("1e4a2002-01f0-4277-812a-dda2cff6c703"),  "lastMod" : 1 } }
                    test.c
                            shard key: { "x" : 1 }
                            unique: false
                            balancing: true
                            chunks:
                                    shard01	1
                                    shard02	2
                                    shard03	1
                            { "x" : { "$minKey" : 1 } } -->> { "x" : 50 } on : shard02 Timestamp(5, 2)
                            { "x" : 50 } -->> { "x" : 100 } on : shard02 Timestamp(5, 3)
                            { "x" : 100 } -->> { "x" : 200 } on : shard03 Timestamp(5, 0)
                            { "x" : 200 } -->> { "x" : { "$maxKey" : 1 } } on : shard01 Timestamp(5, 1)
    

  • Run a query on the same mongos, and it fails with "version mismatch detected"

    mongos> db.c.find()
    Error: stale config in runCommand :: caused by :: Failed to run query after 10 retries :: caused by :: version mismatch detected for test.c
    

  • The logs for the shard that split the chunk (shard02) have the following error message:

    2020-07-03T13:49:17.056+1000 W SHARDING [conn30] requested shard version differs from config shard version for test.c, requested version is 5|3||5efeaa851fab56396b140335 but found version 4|0||5efeaa851fab56396b140335
    

Comment by Githook User [ 06/Apr/20 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-45119 Fix CollectionShardingState::getCurrentShardVersionIfKnown to actually return the shard version
Branch: v4.2
https://github.com/mongodb/mongo/commit/cc2f60792be600cf0bec65731a27cb6f7fcf42b4

Comment by Githook User [ 06/Apr/20 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-45119 Fix CollectionShardingState::getCurrentShardVersionIfKnown to actually return the shard version
Branch: v4.0
https://github.com/mongodb/mongo/commit/a937edbea8bf847f9b77387eac6548f8b830240b

Comment by Kaloian Manassiev [ 05/Apr/20 ]

Attaching the SERVER-45119 - Repro.js script, which reproduces this problem as of the currently latest commit on the 4.0 branch.

Comment by Kaloian Manassiev [ 25/Mar/20 ]

I updated the "Affects Version/s" field to indicate that it's not present in 4.4 and later, but I think we should fix this and backport it all the way back to the versions in which it is present. While I understand that there is some intricate logic through which it doesn't cause infinite refreshes, that is too brittle an assumption to make, and we could easily break it with some backport we do.

matthew.saltz, would you be able to do the backports while we still have our attention on stabilisation? I don't want this to go on the backlog and drag on forever.
