[SERVER-22310] movePrimary command should refresh existing mongos metadata caches Created: 26/Jan/16  Updated: 06/Dec/22  Resolved: 12/Nov/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ankit Kakkar Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done Votes: 0
Labels: pm-1051-legacy-tickets
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Sharding
Participants:

 Description   
Original description

The movePrimary command does not refresh existing mongos metadata caches, resulting in inconsistent query results.

Steps to Reproduce
  1. Deployed a 2-shard cluster, with each shard running a 3-node replica set. I also started 2 mongos instances.
  2. Created a database mydb on mongos1 and this is what sh.status returned:

    --- Sharding Status --- 
      sharding version: {
        "_id": 1,
        "minCompatibleVersion": 5,
        "currentVersion": 6,
        "clusterId": ObjectId("56a7671754ad7a545f803203")
      }
      shards:
        {  "_id": "shard01",  "host": "shard01/ankit:27018,ankit:27019,ankit:27020" }
        {  "_id": "shard02",  "host": "shard02/ankit:27021,ankit:27022,ankit:27023" }
      balancer:
    	Currently enabled:  yes
    	Currently running:  no
    	Failed balancer rounds in last 5 attempts:  0
    	Migration Results for the last 24 hours: 
    		No recent migrations
      databases:
        {  "_id": "admin",  "partitioned": false,  "primary": "config" }
        {  "_id": "test",  "partitioned": false,  "primary": "shard01" }
        {  "_id": "mydb",  "partitioned": false,  "primary": "shard01" }
    

  3. Inserted a sample document into mydb.mycoll (shown here via find()):

    ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find()
    {
      "_id": ObjectId("56a767ef88eb8c1294029482")
    }
    

  4. Explain output from mongos1 (Notice the request going to shard1)

    ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
    

  5. Explain output from mongos2 (The request correctly going to shard1)

    ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
    

  6. Used the movePrimary command to move mydb to shard2.
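
    (The exact command line was not captured here; presumably something along these lines, run against the admin database and using the shard name shown in the sh.status output above:)

        db.adminCommand({ movePrimary: "mydb", to: "shard02" })
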
  7. sh.status output from mongos1

      databases:
        {  "_id": "admin",  "partitioned": false,  "primary": "config" }
        {  "_id": "test",  "partitioned": false,  "primary": "shard01" }
        {  "_id": "mydb",  "partitioned": false,  "primary": "shard02" }
    

  8. sh.status output from mongos2

      databases:
        {  "_id": "admin",  "partitioned": false,  "primary": "config" }
        {  "_id": "test",  "partitioned": false,  "primary": "shard01" }
        {  "_id": "mydb",  "partitioned": false,  "primary": "shard02" }
    

  9. Explain output from mongos1 (Request correctly going to shard2 now)

    ankit(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard02",
              "connectionString": "shard02/ankit:27021,ankit:27022,ankit:27023",
              "serverInfo": {
                "host": "ankit",
                "port": 27021,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
    

  10. Explain output from mongos2 (Request STILL going to shard1)

    ankit:27025(mongos-3.0.8)[mongos] mydb> db.mycoll.find().explain()
    {
      "queryPlanner": {
        "mongosPlannerVersion": 1,
        "winningPlan": {
          "stage": "SINGLE_SHARD",
          "shards": [
            {
              "shardName": "shard01",
              "connectionString": "shard01/ankit:27018,ankit:27019,ankit:27020",
              "serverInfo": {
                "host": "ankit",
                "port": 27018,
                "version": "3.0.8",
                "gitVersion": "83d8cc25e00e42856924d84e220fbe4a839e605d"
              },
              "plannerVersion": 1,
              "namespace": "mydb.mycoll",
    

    Also note that the find() on mongos2 returned no results.

  11. Further, I manually created the mydb database (and inserted a sample document) directly on the shard1 mongod. This emulates the case where two databases with the same name sit on different shards, i.e. show dbs returns mydb on both shards.

    ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb1> use mydb
    switched to db mydb
    ankit:27018(mongod-3.0.8)[PRIMARY:shard01] mydb> db.mycoll.insert({})
    Inserted 1 record(s) in 139ms
    

  12. Running the find query on mongos2 again went to shard1, but it returned the document created in the step above.

Further, I was able to fix this by clearing the metadata cache on mongos2 using the flushRouterConfig command:

db.adminCommand("flushRouterConfig")

Alternatively, restarting the mongos also fixes the issue.



 Comments   
Comment by Esha Maharishi (Inactive) [ 12/Nov/19 ]

Closing this as Gone Away since it was fixed by PM-1051.

Comment by Kelsey Schubert [ 26/Jan/16 ]

As Kaloian mentioned, this behavior is expected and is noted in our documentation.

I am repurposing this ticket as an improvement request to be considered as part of our planned improvements to tracking metadata in MongoDB 3.4.

Comment by Kaloian Manassiev [ 26/Jan/16 ]

Unfortunately, unsharded collections currently do not carry any version information, so shard changes resulting from movePrimary cannot automatically be discovered by mongos instances other than the one on which movePrimary was run. Step 12 in your repro is documented here.
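
In the meantime, the workaround is the one you found: after running movePrimary, flush the routing table cache on every other mongos (a sketch using the second mongos from this repro):

    // run on each mongos that did not issue the movePrimary, e.g. the one on ankit:27025
    db.adminCommand("flushRouterConfig")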
