[SERVER-35658] session migration is too sensitive to replica set primary elections Created: 18/Jun/18  Updated: 29/Oct/23  Resolved: 24/Jul/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.5
Fix Version/s: 3.6.7, 4.0.2, 4.1.2

Type: Bug Priority: Major - P3
Reporter: Elan Kugelmass Assignee: Matthew Saltz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoD/MongoS 3.6.5
Ubuntu 14.04/16.04


Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Sharding 2018-07-16, Sharding 2018-07-30
Participants:

 Description   

Retryable writes were added in v3.6; the server remembers that a write has been performed by tagging the oplog entry for that write. During a chunk migration, the server also needs to transfer the oplog entries related to retryable writes to the destination shard, and it has to make sure that the oplog entries it sends do not get rolled back. The current code is extra paranoid and can trigger the assert even though the oplog entries are majority committed.
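
For illustration only, the bookkeeping involved can be inspected from the shell. This is a rough sketch against a shard primary (the rs0 prompt is borrowed from the reporter's cluster for concreteness, and the { lsid: { $exists: true } } filter is just a convenient way to spot retryable-write entries; it is not part of the migration protocol):

rs0:PRIMARY> use local
rs0:PRIMARY> // Retryable-write oplog entries carry the session id, transaction number and
rs0:PRIMARY> // statement id that a chunk migration must hand over to the recipient shard.
rs0:PRIMARY> db.oplog.rs.find({ lsid: { $exists: true } }, { lsid: 1, txnNumber: 1, stmtId: 1, ns: 1, op: 1, ts: 1, t: 1 }).sort({ $natural: -1 }).limit(3)
rs0:PRIMARY> // The shard's per-session progress records live in config.transactions.
rs0:PRIMARY> use config
rs0:PRIMARY> db.transactions.find().limit(3)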

 

Original Description: One collection will not migrate chunks due to change of term, but reports stale term

I have a 3.6.5 cluster with four (three-member) shards, three config servers, and 13 routers.

Aside from the config database, my cluster contains two other databases with identical collection counts and names. In each database, we've sharded the same two collections in the same way. In one database the balancer has successfully balanced both collections; in the other it has only balanced one.

The other collection fails to balance with the error message "commit clone failed :: caused by :: Location40650: detected change of term from 228 to 229". Oddly enough, the shard from which the chunk is being moved is on election term 231. This error message is new to Mongo 3.6.x: 

https://github.com/mongodb/mongo/blob/570c2570a728a646a953cee6abedeaa3517f8215/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L772
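
(One quick way to cross-check the terms in the error against the donor primary's current election term, using the standard shell helper; the value shown matches the rs.status() output further down:)

rs0:PRIMARY> rs.status().term
NumberLong(231)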

I stopped every single component of the cluster (mongos/shardserver/configserver) and then started them back up hoping that the problem would resolve itself. I didn't see any improvement. I also checked for replica set lag and didn't see any.

Please let me know what information I can provide to make diagnosis easier.

 

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
  	"_id" : 1,
  	"minCompatibleVersion" : 5,
  	"currentVersion" : 6,
  	"clusterId" : ObjectId("5b11d3f969791a2b6a246ef9")
  }
  shards:
        {  "_id" : "rs0",  "host" : "rs0/mongos0r1:27017,mongos0r2:27017,mongos0r3:27017",  "state" : 1 }
        {  "_id" : "rs1",  "host" : "rs1/mongos1r1:27018,mongos1r2:27018,mongos1r3:27018",  "state" : 1 }
        {  "_id" : "rs2",  "host" : "rs2/mongos2r1:27018,mongos2r2:27018,mongos2r3:27018",  "state" : 1 }
        {  "_id" : "rs3",  "host" : "rs3/mongos3r1:27018,mongos3r2:27018,mongos3r3:27018",  "state" : 1 }
  active mongoses:
        "3.6.5" : 13
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
                Balancer active window is set between 03:30 and 01:00 server local time
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                2747 : Success
                926 : Failed with error 'aborted', from rs0 to rs1
                932 : Failed with error 'aborted', from rs0 to rs3
                917 : Failed with error 'aborted', from rs0 to rs2
  databases:
        {  "_id" : "mydb",  "primary" : "rs0",  "partitioned" : true }
                mydb.mycoll
                        shard key: {
                        	"field1" : 1,
                        	"field2" : 1
                        }
                        unique: true
                        balancing: true
                        chunks:
                                rs0	96708
                        too many chunks to print, use verbose if you want to force print
                mydb.fs.chunks
                        shard key: { "files_id" : 1, "n" : 1 }
                        unique: true
                        balancing: true
                        chunks:
                                rs0	21116
                                rs1	1711
                                rs2	1711
                                rs3	1712
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "otherdb",  "primary" : "rs0",  "partitioned" : true }
                otherdb.mycoll
                        shard key: {
                            "field1" : 1,
                            "field2" : 1
                        }
                        unique: true
                        balancing: true
                        chunks:
                                rs0	8697
                                rs1	8697
                                rs2	8697
                                rs3	8697
                        too many chunks to print, use verbose if you want to force print
                otherdb.fs.chunks
                        shard key: { "files_id" : 1, "n" : 1 }
                        unique: true
                        balancing: true
                        chunks:
                                rs0	2376
                                rs1	2375
                                rs2	2376
                                rs3	2376
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                rs0	1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : rs0 Timestamp(1, 0) 
        {  "_id" : "nagios",  "primary" : "rs0",  "partitioned" : false }
        {  "_id" : "test",  "primary" : "rs1",  "partitioned" : false }

mongos> sh.moveChunk("mydb.mycoll", {"field1":1, "field2":1}, "rs1")
{
    "ok" : 0,
    "errmsg" : "commit clone failed :: caused by :: Location40650: detected change of term from 228 to 229",
    "code" : 125,
    "codeName" : "CommandFailed",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1529346339, 7898),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1529346339, 7898)
}
mongos>

rs0:PRIMARY> rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2018-06-18T18:29:55.149Z"),
    "myState" : 1,
    "term" : NumberLong(231),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1529346594, 10),
            "t" : NumberLong(231)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1529346594, 10),
            "t" : NumberLong(231)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1529346594, 10),
            "t" : NumberLong(231)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1529346594, 10),
            "t" : NumberLong(231)
        }
    },
    "members" : [
        {
            "_id" : 3,
            "name" : "mongos0r1:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1999,
            "optime" : {
                "ts" : Timestamp(1529346594, 10),
                "t" : NumberLong(231)
            },
            "optimeDate" : ISODate("2018-06-18T18:29:54Z"),
            "electionTime" : Timestamp(1529346325, 2),
            "electionDate" : ISODate("2018-06-18T18:25:25Z"),
            "configVersion" : 205694,
            "self" : true
        },
        {
            "_id" : 4,
            "name" : "mongos0r2:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 1964,
            "optime" : {
                "ts" : Timestamp(1529346593, 5),
                "t" : NumberLong(231)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1529346593, 5),
                "t" : NumberLong(231)
            },
            "optimeDate" : ISODate("2018-06-18T18:29:53Z"),
            "optimeDurableDate" : ISODate("2018-06-18T18:29:53Z"),
            "lastHeartbeat" : ISODate("2018-06-18T18:29:53.503Z"),
            "lastHeartbeatRecv" : ISODate("2018-06-18T18:29:54.441Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "mongos0r3:27017",
            "configVersion" : 205694
        },
        {
            "_id" : 5,
            "name" : "mongos0r3:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 1980,
            "optime" : {
                "ts" : Timestamp(1529346593, 5),
                "t" : NumberLong(231)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1529346593, 5),
                "t" : NumberLong(231)
            },
            "optimeDate" : ISODate("2018-06-18T18:29:53Z"),
            "optimeDurableDate" : ISODate("2018-06-18T18:29:53Z"),
            "lastHeartbeat" : ISODate("2018-06-18T18:29:53.508Z"),
            "lastHeartbeatRecv" : ISODate("2018-06-18T18:29:53.971Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "mongos0r1:27017",
            "configVersion" : 205694
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1529346594, 10),
    "$gleStats" : {
        "lastOpTime" : Timestamp(0, 0),
        "electionId" : ObjectId("7fffffff00000000000000e7")
    },
    "$clusterTime" : {
        "clusterTime" : Timestamp(1529346594, 19),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "$configServerState" : {
        "opTime" : {
            "ts" : Timestamp(1529346594, 16),
            "t" : NumberLong(5)
        }
    }
}
rs0:PRIMARY>



 Comments   
Comment by Ramon Fernandez Marina [ 27/Aug/18 ]

Thanks for the update epkugelmass, and glad to hear you were able to balance your cluster with 3.6.7.

Regards,
Ramón.

Comment by Elan Kugelmass [ 26/Aug/18 ]

Confirmed we're able to balance our cluster once we upgrade to 3.6.7. Thanks for the fix!

Comment by Githook User [ 06/Aug/18 ]

Author:

{'name': 'Matthew Saltz', 'email': 'matthew.saltz@mongodb.com', 'username': 'saltzm'}

Message: SERVER-35658 Check for rollback at end of each session migration batch

(cherry picked from commit 9175c4deba82dc35606d14428d1bf0d8b43d7a6c)
Branch: v4.0
https://github.com/mongodb/mongo/commit/9d2c97bf741d44783ebd56de60c629bb3c5e8585

Comment by Githook User [ 24/Jul/18 ]

Author:

{'name': 'Matthew Saltz', 'email': 'matthew.saltz@mongodb.com'}

Message: SERVER-35658 Check for rollback at end of each session migration batch

(cherry picked from commit 9175c4deba82dc35606d14428d1bf0d8b43d7a6c)
Branch: v3.6
https://github.com/mongodb/mongo/commit/a69e99284207a1f356e62696500523b1cbd8192d

Comment by Githook User [ 24/Jul/18 ]

Author:

{'name': 'Matthew Saltz', 'email': 'matthew.saltz@mongodb.com'}

Message: SERVER-35658 Check for rollback at end of each session migration batch
Branch: master
https://github.com/mongodb/mongo/commit/9175c4deba82dc35606d14428d1bf0d8b43d7a6c

Comment by Randolph Tan [ 09/Jul/18 ]

epkugelmass yes, we're aiming to backport it to v4.0 and possibly to v3.6. Currently, the safest way to work around this is either to issue, for every session in config.transactions, a new operation that targets the affected shard, or to kill all sessions recorded in that shard's config.transactions and wait for the documents to go away (removal is done asynchronously).
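
A minimal sketch of the second option, assuming direct shell access to the primary of the affected shard (not a mongos). Note that killAllSessions with an empty array targets every session, so in-flight operations tied to those sessions may be interrupted:

rs0:PRIMARY> // Sketch only: kill every session on this shard, then poll
rs0:PRIMARY> // config.transactions until the documents drain (removal is asynchronous).
rs0:PRIMARY> db.adminCommand({ killAllSessions: [] })
rs0:PRIMARY> use config
rs0:PRIMARY> db.transactions.count()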

Comment by Elan Kugelmass [ 05/Jul/18 ]

renctan do you anticipate introducing a patch to make the behavior less paranoid? In the interim, is there anything I can do to get my cluster balancing again?

Comment by Randolph Tan [ 05/Jul/18 ]

Retryable writes were added in v3.6, and the way the server remembers that a write has been performed is by tagging the oplog entry for that write. During migration, the server also needs to transfer the oplog entries related to retryable writes to the destination shard, and it has to make sure that the oplog entries it sends do not get rolled back. The current change is extra paranoid and can trigger the assert even though the oplog entries are majority committed.

Comment by Esha Maharishi (Inactive) [ 05/Jul/18 ]

epkugelmass, I spoke to renctan and he will take it over as he is familiar with this code.

Comment by Elan Kugelmass [ 03/Jul/18 ]

esha.maharishi renctan Can I propose raising the priority on this ticket to critical? The bug appears to be recurring, has no known workaround, and (due to the fact that the cluster can't balance) severely impacts the usability and value of a sharded cluster.

Comment by Elan Kugelmass [ 28/Jun/18 ]

The problem has reappeared.

SHARDING [conn106] Chunk move failed :: caused by :: CommandFailed: commit clone failed :: caused by :: Location40650: detected change of term from 231 to 232
... but the election term is actually 233

Comment by Elan Kugelmass [ 20/Jun/18 ]

Scrolling through the mongod log, we also see "inverted" term error messages like exception: commit clone failed :: caused by :: Location40650: detected change of term from 231 to 223.

After several days of the failures described here and in the initial description, the collection began successfully balancing and we haven't seen any problems since.

Comment by Esha Maharishi (Inactive) [ 18/Jun/18 ]

It seems like that error message came from new code added for retryable writes... CC renctan
