[SERVER-31627] ShardingTest.checkUUIDsConsistentAcrossCluster can fail to see collection in config.cache.collections Created: 18/Oct/17  Updated: 30/Oct/23  Resolved: 05/Dec/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.13
Fix Version/s: 3.6.1, 3.7.1

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Esha Maharishi (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Sharding 2017-11-13, Sharding 2017-12-04, Sharding 2017-12-18
Participants:
Linked BF Score: 0

 Description   

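ShardingTest.checkUUIDsConsistentAcrossCluster can fail to see a sharded collection in config.cache.collections of the local shard because the entry is created asynchronously.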
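Because the shard writes the entry asynchronously, a one-shot read of config.cache.collections can race with that write. The Python sketch below shows the generic remedy of polling with a deadline, similar in spirit to the mongo shell's assert.soon. `wait_until`, `background_refresh`, and the in-memory `cache_collections` dict are hypothetical stand-ins. They are not the ShardingTest code and not necessarily the change made for this ticket.

```python
import threading
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float = 5.0,
               interval_s: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True as soon as the predicate holds, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical stand-in for a shard's config.cache.collections (ns -> metadata).
cache_collections = {}


def background_refresh() -> None:
    # Simulates the shard persisting the entry some time after
    # shardCollection has already returned to the client.
    time.sleep(0.1)
    cache_collections["test.user"] = {"uuid": "19d1ef07-d70d-4184-a99f-58dc80cd73c4"}


writer = threading.Thread(target=background_refresh)
writer.start()
# A single immediate read could miss the entry; polling tolerates the delay.
found = wait_until(lambda: "test.user" in cache_collections)
writer.join()
print(found)  # True
```

Polling only helps when something is guaranteed to write the entry eventually. It does not help when no refresh is ever triggered.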



 Comments   
Comment by Githook User [ 05/Dec/17 ]

Author:

{'username': 'EshaMaharishi', 'email': 'esha.maharishi@mongodb.com', 'name': 'Esha Maharishi'}

Message: SERVER-31627 blacklist dump_coll_metadata.js from sharding_last_stable_mongos_and_mixed_shards
Branch: master
https://github.com/mongodb/mongo/commit/3d7be48d3b09db2c7ac723043b10c014430e85ed

Comment by Githook User [ 05/Dec/17 ]

Author:

{'username': 'EshaMaharishi', 'email': 'esha.maharishi@mongodb.com', 'name': 'Esha Maharishi'}

Message: SERVER-31627 ShardingTest.checkUUIDsConsistentAcrossCluster can fail to see collection in config.cache.collections

(cherry picked from commit 84b68e8459df1b795fa25eeaee05b76967eb9406)
Branch: v3.6
https://github.com/mongodb/mongo/commit/48c2b8670fc2dca3fe77f98f49741e74f4f21dca

Comment by Githook User [ 05/Dec/17 ]

Author:

{'username': 'EshaMaharishi', 'email': 'esha.maharishi@mongodb.com', 'name': 'Esha Maharishi'}

Message: SERVER-31627 ShardingTest.checkUUIDsConsistentAcrossCluster can fail to see collection in config.cache.collections
Branch: master
https://github.com/mongodb/mongo/commit/84b68e8459df1b795fa25eeaee05b76967eb9406

Comment by Esha Maharishi (Inactive) [ 05/Dec/17 ]

Note: this symptom (the shard not having an entry in its persisted cache for some sharded collection) may still appear in the config stepdown suite after this fix if the following occurs:

1) shardCollection runs on the config primary, writes the entry to config.collections, and the entry replicates to another config node
2) the config primary steps down before sending setShardVersion to the primary shard
3) the new config primary already has the config.collections entry

If the test does not do anything else that would make the primary shard refresh, the primary shard will not have an entry for the collection in its config.cache.collections.
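The sequence above can be modeled in a few lines. This is a simplified Python sketch, not server code. It assumes, as the note implies, that setShardVersion is what makes the primary shard refresh and persist the entry. `ConfigNode`, `PrimaryShard`, and `shard_collection` are hypothetical names. After a stepdown between steps 1 and 2, the new config primary has the collection but the shard's cache does not.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConfigNode:
    """A config server member holding config.collections (ns -> UUID)."""
    collections: Dict[str, str] = field(default_factory=dict)


@dataclass
class PrimaryShard:
    """Primary shard whose config.cache.collections is filled only on refresh."""
    cache_collections: Dict[str, str] = field(default_factory=dict)

    def refresh(self, config: ConfigNode) -> None:
        # A refresh copies the routing metadata from the config server.
        self.cache_collections.update(config.collections)


def shard_collection(ns: str, uuid: str, config_primary: ConfigNode,
                     config_secondary: ConfigNode, shard: PrimaryShard,
                     step_down_before_ssv: bool) -> ConfigNode:
    """Run the simplified shardCollection flow; return the resulting config primary."""
    # Step 1: write config.collections on the primary; it replicates.
    config_primary.collections[ns] = uuid
    config_secondary.collections[ns] = uuid
    if step_down_before_ssv:
        # Steps 2-3: the primary steps down before setShardVersion is sent,
        # and the secondary (which already has the entry) becomes primary.
        return config_secondary
    # Normal path: setShardVersion makes the primary shard refresh.
    shard.refresh(config_primary)
    return config_primary


old_primary, new_primary, shard = ConfigNode(), ConfigNode(), PrimaryShard()
current = shard_collection("test.compound", "344af56e-5d47-4f65-9127-cde9ad517942",
                           old_primary, new_primary, shard, step_down_before_ssv=True)
print(current is new_primary)                       # True
print("test.compound" in current.collections)      # True
print("test.compound" in shard.cache_collections)  # False: no refresh happened
```

In this model, any later operation that calls `shard.refresh(...)` would fill the missing entry. This matches the note that the symptom only persists if the test triggers no further refresh.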

This happened in my patch build: https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_sharding_csrs_continuous_config_stepdown_WT_patch_671dd2c46fb49aabd969781b3c8c90cb109a1032_5a25d327e3c33129d00121ff_17_12_04_22_58_54

The logs show that setShardVersion was not sent for 'test.compound':

$ egrep "cmd.*setShardVersion|about to log.*shardCollection.start|Stepping down" logs
[js_test:basic_split] 2017-12-04T23:37:03.390+0000 c23762| 2017-12-04T23:37:02.136+0000 I SHARDING [conn1] about to log metadata event into changelog: { _id: "ip-10-69-213-225-2017-12-04T23:37:02.136+0000-5a25dc1ef68df4aee57fd6ce", server: "ip-10-69-213-225", clientAddr: "127.0.0.1:36939", time: new Date(1512430622136), what: "shardCollection.start", ns: "config.system.sessions", details: { shardKey: { _id: 1 }, collection: "config.system.sessions", uuid: UUID("d8873cee-9397-41b6-b717-550f7b293b46"), primary: "shard0000:ip-10-69-213-225:23760", numChunks: 1 } }
[js_test:basic_split] 2017-12-04T23:37:03.795+0000 c23762| 2017-12-04T23:37:02.804+0000 D ASIO     [conn1] startCommand: RemoteCommand 49 -- target:ip-10-69-213-225:23760 db:admin cmd:{ setShardVersion: "config.system.sessions", init: false, authoritative: true, configdb: "basic_split-configRS/ip-10-69-213-225:23762,ip-10-69-213-225:23763,ip-10-69-213-225:23764", shard: "shard0000", shardHost: "ip-10-69-213-225:23760", version: Timestamp(1, 0), versionEpoch: ObjectId('5a25dc1ef68df4aee57fd6db'), noConnectionVersioning: true }
[js_test:basic_split] 2017-12-04T23:37:04.124+0000 *** Stepping down connection to ip-10-69-213-225:23762
[js_test:basic_split] 2017-12-04T23:37:13.037+0000 c23763| 2017-12-04T23:37:12.276+0000 I SHARDING [conn19] about to log metadata event into changelog: { _id: "ip-10-69-213-225-2017-12-04T23:37:12.276+0000-5a25dc28ae5a58d69f4904d6", server: "ip-10-69-213-225", clientAddr: "10.69.213.225:44867", time: new Date(1512430632276), what: "shardCollection.start", ns: "test.user", details: { shardKey: { _id: 1.0 }, collection: "test.user", uuid: UUID("19d1ef07-d70d-4184-a99f-58dc80cd73c4"), primary: "shard0000:ip-10-69-213-225:23760", numChunks: 1 } }
[js_test:basic_split] 2017-12-04T23:37:13.224+0000 c23763| 2017-12-04T23:37:12.545+0000 D ASIO     [conn19] startCommand: RemoteCommand 453 -- target:ip-10-69-213-225:23760 db:admin cmd:{ setShardVersion: "test.user", init: false, authoritative: true, configdb: "basic_split-configRS/ip-10-69-213-225:23762,ip-10-69-213-225:23763,ip-10-69-213-225:23764", shard: "shard0000", shardHost: "ip-10-69-213-225:23760", version: Timestamp(1, 0), versionEpoch: ObjectId('5a25dc28ae5a58d69f4904e3'), noConnectionVersioning: true }
[js_test:basic_split] 2017-12-04T23:37:19.091+0000 *** Stepping down connection to ip-10-69-213-225:23763
[js_test:basic_split] 2017-12-04T23:37:19.460+0000 c23763| 2017-12-04T23:37:18.534+0000 I SHARDING [conn19] about to log metadata event into changelog: { _id: "ip-10-69-213-225-2017-12-04T23:37:18.534+0000-5a25dc2eae5a58d69f490721", server: "ip-10-69-213-225", clientAddr: "10.69.213.225:44867", time: new Date(1512430638534), what: "shardCollection.start", ns: "test.compound", details: { shardKey: { x: 1.0, y: 1.0 }, collection: "test.compound", uuid: UUID("344af56e-5d47-4f65-9127-cde9ad517942"), primary: "shard0000:ip-10-69-213-225:23760", numChunks: 1 } }
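The egrep above still leaves the pairing to the reader. A small Python sketch can instead report every namespace that logged shardCollection.start without a matching outgoing setShardVersion. The regexes assume the 3.6-era log format quoted here, and `missing_set_shard_version` is a hypothetical helper. It also treats each namespace as sharded at most once, so repeated shardCollection calls on the same namespace would need per-event pairing.

```python
import re
from typing import Iterable, Set

# Namespace in a "shardCollection.start" changelog event (format as in the log above).
START_RE = re.compile(r'what: "shardCollection\.start", ns: "([^"]+)"')
# Namespace in an outgoing setShardVersion command sent by the config server.
SSV_RE = re.compile(r'cmd:\{ setShardVersion: "([^"]+)"')


def missing_set_shard_version(lines: Iterable[str]) -> Set[str]:
    """Return namespaces that began shardCollection but never had setShardVersion sent."""
    started: Set[str] = set()
    versioned: Set[str] = set()
    for line in lines:
        start = START_RE.search(line)
        if start:
            started.add(start.group(1))
        ssv = SSV_RE.search(line)
        if ssv:
            versioned.add(ssv.group(1))
    return started - versioned


# Abbreviated versions of the log lines quoted above.
sample = [
    'I SHARDING [conn19] ... what: "shardCollection.start", ns: "test.user", details: { ... }',
    'D ASIO [conn19] ... cmd:{ setShardVersion: "test.user", init: false, ... }',
    'I SHARDING [conn19] ... what: "shardCollection.start", ns: "test.compound", details: { ... }',
]
print(missing_set_shard_version(sample))  # {'test.compound'}
```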

Generated at Thu Feb 08 04:27:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.