[SERVER-37475] mongod backtrace if shard tag ranges are added before collection is sharded Created: 04/Oct/18  Updated: 04/Nov/18  Resolved: 12/Oct/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Timothy Olsen (Inactive) Assignee: Janna Golden
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File config-dump.tar.gz     Text File csrs0.log     Text File csrs1.log     Text File csrs2.log     Text File mongos.log     Text File shard0-0.log     Text File shard0-1.log     Text File shard0-2.log     Text File shard1-0.log     Text File shard1-1.log     Text File shard1-2.log    
Issue Links:
depends on SERVER-37578: Assert that a zone is associated with... (Closed)
Operating System: ALL
Steps To Reproduce:

1. Deploy a 2 shard sharded cluster. Each shard is a PSA replica set.
2. Add shard tag ranges by upserting directly to config.tags (see the sketch after this list)
3. Shard the collection referred to by the shard tag range
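
A minimal sketch of steps 2 and 3 in the mongo shell, for illustration only. The namespace a.b, shard key key1, and zone name tag1 are assumptions taken from the comments below, and the _id format and range bounds shown here are placeholders:

use config
db.tags.update(
    { _id: { ns: "a.b", min: { key1: MinKey } } },   // tag range document with the keys _id, ns, min, max, tag
    { ns: "a.b", min: { key1: MinKey }, max: { key1: 0 }, tag: "tag1" },
    { upsert: true })
sh.enableSharding("a")
sh.shardCollection("a.b", { key1: 1 })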

Participants:

 Description   

On a 2-shard, PSA sharded cluster, if shard tag ranges are added before a collection is sharded, some shard members will backtrace.

This appears to be a regression in 4.0.3-rc0. We have not seen this happen in 4.0.2.

Logs attached.



 Comments   
Comment by Kaloian Manassiev [ 12/Oct/18 ]

janna.golden, yes, we should definitely have this check instead of crashing.

There must be one already for the old initial split logic which executes on the config server, right?

Comment by Janna Golden [ 11/Oct/18 ]

I'm going to close this ticket and mark it as related to SERVER-37578.

Comment by Janna Golden [ 11/Oct/18 ]

It looks like the tag "tag1" is not associated with any shard in your cluster as provided. When I restored the cluster and ran shardCollection, it also segfaulted. I then ran

db.shards.find({tags:"tag1"})

and saw that no shards were associated with this zone. After adding the zone to shard01, the collection can be sharded successfully.

mongos> db.shards.find({tags:"tag1"})
mongos> sh.addShardTag("shard01", "tag1")
{
	"ok" : 1,
	"operationTime" : Timestamp(1539281520, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1539281520, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
mongos> db.shards.find({tags:"tag1"})
{ "_id" : "shard01", "host" : "shard01/localhost:20001", "state" : 1, "tags" : [ "tag1" ] }
mongos> db.runCommand({enableSharding: "a"})
{
	"ok" : 1,
	"operationTime" : Timestamp(1539281555, 3),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1539281555, 3),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
mongos> db.runCommand({shardCollection: "a.b", key: {"key1": 1}})
{
	"collectionsharded" : "a.b",
	"collectionUUID" : UUID("a4f55b89-807e-4374-82d9-577c6ad3a4e8"),
	"ok" : 1,
	"operationTime" : Timestamp(1539281562, 14),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1539281562, 14),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

kaloian.manassiev, we should probably add a check that every zone defined for the collection we are attempting to shard is associated with at least one shard, and return an error to the user rather than letting this segfault.
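
Until such a check exists on the server side, a rough user-side equivalent can be run against the config database before calling shardCollection. This is only a sketch; the namespace a.b is assumed from the transcript above:

var ns = "a.b";
db.getSiblingDB("config").tags.find({ ns: ns }).forEach(function(t) {
    // every zone referenced by a tag range must be associated with at least one shard
    assert(db.getSiblingDB("config").shards.count({ tags: t.tag }) > 0,
           "zone " + t.tag + " is not associated with any shard");
});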

cc tim.olsen

Comment by Timothy Olsen (Inactive) [ 10/Oct/18 ]

config-dump.tar.gz attached

Comment by Timothy Olsen (Inactive) [ 10/Oct/18 ]

It is reproducible. I am attaching a dump of the config database after we add the tags but before we shard the collection.

As for generating the tag documents: we upsert a document with the keys _id, ns, min, max, and tag into the config.tags collection.
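
For illustration, such a document would look roughly like the following; the field values are placeholders, and the actual ranges are in the attached config-dump.tar.gz:

{
    "_id" : { "ns" : "a.b", "min" : { "key1" : MinKey } },
    "ns" : "a.b",
    "min" : { "key1" : MinKey },
    "max" : { "key1" : 0 },
    "tag" : "tag1"
}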

Comment by Kaloian Manassiev [ 10/Oct/18 ]

The stack traces are in shard0-0.log and shard0-2.log (tim.olsen, when filing future tickets, please paste the stack trace in the description, or at least mention which files contain it, so we don't have to search through 10 attachments).

The crash occurred because of a nullptr access with the call stack below, which points to this line, but I can't see how chunks can be nullptr:

 mongod(void std::__1::vector<mongo::ChunkType, std::__1::allocator<mongo::ChunkType> >::__emplace_back_slow_path<mongo::NamespaceString const&, mongo::ChunkRange, mongo::ChunkVersion&, mongo::ShardId const&>(mongo::NamespaceString const&, mongo::ChunkRange&&, mongo::ChunkVersion&, mongo::ShardId const&)+0x1A6) [0x104eb2336]
 mongod(mongo::appendChunk(mongo::NamespaceString const&, mongo::BSONObj const&, mongo::BSONObj const&, mongo::ChunkVersion*, mongo::Timestamp const&, mongo::ShardId const&, std::__1::vector<mongo::ChunkType, std::__1::allocator<mongo::ChunkType> >*)+0x1CA) [0x104eaddfa]
 mongod(mongo::InitialSplitPolicy::generateShardCollectionInitialZonedChunks(mongo::NamespaceString const&, mongo::ShardKeyPattern const&, mongo::Timestamp const&, std::__1::vector<mongo::TagsType, std::__1::allocator<mongo::TagsType> > const&, mongo::UnorderedFastKeyTable<mongo::StringData, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::vector<mongo::ShardId, std::__1::allocator<mongo::ShardId> >, mongo::StringMapTraits> const&, std::__1::vector<mongo::ShardId, std::__1::allocator<mongo::ShardId> > const&)+0x288) [0x104eb0448]

The only unusual thing I am seeing is this line of code, which assigns a const reference to a stack variable, but I think this is valid C++.

tim.olsen - how do you generate the tags documents? I have a slight suspicion that they might be missing some value which we made required. Is this reproducible, and would it be possible to dump the config database after you write the tags but before running shardCollection, so we can restore it and give it a try?

Assigning to janna.golden to have a look.
