[SERVER-15499] Confusion with error message "Assertion: 13110:HostAndPort: host is empty" when all data bearing nodes in shard are down Created: 02/Oct/14  Updated: 06/Dec/22  Resolved: 16/Nov/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.10
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Eoin Brazil Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS: Linux ip 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64 GNU/Linux
MongoDB: 2.4.10
EC2 m3.medium for all MongoDB instances except arbiter services which ran on t1.micro


Issue Links:
Related
related to SERVER-4217 add log severity and component name t... Closed
is related to SERVER-13269 general log message improvements for 2.6 Closed
Assigned Teams:
Sharding
Operating System: Linux
Steps To Reproduce:

Set up 2 shards across EC2 N. Virginia (US-East) and Ireland (EU-West):

EU-West: 1 primary (P) and 1 arbiter (A) of replica set default-s1, plus 1 secondary (S) of replica set foo-s1
EU-West also ran 1 config server and 1 mongos
US-East: 1 primary and 1 arbiter of replica set foo-s1, plus 1 secondary of replica set default-s1
MongoDB: 2.4.10
OS: Linux ip 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u3 x86_64 GNU/Linux

In EU-West, I sent SIGTERM to the primary of replica set default-s1.
In US-East, I sent SIGTERM to the secondary of replica set default-s1.
This leaves only the default-s1 arbiter running, so shard default-s1 has no data-bearing node available.

Then, in the mongo shell connected to the mongos, run:

mongos> db.test.insert({"name":"frank"})
HostAndPort: host is empty
mongos> db.test.insert({"name":"frank"})
socket exception [CONNECT_ERROR] for default-s1/ec2-54-242-126-41.compute-1.amazonaws.com:27017,ec2-54-74-94-63.eu-west-1.compute.amazonaws.com:27017

 Description   

Hi All,

There is some confusion with the error message:

] Assertion: 13110:HostAndPort: host is empty
0xa8d781 0xa5385b 0x69310f 0x6ad4b4 0x69b911 0x9f93c4 0x9f27b5 0x9f584a 0x9a743a 0x9c939f 0x9cbf19 0x99a9c1 0x666844 0xa79d4e 0x7fd53eac4b50 0x7fd53de67e6d
 mongos(_ZN5mongo15printStackTraceERSo+0x21) [0xa8d781]
 mongos(_ZN5mongo11msgassertedEiPKc+0x9b) [0xa5385b]
 mongos(_ZN5mongo16ConnectionString12_fillServersESs+0x3df) [0x69310f]
 mongos(_ZN5mongo16ConnectionStringC1ENS0_14ConnectionTypeERKSsS3_+0x84) [0x6ad4b4]
 mongos(_ZN5mongo16ConnectionString5parseERKSsRSs+0x2b1) [0x69b911]
 mongos(_ZN5mongo17WriteBackListener4initERNS_12DBClientBaseE+0x1e4) [0x9f93c4]
 mongos(_ZN5mongo17checkShardVersionEPNS_12DBClientBaseERKSsN5boost10shared_ptrIKNS_12ChunkManagerEEEbi+0x45) [0x9f27b5]
 mongos(_ZN5mongo14VersionManager19checkShardVersionCBEPNS_15ShardConnectionEbi+0x6a) [0x9f584a]
 mongos(_ZN5mongo15ShardConnection11_finishInitEv+0xfa) [0x9a743a]
 mongos(_ZN5mongo13ShardStrategy7_insertERKSsRNS_9DbMessageEiRNS_7RequestE+0x45f) [0x9c939f]
 mongos(_ZN5mongo13ShardStrategy7writeOpEiRNS_7RequestE+0x509) [0x9cbf19]
 mongos(_ZN5mongo7Request7processEi+0xd1) [0x99a9c1]
 mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x74) [0x666844]
 mongos(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xa79d4e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7fd53eac4b50]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd53de67e6d]

When all of the data-bearing nodes in a shard are down, this assertion message is confusing, because it says nothing about the shard's replica set being unavailable. Could the error message be changed from "HostAndPort: host is empty" to something more descriptive for cases where all data-bearing nodes in a replica set are down or unavailable?

Thanks!
Eoin



 Comments   
Comment by Ramon Fernandez Marina [ 02/Oct/14 ]

I'm not able to trigger that error. I created a setup like the one described above, and when I try to insert a document into an empty database I get:

mongos> db.foo.insert({x:1})
Thu Oct  2 11:47:14.999 error: {
	"$err" : "error creating initial database config information :: caused by :: ReplicaSetMonitor no master found for set: shard03",
	"code" : 10009
} at src/mongo/shell/query.js:128

If the database already exists, I get:

mongos> db.foo.insert({x:1})
ReplicaSetMonitor no master found for set: shard01

I get the errors above when terminating the data-bearing nodes one at a time, with a pause in between. If I terminate them at the same time, the error message is different:

mongos> db.foo.insert({x:1})
socket exception [CONNECT_ERROR] for shard01/skye.local:27018,skye.local:27019

So I wonder if something else is going on that's triggering the assertion you're seeing.

Note also that the "HostAndPort" assertion can be triggered in other situations, so I don't think rewording it to fit this specific scenario would help: it would make the message confusing in those other scenarios.
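To illustrate why an empty-host check is generic rather than tied to the all-nodes-down case, here is a minimal C++ sketch of a validating parser for a `setName/host1:port1,host2:port2` server list. It is hypothetical: `HostAndPortSketch`, `parseHostAndPort` and `parseServerList` are illustrative names, not MongoDB's `ConnectionString` code, and the sketch assumes that every comma-separated entry is validated independently and that a missing port falls back to 27017. Under those assumptions, any entry with an empty host part (an empty member list such as `default-s1/`, a bare `:27017`, or a doubled comma) fails with the same text as assertion 13110.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration only; this is NOT MongoDB's ConnectionString code.
struct HostAndPortSketch {
    std::string host;
    int port;
};

// Parses "host" or "host:port". An empty host part is rejected with the same
// text as assertion 13110. IPv6 literals are not handled in this sketch.
HostAndPortSketch parseHostAndPort(const std::string& entry) {
    const std::string::size_type colon = entry.rfind(':');
    const std::string host =
        (colon == std::string::npos) ? entry : entry.substr(0, colon);
    if (host.empty()) {
        throw std::invalid_argument("HostAndPort: host is empty");
    }
    int port = 27017;  // assumption: fall back to MongoDB's default port
    if (colon != std::string::npos) {
        port = std::stoi(entry.substr(colon + 1));
    }
    return {host, port};
}

// Splits "setName/host1,host2" (or just "host1,host2") on commas and parses
// every entry, so a single empty entry makes the whole string invalid.
std::vector<HostAndPortSketch> parseServerList(const std::string& connStr) {
    const std::string::size_type slash = connStr.find('/');
    const std::string list =
        (slash == std::string::npos) ? connStr : connStr.substr(slash + 1);
    std::vector<HostAndPortSketch> servers;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type comma = list.find(',', start);
        const std::string entry = (comma == std::string::npos)
                                      ? list.substr(start)
                                      : list.substr(start, comma - start);
        servers.push_back(parseHostAndPort(entry));
        if (comma == std::string::npos) {
            break;
        }
        start = comma + 1;
    }
    return servers;
}
```

For example, `parseServerList("shard01/skye.local:27018,skye.local:27019")` yields two entries, while `parseServerList("default-s1/")` throws "HostAndPort: host is empty". The sketch does not show how mongos assembled the string passed to `ConnectionString::parse` in the reported stack trace; that remains unexplained in this ticket.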

Generated at Thu Feb 08 03:38:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.