[SERVER-4020] bad serverID set in setShardVersion Created: 05/Oct/11  Updated: 06/Apr/23  Resolved: 05/Oct/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.0
Fix Version/s: 2.0.1, 2.0.2, 2.1.0

Type: Bug
Priority: Major - P3
Reporter: Greg Studer
Assignee: Eliot Horowitz (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongo_config_log_error     File only_insert.js    
Issue Links:
Duplicate: is duplicated by SERVER-4255 write with bad shard config and no se... (Closed)
Operating System: ALL

 Description   

In certain cases an invalid server ID can be sent to a mongod, which leaves stale shard version information that cannot be re-checked.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 13/Jan/12 ]

@pawel - can you open a new ticket with all of the relevant logs?

Comment by Pawel Piotrowski [ 12/Jan/12 ]

We upgraded mongod and mongos from 2.0.1 to 2.0.2. Our setup has 4 application servers (nginx + PHP FastCGI) with mongos running on those nodes (PHP connects through a Unix socket, without persistent connections and with autoreconnect), and a replica set of 7 nodes (1 master and 6 slaves).

The warnings do not occur on every new connection, but they occur very often (we run a very busy website).

Comment by Greg Studer [ 12/Jan/12 ]

Do you have any active mongoses on 2.0.1? - you'll have to upgrade all of those to eliminate these warnings.

Also, do the warnings occur once-per-new-connection?

Comment by Pawel Piotrowski [ 12/Jan/12 ]

A few hours ago we upgraded from 2.0.1 to 2.0.2, and we still see this error in the logs:

Thu Jan 12 13:30:45 [conn29910] warning: bad serverID set in setShardVersion and none in info: EOO

Comment by Michael Tewner [ 22/Dec/11 ]

This seems to be solved on our servers too in 2.0.2. Thanks, guys!

Comment by Vladimir Muzhilov [ 22/Dec/11 ]

Bingo!!!

I confirm that the bug is not observed in the 2.0.2 release
(works fine on a self-compiled 2.0.2-rc2-pre- build and on the 2.0.2 release from the 10gen repository)

Comment by beier cai [ 22/Dec/11 ]

Can anyone confirm this is indeed fixed in 2.0.2? I'm running 2.0.2 for all mongod and mongos, and so far I haven't seen this error.

Comment by Itai Shemesh [ 11/Dec/11 ]

I was going crazy for the last few days after upgrading to the "so-called" stable 2.0.1 release:

  • Got a crazy load of connections that never got closed.
  • TONS of errors/exceptions in the mongod log files:
    "bad serverID set in setShardVersion and none in info: EOO"
  • Many ::insert() actions were dropped because of "timeout()" / "socket error"

I think I found the solution for this!
All my shards (mongod) are 2.0.1, and I have downgraded all my mongos instances to 1.8.4.
Works like a charm now!

Comment by Zac Witte [ 07/Dec/11 ]

@Eliot, I was pretty sure they were, but I just shut everything down and restarted again and now I can't reproduce the warning. Could be that one of the mongos's was still running. I'll post again if I see it again.

Comment by Eliot Horowitz (Inactive) [ 07/Dec/11 ]

@zac - are mongos all upgraded also?

Comment by Zac Witte [ 07/Dec/11 ]

Still seeing this warning in 2.0.2-rc1 on the config server. See attached log file (with -vv)

Comment by Greg Studer [ 17/Nov/11 ]

The test case reproduces in 2.0.1; 2.0.2 correctly routes new inserts.

Comment by Greg Studer [ 17/Nov/11 ]

@Michael - can you upgrade to rc0 when released to see if the write with bad shard config issue is triggered? Also, do you have the logs for mongos and mongod around the same time as the exception?

Comment by Greg Studer [ 17/Nov/11 ]

As a note, while the message shows up in the mongod logs, the actual fix requires 2.0.2 mongos.
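
Since the fix lives in mongos, one way to audit a cluster is to compare each mongos version against the affected range. The following is a minimal Python sketch, assuming the version strings have already been collected by hand. The host names, the `parse_version` and `mongos_needing_upgrade` helpers, and the choice of 2.0.0 through 2.0.1 as the affected range (taken from this ticket's Affects Version and Fix Version fields) are illustrative assumptions, not part of any MongoDB tool.

```python
import re

# Assumed affected range, based on this ticket's metadata:
# affects 2.0.0, fixed in 2.0.2 (mongos side).
FIRST_AFFECTED = (2, 0, 0)
FIRST_FIXED = (2, 0, 2)


def parse_version(version):
    """Turn a string such as '2.0.1' or '2.0.2-rc1' into a tuple of ints.

    Any suffix after the third number (for example '-rc1') is ignored.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if not m:
        raise ValueError("unrecognized version string: %r" % (version,))
    return tuple(int(part) for part in m.group(1, 2, 3))


def mongos_needing_upgrade(versions_by_host):
    """Return the sorted host names whose mongos version is in the affected range."""
    stale = []
    for host, version in sorted(versions_by_host.items()):
        if FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED:
            stale.append(host)
    return stale
```

For example, `mongos_needing_upgrade({"app1": "2.0.1", "app2": "2.0.2"})` would report only the hypothetical host `app1`. A 1.8.x mongos is deliberately not flagged, since the ticket lists 2.0.0 as the first affected version.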

Comment by Jon Hoffman [ 17/Nov/11 ]

@eliot: it's this commit: https://github.com/mongodb/mongo/commit/a695bea89c338f6f458abc258eee16f415edeae1

Comment by Eliot Horowitz (Inactive) [ 17/Nov/11 ]

@leo - what git version?

Comment by Leo Kim [ 16/Nov/11 ]

Confirmed that this is still happening on 2.0.2 (built off of master branch)

Comment by Michael Tewner [ 16/Nov/11 ]

I can confirm that I'm getting this on a fresh install of 2.0.1 with replicas and sharding:

Traceback (most recent call last):
  File "./mongoselectupdate.py", line 43, in <module>
    db_rw.posts.update( { "_id" : result["_id"] } , { "$set" : { "msg" : someNumber , "timestamp" : myTime } }, safe=True )
  File "/usr/lib/python2.6/site-packages/pymongo-2.0.1-py2.6-linux-x86_64.egg/pymongo/collection.py", line 378, in update
    spec, document, safe, kwargs), safe)
  File "/usr/lib/python2.6/site-packages/pymongo-2.0.1-py2.6-linux-x86_64.egg/pymongo/connection.py", line 749, in _send_message
    return self.__check_response_to_last_error(response)
  File "/usr/lib/python2.6/site-packages/pymongo-2.0.1-py2.6-linux-x86_64.egg/pymongo/connection.py", line 701, in __check_response_to_last_error
    raise OperationFailure(error["err"])
pymongo.errors.OperationFailure: write with bad shard config and no server id!

Comment by auto [ 01/Nov/11 ]

Author: {u'login': u'gregstuder', u'name': u'gregs', u'email': u'greg@10gen.com'}

Message: initialize shard connection with serverId, only setShardV on single/replsets SERVER-4020
Branch: v2.0
https://github.com/mongodb/mongo/commit/d11ede1282801d3b7acca1d8de21627b14a44762

Comment by Christian Tonhäuser [ 31/Oct/11 ]

Eliot: Yes, sort of... We didn't change log files, so it's currently all in single, huge files.

However, let me explain in a bit more detail what happened:

We got the following stacktrace in our application log:

Caused by: com.mongodb.MongoException: write with bad shard config and no server id!
    at com.mongodb.CommandResult.getException(CommandResult.java:82)
    at com.mongodb.CommandResult.throwOnError(CommandResult.java:116)
    at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:126)
    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:148)
    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:132)
    at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:343)
    at com.mongodb.DBCollection.save(DBCollection.java:641)
    at ... (application-specific stack trace stuff here)

However, in the corresponding mongos.log I cannot find anything suspicious during the time.
All that is in there is some "connection accepted" messages and log output from the Balancer, but no error messages, nothing.

After a quick check of the mongod logs, I find a lot of things like this:

Wed Oct 26 09:19:59 [initandlisten] connection accepted from 10.35.9.102:48758 #13050
Wed Oct 26 09:19:59 [conn13050] warning: bad serverID set in setShardVersion and none in info: EOO
Wed Oct 26 09:20:00 [initandlisten] connection accepted from 10.35.9.104:56062 #13051
Wed Oct 26 09:20:00 [conn13051] warning: bad serverID set in setShardVersion and none in info: EOO
Wed Oct 26 09:20:00 [initandlisten] connection accepted from 10.35.9.101:35900 #13052
Wed Oct 26 09:20:00 [conn13052] warning: bad serverID set in setShardVersion and none in info: EOO
Wed Oct 26 09:20:06 [clientcursormon] mem (MB) res:27168 virt:335760 mapped:166880
Wed Oct 26 09:20:16 [initandlisten] connection accepted from 10.35.9.103:46500 #13053
Wed Oct 26 09:20:16 [conn13053] warning: bad serverID set in setShardVersion and none in info: EOO
Wed Oct 26 09:20:26 [initandlisten] connection accepted from 10.35.9.104:56328 #13054
Wed Oct 26 09:20:26 [conn13054] warning: bad serverID set in setShardVersion and none in info: EOO
Wed Oct 26 09:20:32 [initandlisten] connection accepted from 10.35.9.101:47774 #13055
Wed Oct 26 09:20:32 [conn13055] warning: bad serverID set in setShardVersion and none in info: EOO
Wed Oct 26 09:20:48 [initandlisten] connection accepted from 10.35.9.101:47788 #13056
Wed Oct 26 09:20:48 [conn13056] warning: bad serverID set in setShardVersion and none in info: EOO
Wed Oct 26 09:20:48 [initandlisten] connection accepted from 10.35.9.101:47792 #13057
Wed Oct 26 09:20:48 [conn13057] warning: bad serverID set in setShardVersion and none in info: EOO
(The snippet was taken from the timeframe when we had one of the exceptions on one of our appservers...)
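
To see how the warnings line up with new connections, log lines like the ones quoted above can be tallied by connection number. The following is a minimal Python sketch, assuming lines in exactly the format shown in this comment. The function name `summarize_bad_server_id` and the regular expressions are illustrative and not part of any MongoDB tool.

```python
import re
from collections import Counter

# Patterns written against the mongod log lines quoted in this comment.
ACCEPTED = re.compile(r"\[initandlisten\] connection accepted from (\S+):\d+ #(\d+)")
WARNING = re.compile(r"\[conn(\d+)\] warning: bad serverID set in setShardVersion")


def summarize_bad_server_id(lines):
    """Tally 'bad serverID' warnings per connection and per client IP.

    Returns (accepted_count, warnings_per_connection, warnings_per_client_ip).
    """
    client_ip = {}        # connection number -> client IP address
    per_conn = Counter()  # connection number -> number of warnings
    for line in lines:
        m = ACCEPTED.search(line)
        if m:
            client_ip[m.group(2)] = m.group(1)
            continue
        m = WARNING.search(line)
        if m:
            per_conn[m.group(1)] += 1

    per_ip = Counter()
    for conn_id, count in per_conn.items():
        # Connections accepted before the log excerpt starts have no known IP.
        per_ip[client_ip.get(conn_id, "unknown")] += count
    return len(client_ip), per_conn, per_ip
```

Passing an open log file (any iterable of lines) to `summarize_bad_server_id` gives the counts. If every accepted connection shows exactly one warning, that would suggest the warning fires once per new connection rather than repeatedly on the same one.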

However, our mongod processes are not running with -vvvvv, only the mongos.

From the mailing list I gather that these log messages can be ignored, but what has caused the exception, then?
(ref: http://groups.google.com/group/mongodb-user/browse_thread/thread/2aa469d30a33bd2a/ee0a30c69c3003f5?lnk=raot)

BR,

Christian

Comment by Eliot Horowitz (Inactive) [ 31/Oct/11 ]

@Christian - do you have the logs without -vvvv?

Comment by Christian Tonhäuser [ 31/Oct/11 ]

We are still experiencing this issue after upgrading to 2.0.1 final.

Scott asked us to set logging to -vvvvv on the mongos processes.
However, after we did this, we are unable to reproduce the bug.
The only thing we changed was the log level on the mongos, nothing else was altered on the system.

We'll try to set up a testing environment in order to analyze this further.

Comment by auto [ 24/Oct/11 ]

Author: {u'login': u'gregstuder', u'name': u'gregs', u'email': u'greg@10gen.com'}

Message: initialize shard connection with serverId, only setShardV on single/replsets SERVER-4020
Branch: master
https://github.com/mongodb/mongo/commit/3e9dfebcf135501213edfae66499bfbffb84a666

Comment by Eliot Horowitz (Inactive) [ 05/Oct/11 ]

https://github.com/mongodb/mongo/commit/bd31bb40a66b33ca785d8aeaf79d46b936772295
