[SERVER-3754] Seeing lots of setShardVersion failed Created: 03/Sep/11  Updated: 29/Feb/12  Resolved: 22/Nov/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Theo Hultberg Assignee: Greg Studer
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-3755 mongos died unexpectedly Closed
Related
related to SERVER-3753 "db config reload failed" Closed
Operating System: ALL
Participants:

 Description   

We do client-side partitioning by day, because we need to be able to remove lots of data quickly. With this scheme we run into all sorts of bugs in mongos. This seems to be the latest one. The application gets "setShardVersion failed" when it attempts to write to a database that has been dropped.

The application is designed to avoid this by periodically checking which databases exist and writing only to those. Still, presumably because of SERVER-1726, it gets the names of databases that do not in fact exist, and when it then tries to write to them it gets errors back.
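
For illustration, a minimal sketch of the kind of check described above. This is not the application's actual code: the function names are hypothetical, and the `fragments_YYYYMMDD` naming is inferred only from the database name that appears in the log. The list of existing database names is assumed to be fetched elsewhere (for example through mongos), and if that list is stale the filter passes names of dropped databases through, which matches the behaviour reported here.

```python
from datetime import date


def daily_db_name(prefix, day):
    """Build the per-day database name, e.g. fragments_20110831 (assumed scheme)."""
    return f"{prefix}_{day:%Y%m%d}"


def writable_days(existing_names, prefix, days):
    """Keep only the days whose database appears in the reported listing.

    existing_names is whatever the server reported; this function cannot
    detect a stale listing, it only filters against it.
    """
    existing = set(existing_names)
    return [d for d in days if daily_db_name(prefix, d) in existing]
```

With a listing that contains only `fragments_20110901`, a write intended for 31 August 2011 would be filtered out, while one for 1 September 2011 would be kept.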

This is from the mongos log:

Sat Sep  3 00:33:40 [conn9186] AssertionException in process: setShardVersion failed host[richcollshard1/richcolldb02,richcolldb01.byburt.com:27017] { oldVersion: Timestamp 0|0, assertion: "fragments_20110831.exposure_fragme
Sat Sep  3 00:33:40 [conn9186]      setShardVersion failed host[richcollshard3/richcolldb06,richcolldb05.byburt.com:27017] { oldVersion: Timestamp 0|0, assertion: "fragments_20110831.exposure_fragments dropped. Re-shard coll
Sat Sep  3 00:33:40 [conn9186] Assertion: 10429:setShardVersion failed host[richcollshard3/richcolldb06,richcolldb05.byburt.com:27017] { oldVersion: Timestamp 0|0, assertion: "fragments_20110831.exposure_fragments dropped. R
0x52019a 0x6a44bd 0x6a4022 
 /opt/mongodb-1.8.3/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x12a) [0x52019a]
 /opt/mongodb-1.8.3/bin/mongos() [0x6a44bd]
 /opt/mongodb-1.8.3/bin/mongos() [0x6a4022]
Sat Sep  3 00:33:40 [conn9186] ~ScopedDBConnection: _conn != null
Sat Sep  3 00:33:40 [conn9186] AssertionException in process: setShardVersion failed host[richcollshard3/richcolldb06,richcolldb05.byburt.com:27017] { oldVersion: Timestamp 0|0, assertion: "fragments_20110831.exposure_fragme
Sat Sep  3 00:33:40 [conn9186] end connection 127.0.0.1:36975
Sat Sep  3 00:33:40 [mongosMain] connection accepted from 127.0.0.1:36978 #9190
Sat Sep  3 00:33:40 [conn9187] end connection 127.0.0.1:36978



 Comments   
Comment by Eliot Horowitz (Inactive) [ 03/Sep/11 ]

Can you try not dropping databases through mongos but hitting each shard and dropping the database that way?
Or try 2.0.0-rc1?
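
For illustration only, the first suggestion (dropping on each shard instead of through mongos) could be sketched roughly as below. The function name, the `connect` factory, and the address list are all hypothetical and not taken from this ticket; the only thing assumed about the client is that it exposes a `drop_database(name)` method.

```python
def drop_on_each_shard(shard_addresses, db_name, connect):
    """Drop db_name on every shard directly, bypassing mongos.

    connect is a hypothetical factory: it takes a shard address and returns
    a client object with a drop_database(name) method.
    Returns a dict mapping each address to None on success, or to the
    exception that was raised for that shard.
    """
    results = {}
    for address in shard_addresses:
        try:
            client = connect(address)
            client.drop_database(db_name)
            results[address] = None
        except Exception as exc:  # keep going so one bad shard does not stop the rest
            results[address] = exc
    return results
```

With PyMongo, `connect` could be `pymongo.MongoClient`, which has a `drop_database` method. Each address would need to reach the primary of its shard, since a drop sent to a secondary is expected to fail.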

Comment by Theo Hultberg [ 03/Sep/11 ]

I looked through the logs one more time, and we are also seeing this variation of the "setShardVersion failed" error:

Sat Sep  3 06:51:29 [conn2121]      setShardVersion failed host[richcollshard2/richcolldb03.byburt.com:27017,richcolldb04] { errmsg: "not master", ok: 0.0 }
Sat Sep  3 06:51:29 [conn2121] Assertion: 10429:setShardVersion failed host[richcollshard2/richcolldb03.byburt.com:27017,richcolldb04] { errmsg: "not master", ok: 0.0 }
0x52019a 0x6a44bd 0x6a4022 
 /opt/mongodb-1.8.3/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x12a) [0x52019a]
 /opt/mongodb-1.8.3/bin/mongos() [0x6a44bd]
 /opt/mongodb-1.8.3/bin/mongos() [0x6a4022]
Sat Sep  3 06:51:29 [conn2121] ~ScopedDBConnection: _conn != null

From what I can tell by searching for that error, it should have been fixed in some 1.6.x release, but this is 1.8.3.
