[SERVER-3753] "db config reload failed" Created: 03/Sep/11  Updated: 11/Jul/16  Resolved: 22/Nov/11

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.8.3
Fix Version/s: 2.1.0

Type: Bug
Priority: Major - P3
Reporter: Theo Hultberg
Assignee: Greg Studer
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Three shards running 1.8.3 on Ubuntu in EC2


Issue Links:
Related
is related to SERVER-3754 Seeing lots of setShardVersion failed Closed
is related to SERVER-3755 mongos died unexpectedly Closed
Operating System: ALL
Participants:

 Description   

We got a couple hundred of these errors today before our application closed its connection to mongos and reconnected:

Mongo::OperationFailure: db config reload failed!

The only thing I can see in the mongos logs at the same time is this:

Sat Sep  3 06:12:57 [conn1630] ns: fragments_20110831.exposure_fragments ClusteredCursor::query attempt: 0
Sat Sep  3 06:13:01 [conn1630] ns: fragments_20110831.exposure_fragments ClusteredCursor::query attempt: 0
Sat Sep  3 06:13:12 [conn1630] ns: fragments_20110831.exposure_fragments ClusteredCursor::query attempt: 0

This doesn't look related, but we have approximately 19,000 lines of that message in the mongos log from the first six hours of today alone.



 Comments   
Comment by David Tollmyr [ 08/Sep/11 ]

We changed our purging implementation to drop databases on the individual shards, and it seems to work better. We kept getting errors through mongos about previously dropped databases, so I ended up manually cleaning all invalid entries out of the config database and then dropping those databases on the shards where they were still present. That seems to have removed the bugged entries completely, and mongos now reports databases correctly. I do find it interesting, though, that when a database is dropped on the shards, the change propagates to mongos rather quickly and the database disappears from the list in mongos as well.

Comment by Eliot Horowitz (Inactive) [ 08/Sep/11 ]

Hey - just checking in - any progress?

Comment by Eliot Horowitz (Inactive) [ 03/Sep/11 ]

No - absolutely not.
The idea is that mongos would still think it was sharded, just that there wouldn't be any data in it, which is fine.
So I wouldn't update the config servers at all.

Comment by David Tollmyr [ 03/Sep/11 ]

Eliot: When dropping databases on each shard manually, I assume we would have to make sure the config data about this is propagated properly to mongos? That is, remove the collection data from the config database and run flushRouterConfig?

Comment by Eliot Horowitz (Inactive) [ 03/Sep/11 ]

If you're going to stay on 1.8.3, this is what I would do:

  • never re-use a database
  • once you're done with it, do not drop it via mongos; drop it on each shard individually
    That should be stable.

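A minimal Python sketch of that workaround, assuming the application can connect directly to each shard's mongod (the primary, if the shard is a replica set), for example with a pymongo `MongoClient`, which provides `drop_database(name)`. It also assumes the application keeps its own set of retired database names to enforce "never re-use a database". The helpers `drop_on_each_shard` and `check_new_database_name` are hypothetical and not part of any MongoDB driver.

```python
from typing import Any, List, Mapping, Set

# System databases that must never be dropped this way.
SYSTEM_DATABASES = {"admin", "config", "local"}


def check_new_database_name(db_name: str, retired: Set[str]) -> None:
    """Raise if db_name was used (and dropped) before: never re-use a database."""
    if db_name in retired:
        raise ValueError(f"database name {db_name!r} was already used; pick a new one")


def drop_on_each_shard(shard_clients: Mapping[str, Any], db_name: str,
                       retired: Set[str]) -> List[str]:
    """Drop db_name directly on every shard, bypassing mongos.

    shard_clients maps a shard name to a client connected straight to that
    shard (hypothetical setup); each client must provide drop_database(name).
    The config servers are intentionally left untouched: mongos may still
    think the database is sharded, it just has no data on the shards.
    Returns the shard names in the order they were processed.
    """
    if db_name in SYSTEM_DATABASES:
        raise ValueError(f"refusing to drop system database {db_name!r}")
    processed = []
    for shard_name, client in shard_clients.items():
        client.drop_database(db_name)
        processed.append(shard_name)
    retired.add(db_name)  # remember the name so it is never re-used
    return processed
```

With pymongo this might be called as `drop_on_each_shard({"shard0": MongoClient("shard0-host", 27018)}, "fragments_20110831", retired)`, where the host name and port are placeholders. The config database is deliberately not edited, in line with the advice that mongos can keep treating the dropped database as sharded.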
Comment by Theo Hultberg [ 03/Sep/11 ]

Yes, we dropped and probably reused a database. We don't reuse databases intentionally, but because of SERVER-1726 we may have, since mongos says they still exist. If you think this is the cause, it is probably also behind SERVER-3754 and SERVER-3755.

Comment by Eliot Horowitz (Inactive) [ 03/Sep/11 ]

Did you remove and then re-use a database?
There is definitely an issue with dropping a database in 1.8, fixed in 2.0, that might be causing all your issues.

Generated at Thu Feb 08 03:03:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.