[SERVER-11421] Config server does not invalidate the config.chunks cache Created: 28/Oct/13  Updated: 08/Feb/23  Resolved: 30/Oct/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.7, 2.5.3
Fix Version/s: 2.4.8, 2.5.3

Type: Bug Priority: Major - P3
Reporter: Alexander Komyagin Assignee: Daniel Pasette (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

2 shards cluster, 3 config servers, 1 mongos


Attachments: File dbhash.js    
Issue Links:
Depends
Operating System: ALL
Steps To Reproduce:
  1. Set up a clean cluster with 2 shards, 3 config servers, and 1 mongos
  2. On mongos:

    db.test.insert({x:1})
    sh.enableSharding('test')
    sh.shardCollection('test.test',{_id : 1})

  3. Check the dbhash and chunk count on the first config server (in the config database):

    db.chunks.count()
    db.runCommand({dbhash:1}).collections.chunks

  4. Split a chunk on mongos:

    sh.splitAt("test.test",{_id : new ObjectId()})

  5. Verify on the config server that the chunk count changed but the hash did not:

    db.chunks.count()
    db.runCommand({dbhash:1}).collections.chunks
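
The comparison made in steps 3 and 5 can be written as a small check. This is only a sketch in plain JavaScript: the function name `isDbhashStale` and the `{ count, hash }` snapshot shape are hypothetical, and the sample values are made up. On a real config server the snapshot would be built from the two commands shown in the steps above.

```javascript
// Hypothetical helper: reports whether a config server shows the symptom
// from the steps above (chunk count changed, dbhash did not).
// `before` and `after` are plain objects of the form { count, hash }.
// In the mongo shell (config database) a snapshot could be built as:
//   { count: db.chunks.count(),
//     hash:  db.runCommand({dbhash: 1}).collections.chunks }
function isDbhashStale(before, after) {
  const countChanged = before.count !== after.count;
  const hashChanged = before.hash !== after.hash;
  return countChanged && !hashChanged;
}

// Made-up values for illustration only.
const before = { count: 1, hash: "aaaa" };
const afterSplit = { count: 2, hash: "aaaa" };
console.log(isDbhashStale(before, afterSplit)); // true
```

A result of `true` after step 4 matches the behavior this ticket describes; on a fixed build the hash is expected to change along with the count.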

Participants:

 Description   
Issue Status as of October 30th, 2013

ISSUE SUMMARY
With SERVER-11021, the dbhash command results for config server collections are cached until new data is written to those collections. The moveChunk and splitChunk commands use the applyOps command to propagate changes to the config.chunks collection on the config servers. Writes made through applyOps do not invalidate the cache, so the cached dbhash for config.chunks is not updated, and subsequent dbhash calls return the stale value from before the write.

USER IMPACT
This issue is only present in the 2.4.7 (stable release) version of MongoDB. It does not affect correctness: the config.chunks collection is written to properly. However, if only one config server is restarted, it can end up with a different dbhash for config.chunks than the other config servers upon startup. This can prevent new mongos processes from starting until the dbhash values on all config servers agree. If the balancer is on, mongos will periodically log a warning that it has detected that the "config servers differ" and will prevent further migrations from occurring.
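
To see whether a cluster is in the state described above, the config.chunks dbhash reported by each config server can be compared. The sketch below is plain JavaScript; `configServersAgree` and the host names are hypothetical, and it is not how mongos performs its own check. Each hash would come from running `db.runCommand({dbhash:1}).collections.chunks` against the config database on that server.

```javascript
// Hypothetical helper: takes a map of config server host -> config.chunks
// dbhash and returns true when every server reports the same hash.
function configServersAgree(hashesByHost) {
  const distinct = new Set(Object.values(hashesByHost));
  return distinct.size <= 1;
}

// Made-up hosts and hashes: cfg2 was restarted and recomputed its hash,
// while cfg1 and cfg3 still serve the stale cached value.
const reported = {
  "cfg1.example.net:27019": "aaaa",
  "cfg2.example.net:27019": "bbbb",
  "cfg3.example.net:27019": "aaaa",
};
console.log(configServersAgree(reported)); // false
```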

SOLUTION
Operations applied to the config server collections with the applyOps command need to call logOpForDbHash to invalidate the dbhash cache.
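
The failure mode and the fix can be illustrated with a toy model. This is a Node.js sketch under stated assumptions, not MongoDB source code: `ToyCollection`, `invalidateHashCache`, and the `callHook` flag are hypothetical names, and `invalidateHashCache` merely stands in for the role the ticket assigns to logOpForDbHash.

```javascript
const crypto = require("crypto");

// Toy model only: a collection whose hash is cached until invalidated.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.cachedHash = null; // null means "nothing cached, recompute"
  }

  // Stand-in for the cache invalidation hook (assumption, see lead-in).
  invalidateHashCache() {
    this.cachedHash = null;
  }

  // Ordinary write path: stores the document and invalidates the cache.
  insert(doc) {
    this.docs.push(doc);
    this.invalidateHashCache();
  }

  // applyOps-style write path. callHook=false models the bug,
  // callHook=true models the fix.
  applyOps(docs, callHook) {
    for (const doc of docs) this.docs.push(doc);
    if (callHook) this.invalidateHashCache();
  }

  // Returns the cached hash if present, otherwise recomputes it.
  dbhash() {
    if (this.cachedHash === null) {
      this.cachedHash = crypto
        .createHash("md5")
        .update(JSON.stringify(this.docs))
        .digest("hex");
    }
    return this.cachedHash;
  }
}

const chunks = new ToyCollection();
chunks.insert({ _id: "chunk-0" });
const h0 = chunks.dbhash();
chunks.applyOps([{ _id: "chunk-1" }], false); // bug: cache not invalidated
const h1 = chunks.dbhash(); // stale, same as h0
chunks.applyOps([{ _id: "chunk-2" }], true); // fix: cache invalidated
const h2 = chunks.dbhash(); // recomputed over all three documents
console.log(h0 === h1, h1 === h2); // true false
```

In the model, the data is always written correctly; only the cached hash lags behind, which mirrors the statement that the issue does not affect correctness.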

WORKAROUNDS
It is safe to downgrade only the config servers to 2.4.6 to avoid this cache invalidation problem.

PATCHES
Production release v2.4.8 contains the fix for this issue.



 Comments   
Comment by Volodymyr Gren [ 05/Nov/13 ]

thanks Alex

we actually already did so today

Comment by Alexander Komyagin [ 05/Nov/13 ]

Vovan, while the problems appear only after one of the config servers is restarted, we highly recommend upgrading to 2.4.8, given that it's just a drop-in binary replacement. This is the right way to eliminate the potential risk of running into known issues.

-Alex

Comment by Volodymyr Gren [ 05/Nov/13 ]

As I understand it, we may face this issue only if one of the config servers is restarted?

Comment by Alexander Komyagin [ 05/Nov/13 ]

Yes, Vovan,

You can upgrade just config servers to 2.4.8 in order to get the issue fixed.

-Alex

Comment by Volodymyr Gren [ 05/Nov/13 ]

is it enough to upgrade just config servers from 2.4.7 to 2.4.8?

Comment by auto [ 30/Oct/13 ]

Author: Dan Pasette (monkey101) <dan@10gen.com>

Message: SERVER-11421: make applyOps call logOpForDbHash
Branch: v2.4
https://github.com/mongodb/mongo/commit/ed7af1195c8364068118be90eb39996f8440fa47

Comment by auto [ 29/Oct/13 ]

Author: Dan Pasette (monkey101) <dan@10gen.com>

Message: SERVER-11421: make applyOps call logOpForDbHash
Branch: master
https://github.com/mongodb/mongo/commit/69737f45ccf23fe218c0f8d51c060e8e988551bf

Comment by Daniel Pasette (Inactive) [ 28/Oct/13 ]

This issue does not impact the correctness of the system, but it will force users to restart all config servers if any one of them needs to be restarted.

Comment by Scott Hernandez (Inactive) [ 28/Oct/13 ]

I've attached a test you can run as-is or alter as needed.

Generated at Thu Feb 08 03:25:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.