[SERVER-53366] Delay between user creation / update and access rights being updated in sharded clusters Created: 14/Dec/20 Updated: 15/Mar/21 Resolved: 15/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Simon Bernier St-Pierre | Assignee: | Eric Sedor |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Since Mongo 4.4, we have been getting spurious errors in our test suite. In sharded clusters only, there is a delay between creating a Mongo user and that user's access rights actually becoming available. Our test creates a user with a read-only role on a single database and then immediately tries to read from that database as this user. On 4.4 this test fails about 90% of the time. Access rights seem to take time to propagate: when I added a 10-second delay between the call that creates the user and the call that tests its access rights, the issue no longer appeared. So it seems that in Mongo 4.4 user creation and/or propagation is asynchronous, but I could not find anything in the docs about this. I tried setting the write concern to all nodes, but that does not fix it. |
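A minimal sketch of the reproduction described above. Instead of a fixed 10-second sleep, it polls and reports how long the new user's access took to appear. It assumes the "read-only role" is the built-in `read` role, that the command document is sent through a pymongo-style `Database.command` on the admin database, and that `try_read` is a hypothetical callable that reads as the new user and raises on an authorization failure. `build_create_user_cmd` and `wait_until_readable` are illustrative names, not driver APIs. In real code you would probably catch `pymongo.errors.OperationFailure` rather than every exception.

```python
import time
from typing import Callable, Optional


def build_create_user_cmd(user: str, pwd: str, db: str, w="majority") -> dict:
    """Build a createUser command granting the built-in read role on one database."""
    # The command name must be the first key of the command document.
    return {
        "createUser": user,
        "pwd": pwd,
        "roles": [{"role": "read", "db": db}],
        "writeConcern": {"w": w},
    }


def wait_until_readable(
    try_read: Callable[[], None],
    timeout: float = 60.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll try_read until it stops raising; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        try:
            try_read()
            return clock() - start
        except Exception:
            # Any failure (e.g. "not authorized") is treated as "not propagated yet".
            if clock() - start >= timeout:
                return None
            sleep(interval)
```

Usage might look like `admin.command(build_create_user_cmd("reader", "secret", "appdb"))` followed by `wait_until_readable(lambda: reader_client.appdb.coll.find_one())`, where `admin` and `reader_client` are clients you construct yourself. The returned delay gives a measured number to attach to a report, which is more useful than "a 10-second sleep makes it go away".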
| Comments |
| Comment by Eric Sedor [ 27/Jan/21 ] | |
|
That sounds right, simon.bernier-st-pierre@ubisoft.com, about accounting for load balancers. For assistance working out system behavior in issues like this, we'd like to encourage you to start by asking our community for help on the MongoDB Developer Community Forums. I'm not aware of a direct change to user caching, but sharding mechanisms have evolved in a few ways across those versions. If you're able to help us reproduce a cache propagation delay, and you still see a failure on a mongos after running invalidateUserCache, we'd want to look into that. Sincerely, | |
| Comment by Simon Bernier St-Pierre [ 15/Jan/21 ] | |
|
Hey, thanks for checking this. It seems the call to invalidateUserCache does not clear the cache in my case. I added a simple 30-second sleep (to match userCacheInvalidationIntervalSecs) to my tests after every call that modifies users, and this fixed my issue. I only have this problem with 4.2 and 4.4, so perhaps the caching mechanism changed between 4.0 and 4.2? I think you may have raised an interesting point, however. We have multiple mongos instances behind a load balancer. Is it possible that the user cache only gets flushed on one of them, the one the load balancer happens to route us to? In that case we would need to flush each mongos individually. | |
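Each mongos keeps its own in-memory user cache, and `invalidateUserCache` only flushes the cache of the process that receives it. A sketch of the "flush each mongos individually" workaround therefore sends the command to every mongos directly, bypassing the load balancer. It assumes one pymongo-style `Database.command` callable per mongos (for example `MongoClient("mongos-a:27017").admin.command`). The host names and the `flush_all_mongos` helper are hypothetical.

```python
from typing import Callable, Dict, Mapping

# A callable that takes a command document and returns the server's reply.
CommandRunner = Callable[[dict], dict]


def flush_all_mongos(runners: Mapping[str, CommandRunner]) -> Dict[str, bool]:
    """Send invalidateUserCache to every mongos and report which ones acknowledged it."""
    results: Dict[str, bool] = {}
    for host, run_command in runners.items():
        try:
            reply = run_command({"invalidateUserCache": 1})
            # Successful replies carry ok: 1 (pymongo returns 1.0, which equals 1).
            results[host] = reply.get("ok") == 1
        except Exception:
            # Unreachable host or command error: record the failure and continue.
            results[host] = False
    return results
```

Returning a per-host result rather than stopping at the first error makes it obvious when one mongos behind the balancer was never flushed.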
| Comment by Eric Sedor [ 15/Jan/21 ] | |
|
simon.bernier-st-pierre@ubisoft.com, in theory you would call invalidateUserCache on the mongos you will be authenticating against with the new or updated user. Unfortunately, I have not been able to reproduce a user creation delay yet. Can you clarify where each command is being executed by making sure you have a separate MongoClient object for each mongos? And are you able to provide the Python code that reproduces the OperationFailure? | |
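A sketch of the per-mongos check requested here: run the same read as the new user against each mongos separately and record which ones fail. It assumes a hypothetical mapping from host to a `try_read` callable, each backed by its own MongoClient connected directly to that mongos. `probe_each_mongos` is an illustrative name, not a driver API.

```python
from typing import Callable, Dict, Mapping


def probe_each_mongos(probes: Mapping[str, Callable[[], None]]) -> Dict[str, str]:
    """Run a read as the new user on each mongos; map host -> "ok" or the error text."""
    outcome: Dict[str, str] = {}
    for host, try_read in probes.items():
        try:
            try_read()
            outcome[host] = "ok"
        except Exception as exc:
            # Keep the exception type and message, e.g. an OperationFailure.
            outcome[host] = f"{type(exc).__name__}: {exc}"
    return outcome
```

If only some hosts report errors right after `createUser`, that points at per-mongos cache staleness rather than at the user write itself.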
| Comment by Simon Bernier St-Pierre [ 05/Jan/21 ] | |
|
Hi Eric, thanks for the reply. I've done more testing and noticed that this issue also occurs on Mongo 4.2. So 3.6 and 4.0 are unaffected, while 4.2 and 4.4 are affected. I've tried adding a call to `invalidateUserCache` right after adding my user, but I still get the issue on 4.2 and 4.4 with sharded clusters. Precisely, what I'm doing is calling `createUser/updateUser`, then calling `invalidateUserCache`, and then creating a connection string for this user and trying to perform legal actions with it. I get an error like this, only on sharded clusters:
Is this `invalidateUserCache` command supposed to be run on mongos or on each shard? There's no indication in the docs about that. | |
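The last step of the sequence above builds a connection string for the new user. A sketch of that step, assuming the user was created in the `admin` database unless told otherwise: `authSource` must name the database the user was created in, and the username and password must be percent-escaped (pymongo's docs recommend `urllib.parse.quote_plus`). The `user_uri` helper and host names are hypothetical. Listing each mongos explicitly, rather than the load balancer's address, also makes it clearer which mongos a connection reaches.

```python
from typing import Sequence
from urllib.parse import quote_plus


def user_uri(user: str, pwd: str, hosts: Sequence[str], auth_db: str = "admin") -> str:
    """Build a mongodb:// URI for a user, escaping credentials for the URI format."""
    # quote_plus escapes '@', ':' and '/', which would otherwise break URI parsing.
    host_list = ",".join(hosts)
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(pwd)}@{host_list}"
        f"/?authSource={quote_plus(auth_db)}"
    )
```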
| Comment by Eric Sedor [ 04/Jan/21 ] | |
|
Hi simon.bernier-st-pierre@ubisoft.com, and thanks for your patience. It's not immediately clear to me what would have changed from 4.2 to 4.4 around this, but we expect the 30-second default for userCacheInvalidationIntervalSecs to be a factor. To avoid a 10-second wait, can you try either reducing userCacheInvalidationIntervalSecs for test purposes or using the invalidateUserCache command, and let us know if that helps? Eric |
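A sketch of the first suggestion: lowering userCacheInvalidationIntervalSecs in a test environment only. It builds either a `setParameter` command document, to send to each mongos's admin database (for example through pymongo's `Database.command`), or the equivalent `--setParameter` startup options. It assumes the 1 to 86400 second bounds from the MongoDB server parameter documentation and that your version accepts the parameter at runtime. If `setParameter` rejects it, use the startup form instead. The helper names are hypothetical.

```python
from typing import List

# Bounds for userCacheInvalidationIntervalSecs (the default is 30 seconds).
MIN_INTERVAL_SECS = 1
MAX_INTERVAL_SECS = 86400


def _check(seconds: int) -> int:
    """Reject values outside the parameter's allowed range."""
    if not MIN_INTERVAL_SECS <= seconds <= MAX_INTERVAL_SECS:
        raise ValueError(
            f"userCacheInvalidationIntervalSecs must be between "
            f"{MIN_INTERVAL_SECS} and {MAX_INTERVAL_SECS}, got {seconds}"
        )
    return seconds


def set_cache_interval_cmd(seconds: int) -> dict:
    """setParameter command document; it must be sent to every mongos separately."""
    return {"setParameter": 1, "userCacheInvalidationIntervalSecs": _check(seconds)}


def startup_args(seconds: int) -> List[str]:
    """Equivalent mongos command-line options, for setting the value at launch."""
    return ["--setParameter", f"userCacheInvalidationIntervalSecs={_check(seconds)}"]
```

A short interval shrinks the window in which a mongos serves stale user data, at the cost of more frequent cache checks. That trade-off is why this belongs in test configurations rather than production.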