[SERVER-34733] mongos needs to also refresh db version after receiving onImplicitCreate error Created: 27/Apr/18  Updated: 29/Oct/23  Resolved: 14/Feb/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.7.9
Fix Version/s: 4.1.9

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File db_ver.js    
Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

see db_ver.js

Sprint: Sharding 2018-08-13, Sharding 2018-09-10, Sharding 2018-09-24, Sharding 2018-10-08, Sharding 2018-10-22, Sharding 2018-11-05, Sharding 2018-11-19, Sharding 2019-02-25
Participants:
Linked BF Score: 17

 Description   

When a mongos with a stale database version sends a write for an unsharded collection to the wrong shard, the shard returns the onImplicitCreate error because the collection does not exist there. The mongos then refreshes the collection version and retries, but because the database version was not refreshed, the write is sent to the same wrong shard and hits the same error again. This repeats until the mongos runs out of retries.
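
The retry loop described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the server's actual code: the `Router`, `Shard`, and `ConfigServer` classes, the `build_router` helper, the `refresh_db_on_error` flag, and the retry limit of 3 are all hypothetical names and values chosen for illustration. The flag models the idea raised in the comments on this ticket of also invalidating the cached database entry when the error is seen.

```python
from dataclasses import dataclass, field


class CannotImplicitlyCreateCollection(Exception):
    """Toy stand-in for the error a shard returns when the collection is absent."""


@dataclass
class ConfigServer:
    # Authoritative mapping: database name -> primary shard name.
    primary_shard: dict


@dataclass
class Shard:
    name: str
    collections: set = field(default_factory=set)

    def write(self, ns, doc):
        # A shard that does not own the collection refuses to create it implicitly.
        if ns not in self.collections:
            raise CannotImplicitlyCreateCollection(ns)
        return f"ok:{self.name}"


@dataclass
class Router:
    config: ConfigServer
    shards: dict
    cached_primary: dict              # possibly stale: database name -> shard name
    refresh_db_on_error: bool = False  # hypothetical switch modeling the fix
    max_retries: int = 3               # hypothetical retry limit

    def write(self, db, coll, doc):
        ns = f"{db}.{coll}"
        attempts = []
        for _ in range(self.max_retries):
            target = self.cached_primary[db]
            attempts.append(target)
            try:
                return self.shards[target].write(ns, doc), attempts
            except CannotImplicitlyCreateCollection:
                # Refreshing only the collection version changes nothing for an
                # unsharded collection: routing still follows the cached primary.
                if self.refresh_db_on_error:
                    # Modeled fix: also refresh the cached database entry.
                    self.cached_primary[db] = self.config.primary_shard[db]
        return None, attempts  # retries exhausted


def build_router(refresh_db_on_error):
    """Build a router whose cache still points at the old primary, shard0."""
    shards = {"shard0": Shard("shard0"), "shard1": Shard("shard1", {"test.user"})}
    config = ConfigServer({"test": "shard1"})
    return Router(config, shards, {"test": "shard0"}, refresh_db_on_error)


if __name__ == "__main__":
    print(build_router(False).write("test", "user", {"x": 1}))
    print(build_router(True).write("test", "user", {"x": 1}))
```

In this model, the router without the database refresh targets `shard0` on every attempt and gives up, while the router with the refresh reaches `shard1` on its second attempt.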



 Comments   
Comment by Githook User [ 14/Feb/19 ]

Author: Randolph Tan <randolph@10gen.com> (username: renctan)

Message: SERVER-34733 mongos needs to also refresh db version after receiving onImplicitCreate error
Branch: master
https://github.com/mongodb/mongo/commit/055cc74476a7eedb7d51ccae06c46a9d25b80c5c

Comment by Randolph Tan [ 08/Feb/19 ]

sounds good to me

Comment by Kaloian Manassiev [ 08/Feb/19 ]

This is happening quite a bit (there are 200+ occurrences), so I am marking it as WFBF-day.

renctan, I think that if we add a check for CannotImplicitlyCreateCollection at this point (or perhaps inside the targeter itself, so we don't cause unit-test errors) and, when that error occurs, call CatalogCache::invalidateDatabaseEntry(), this should fix the issue. What do you think?

Comment by Kaloian Manassiev [ 09/May/18 ]

I am fine with moving this ticket to 4.1 Required, however we may have to live with the associated BF for a while. I am not sure how hard it is to pinpoint that this is the cause for an incoming BF vs doing this fix. Is it only one test which fails?

Comment by Randolph Tan [ 09/May/18 ]

Note: this won't be needed once we finish the track unsharded collections project, which is currently planned for late 4.2.
