[SERVER-47432]  Mongo Server error (MongoQueryException): Query failed with error code 13388 and error message 'Failed to run query after 10 retries :: caused by :: version mismatch detected
Created: 09/Apr/20  Updated: 06/Dec/22  Resolved: 13/Apr/20

Status: Closed
Project: Core Server
Component/s: MapReduce, Querying
Affects Version/s: 4.0.17
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Rui Ribeiro
Assignee: Backlog - Triage Team
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-45119 CollectionShardingState::getCurrentSh... Closed
Assigned Teams:
Server Triage
Operating System: ALL
Participants:

 Description   

Hi

After upgrading MongoDB from 4.0.16 (Feb 5, 2020) to 4.0.17 (Mar 25, 2020), I get the following error for some collections when I run a query or a map-reduce (MR) job:

Failed to retrieve documents

[M_AUDIT_CDR.AUDIT_KPN@Copy of mongo4.0 [sharded cluster] [direct]] Database error!

Stacktrace:

_/ java.lang.Exception: [M_AUDIT_CDR.AUDIT_KPN@Copy of mongo4.0 [sharded cluster] [direct]] Database error!
____/ Mongo Server error (MongoQueryException): Query failed with error code 13388 and error message 'Failed to run query after 10 retries :: caused by :: version mismatch detected for M_AUDIT_CDR.AUDIT_KPN' on server

Can you please give me a quick solution to fix this? My whole system is stuck because I cannot access several collections.

 

Thank you.

 

Rui



 Comments   
Comment by Carl Champain (Inactive) [ 13/Apr/20 ]

rui.ribeiro@comfone.com, a release candidate, 4.0.18-rc0, is available now. Assuming we do not identify any blocking issues in our final testing, I expect 4.0.18 to be generally available (GA) early this week.

shil@mixmax.com, ns stands for namespace, so you need to replace ns with the namespace of your collection.
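
As an illustration of the substitution: a MongoDB namespace is the database name and the collection name joined by a dot. The sketch below uses the collection from this ticket's error message; `namespaceOf` is a hypothetical helper written for this example, not a MongoDB API.

```javascript
// Hypothetical helper (not part of MongoDB): join a database name and a
// collection name into the "<database>.<collection>" namespace string.
function namespaceOf(dbName, collName) {
  if (!dbName || !collName) {
    throw new Error("both a database name and a collection name are required");
  }
  return dbName + "." + collName;
}

// Database and collection taken from the error message in this ticket.
var ns = namespaceOf("M_AUDIT_CDR", "AUDIT_KPN");

// The command document from the workaround, with ns substituted in.
var cmd = { _flushRoutingTableCacheUpdates: ns, syncFromConfig: true };
```

In the shell, the same thing can be written inline by passing the quoted namespace string directly in place of ns.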

I'm going to close this ticket.

Thanks!
Carl
 

Comment by Shil Sinha [ 11/Apr/20 ]

Hi Carl,

What should ns  be here?

Run db.adminCommand({_flushRoutingTableCacheUpdates: ns, syncFromConfig: true}) against each shard directly

 
Thanks,
Shil

Comment by Rui Ribeiro [ 09/Apr/20 ]

Hi Carl

Thank you for the tips. 

Luckily I found the problem immediately, since this was happening for some random collections.

I already downgraded to 4.0.16.

Will you release the fix on 20.04.2020 (the end of your sprint)?

Cheers

 

Rui

Comment by Carl Champain (Inactive) [ 09/Apr/20 ]

rui.ribeiro@comfone.com,

If you can't downgrade to 4.0.16, the following steps should resolve the issue on 4.0.17:

  1. Stop the balancer
  2. Wait for all the chunk migrations to finish
  3. Run db.adminCommand({_flushRoutingTableCacheUpdates: ns, syncFromConfig: true}) against each shard directly

If the balancer is re-enabled, there is a risk of this problem re-appearing.
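
The three steps above could be scripted roughly as follows in the mongo shell. This is a minimal sketch, not an official procedure: the function name `flushRoutingTableOnShards` and the shard host names are hypothetical, and it assumes that chunk migrations are driven by the balancer, so that `sh.isBalancerRunning()` is a reasonable way to check step 2.

```javascript
// Step 3 helper (hypothetical name): run the flush command for `ns` against
// every shard. `shardAdminDbs` is an array of objects exposing
// runCommand(cmd), for example the "admin" database of a direct connection
// to each shard.
function flushRoutingTableOnShards(shardAdminDbs, ns) {
  var cmd = { _flushRoutingTableCacheUpdates: ns, syncFromConfig: true };
  return shardAdminDbs.map(function (adminDb) {
    return adminDb.runCommand(cmd);
  });
}

// Steps 1 and 2, from a mongo shell connected to mongos:
//   sh.stopBalancer();        // step 1: stop the balancer
//   sh.isBalancerRunning();   // step 2: repeat until no round is in progress
//
// Step 3, connecting to each shard directly (host names are placeholders):
//   var shards = ["shard1.example.net:27018", "shard2.example.net:27018"]
//     .map(function (host) { return new Mongo(host).getDB("admin"); });
//   flushRoutingTableOnShards(shards, "M_AUDIT_CDR.AUDIT_KPN");
```

The helper takes the shard connections as an argument so that the list of shards stays explicit; each returned result document can then be inspected per shard before deciding whether the collection is accessible again.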

Comment by Carl Champain (Inactive) [ 09/Apr/20 ]

rui.ribeiro@comfone.com,

We think you are experiencing SERVER-45119. This issue was fixed in 4.0.18, which will be released soon. In the meantime, please downgrade to 4.0.16.

Comment by Carl Champain (Inactive) [ 09/Apr/20 ]

Hi rui.ribeiro@comfone.com,

Can you please provide the full mongod.log file for this node?
I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Carl
 

Generated at Thu Feb 08 05:14:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.