[SERVER-24457] Some commands fail when a shard they need to talk to has no primary, even when they are okay to run on secondaries
Created: 07/Jun/16  Updated: 02/Apr/20  Resolved: 02/Apr/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Spencer Brody (Inactive)
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repro.js    
Issue Links:
Related
related to SERVER-23625 Some read-only operations (eg count,a... Closed
related to SERVER-27625 Remove dead ANSA and setShardVersion ... Closed
Operating System: ALL
Sprint: Sharding 16 (06/24/16), Sharding 2020-04-20
Participants:

 Description   

Some query-like commands (such as distinct) fail when one of the shards they target has no primary, even when using the 'secondary' read preference.



 Comments   
Comment by Randolph Tan [ 02/Apr/20 ]

No longer an issue in current master

Comment by Spencer Brody (Inactive) [ 06/Jul/16 ]

Reproduced in 3.0 as well.

Comment by Andy Schwerin [ 16/Jun/16 ]

Spencer is going to find out if this is a regression from 3.0, and then bounce it back to needs triage with the additional information.

Comment by Spencer Brody (Inactive) [ 07/Jun/16 ]

My current hypothesis is that this is due to the commands trying to run setShardVersion against the primary (via use of ShardConnection), which they do even when processing a slaveOk operation, in an attempt to ensure that the operation is still being routed to the proper shard.
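
To make the hypothesis concrete, here is a toy model of the suspected control flow. It is not MongoDB code: ToyShard, distinctWithPrimaryCheck, and distinctSecondaryOnly are hypothetical names invented for illustration, and the primary-side check only stands in for the setShardVersion call described above. The sketch assumes the router contacts the primary before every secondary read.

```javascript
// Toy model only. None of these names exist in the MongoDB code base.
class ToyShard {
  constructor(name, { hasPrimary, secondaryValues }) {
    this.name = name;
    this.hasPrimary = hasPrimary;
    this.secondaryValues = secondaryValues;
  }

  // Stand-in for a command that can only be sent to the primary
  // (the role setShardVersion plays in the hypothesis).
  checkVersionOnPrimary() {
    if (!this.hasPrimary) {
      throw new Error(`no primary available for shard ${this.name}`);
    }
  }

  // Stand-in for running 'distinct' against a secondary.
  distinctFromSecondary() {
    return this.secondaryValues;
  }
}

// Suspected behaviour: every targeted shard's primary is contacted
// first, even though the read itself goes to a secondary.
function distinctWithPrimaryCheck(shards) {
  const values = new Set();
  for (const shard of shards) {
    shard.checkVersionOnPrimary();
    for (const v of shard.distinctFromSecondary()) values.add(v);
  }
  return [...values].sort((a, b) => a - b);
}

// Expected behaviour for a 'secondary' read preference: the primary
// is never contacted.
function distinctSecondaryOnly(shards) {
  const values = new Set();
  for (const shard of shards) {
    for (const v of shard.distinctFromSecondary()) values.add(v);
  }
  return [...values].sort((a, b) => a - b);
}

const shards = [
  new ToyShard("shard0", { hasPrimary: true, secondaryValues: [1, 2] }),
  new ToyShard("shard1", { hasPrimary: false, secondaryValues: [2, 3] }),
];

try {
  distinctWithPrimaryCheck(shards);
} catch (e) {
  console.log("with primary check:", e.message);
}
console.log("secondary only:", distinctSecondaryOnly(shards));
```

In this model the first path fails as soon as it reaches the shard without a primary, while the second returns the merged values from the secondaries, which is the behaviour a 'secondary' read preference would be expected to give.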

Comment by Spencer Brody (Inactive) [ 07/Jun/16 ]

Attaching a jstest that demonstrates the issue with the 'distinct' command. Weirdly, in my manual testing I was also seeing 'count' and 'aggregate' fail in this case, but the jstest did not reproduce those failures.
