[SERVER-36872] Comment out $sample tests in testshard1.js temporarily Created: 24/Aug/18  Updated: 29/Oct/23  Resolved: 27/Aug/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.1.3

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Matthew Saltz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-36871 $sample can loop infinitely on orphan... Closed
is related to SERVER-37750 Optimized $sample stage does not yield Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2018-09-10
Participants:
Linked BF Score: 0

 Description   

The $sample tests in this suite are currently causing non-deterministic hangs in Evergreen. We should remove them temporarily, pending a fix for SERVER-36871.



 Comments   
Comment by Matthew Saltz (Inactive) [ 22/Oct/18 ]

I confirmed that we intentionally create empty chunks and distribute them across nodes when we shard a collection with hashed or zoned sharding.

Comment by Matthew Saltz (Inactive) [ 19/Oct/18 ]

There are a lot of tickets related to the behavior change, but the change simply surfaced an existing issue by hitting a new scenario in our testing.

I don't think this is what specifically happened in this test, but one way to end up with an empty chunk is to delete all documents in the range covered by that chunk. The chunk itself doesn't get removed, so a shard can still be targeted even if there are no documents on it.

Alternatively, it's possible to manually split a chunk (even an empty one) so that one of the resulting chunks is empty and can be migrated to a shard with no other chunks from that collection on it. In fact, I believe we sometimes recommend doing this when you create a new sharded collection so that you can spread out your writes to different shards, so you could easily end up with a single empty chunk on a shard in that case.
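The two routes to an empty chunk described in this comment can be sketched with a toy model. This is plain JavaScript with no MongoDB involved: the cluster object and the helpers makeCluster, shardsWithChunks, countDocsOnShard, deleteRange, splitAt and moveChunk are all hypothetical names invented for illustration, and the model only assumes that targeting follows chunk ownership rather than document presence.

```javascript
// Toy model only: plain objects stand in for chunk metadata and documents.
// Each chunk owns the half-open key range [min, max) and lives on one shard.
function makeCluster() {
  return {
    chunks: [
      { min: 0, max: 50, shard: "shardA" },
      { min: 50, max: 100, shard: "shardB" },
    ],
    docs: [{ key: 10 }, { key: 20 }, { key: 60 }, { key: 70 }],
  };
}

// Targeting in this model looks at chunk ownership only, never at documents.
function shardsWithChunks(cluster) {
  return [...new Set(cluster.chunks.map((c) => c.shard))].sort();
}

// Count the documents that fall inside the chunks owned by one shard.
function countDocsOnShard(cluster, shard) {
  return cluster.chunks
    .filter((c) => c.shard === shard)
    .reduce(
      (n, c) => n + cluster.docs.filter((d) => d.key >= c.min && d.key < c.max).length,
      0
    );
}

// Deleting documents leaves the chunk metadata untouched.
function deleteRange(cluster, min, max) {
  cluster.docs = cluster.docs.filter((d) => d.key < min || d.key >= max);
}

// Split the chunk containing splitPoint into two chunks on the same shard.
function splitAt(cluster, splitPoint) {
  const i = cluster.chunks.findIndex((c) => splitPoint > c.min && splitPoint < c.max);
  if (i === -1) throw new Error("split point is not strictly inside any chunk");
  const c = cluster.chunks[i];
  cluster.chunks.splice(
    i,
    1,
    { min: c.min, max: splitPoint, shard: c.shard },
    { min: splitPoint, max: c.max, shard: c.shard }
  );
}

// Reassign the chunk that starts at min to another shard.
function moveChunk(cluster, min, toShard) {
  const chunk = cluster.chunks.find((c) => c.min === min);
  if (!chunk) throw new Error("no chunk starts at " + min);
  chunk.shard = toShard;
}

// Route 1: delete every document in a chunk's range.
function deleteScenario() {
  const cluster = makeCluster();
  deleteRange(cluster, 50, 100);
  return {
    targeted: shardsWithChunks(cluster),
    docsOnShardB: countDocsOnShard(cluster, "shardB"),
  };
}

// Route 2: split off an empty range and move it to a shard with no other chunks.
function splitScenario() {
  const cluster = makeCluster();
  splitAt(cluster, 30); // [0, 50) becomes [0, 30) and [30, 50); no document has a key in [30, 50)
  moveChunk(cluster, 30, "shardC");
  return {
    targeted: shardsWithChunks(cluster),
    docsOnShardC: countDocsOnShard(cluster, "shardC"),
  };
}

console.log(deleteScenario());
console.log(splitScenario());
```

In both scenarios the shard in question still appears in the targeted list while holding zero documents, which is the situation the comment describes.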

Comment by Charlie Swanson [ 11/Oct/18 ]

matthew.saltz can you point me to the ticket or tickets which included the change you describe? I am still confused about how we would end up with an empty chunk as the only chunk on a shard, and would like to investigate further to see if this is expected, or can at least be prevented.

Comment by Githook User [ 27/Aug/18 ]

Author: Matthew Saltz (saltzm) <matthew.saltz@mongodb.com>

Message: SERVER-36872 Comment out $sample from testshard1.js temporarily
Branch: master
https://github.com/mongodb/mongo/commit/5c1ade163b84589061c7050cf35896570781af7d

Comment by Matthew Saltz (Inactive) [ 27/Aug/18 ]

Yes. The autosplitter was changed from triggering splits synchronously on mongos to triggering them asynchronously on mongod. Previously, the test would do some inserts, then block on the autosplitter, which would move a chunk from one shard to the other due to the top chunk optimization, and then continue inserting. Now the inserts never block, so the split and the chunk move happen concurrently with other operations (including $sample). The chunk move causes $sample to try to read documents not owned by that shard, which causes the test to hit SERVER-36871.
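To make the failure mode concrete, here is a rough sketch of a sampling loop that skips documents the shard no longer owns. It is a toy model in plain JavaScript, not the server's $sample implementation: ownsKey, sampleOwned and migrationScenario are hypothetical names, and the maxAttempts guard is an assumption added so that the sketch always terminates. Without such a bound, a loop of this shape would spin forever once every locally stored document is an orphan.

```javascript
// Toy model: a shard's locally stored documents plus the key ranges it currently owns.
function ownsKey(ownedRanges, key) {
  return ownedRanges.some((r) => key >= r.min && key < r.max);
}

// Draw random stored documents (with replacement), keeping only owned ones.
// maxAttempts is a hypothetical guard for this sketch so the loop always ends.
function sampleOwned(storedDocs, ownedRanges, size, maxAttempts, rand = Math.random) {
  const picked = [];
  let attempts = 0;
  while (picked.length < size && attempts < maxAttempts && storedDocs.length > 0) {
    attempts++;
    const doc = storedDocs[Math.floor(rand() * storedDocs.length)];
    if (ownsKey(ownedRanges, doc.key)) {
      picked.push(doc);
    }
    // An unowned document is treated as an orphan: it is skipped and the loop retries.
  }
  return { picked, attempts, gaveUp: picked.length < size };
}

// Compare sampling before and after the shard's only chunk is moved away.
// In this model the documents stay in local storage after the move, as orphans.
function migrationScenario() {
  const storedDocs = [{ key: 60 }, { key: 70 }, { key: 80 }];
  const before = sampleOwned(storedDocs, [{ min: 50, max: 100 }], 2, 1000);
  const after = sampleOwned(storedDocs, [], 2, 1000);
  return { before, after };
}

const outcome = migrationScenario();
console.log(outcome.before.attempts, outcome.before.gaveUp);
console.log(outcome.after.attempts, outcome.after.gaveUp);
```

Before the move, every draw is owned and the sample completes in two attempts. After the move, every draw is rejected and the loop runs until the guard stops it, which mirrors the hang seen when the chunk move races with $sample.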

Comment by David Storch [ 27/Aug/18 ]

matthew.saltz was there a change which caused this test to hit SERVER-36871? Nothing substantial has changed around $sample in query team code recently, to my knowledge.
