- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Aggregation Framework, Write Ops
- Query Execution
- ALL
If a $merge fails on one shard, the process of cleaning up another shard's writes can leave some writes cancelled but still very much alive in the network. Upon any error, all other shards participating in an aggregation will be interrupted. Because an interrupted call to AsyncRequestsSender::next() cancels its callbacks and returns without waiting for the responses to inserts or updates we have already sent, we can end up returning control to the user while there are still active writes which were generated on their behalf. This is causing problems in some of our tests, but it is generally surprising and probably undesirable.
In more detail, consider a sharded cluster which has collections test.input and test.output, both sharded by {_id: 1}. Suppose that test.input is distributed like so:
- chunks [0, 5) and [15, 20) on shard 1
- chunk [5, 15) on shard 2
and suppose test.output is distributed like so:
- chunk [0, 10) on shard 1
- chunk [10, 20) on shard 2
If we had 20 documents in test.input with _ids in [0, 20) and we executed
db.input.aggregate([{$merge: {to: "output", mode: "insertDocuments"}}])
then each shard would end up sending 5 documents to itself and 5 to the other shard: [0, 5) would go from shard 1 to itself, [5, 10) would go from shard 2 to shard 1, [10, 15) would go from shard 2 to itself, and [15, 20) would go from shard 1 to shard 2.
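A minimal shell sketch of this setup, assuming a two-shard cluster with shards named "shard1" and "shard2" (the shard names and the splitAt/moveChunk layout below are illustrative assumptions; the $merge command is the one above):

sh.enableSharding("test");

// Lay out test.input as chunks [0, 5) and [15, 20) on shard 1 and [5, 15) on shard 2
// (the outermost chunks really extend to MinKey/MaxKey, but only [0, 20) matters here).
sh.shardCollection("test.input", {_id: 1});
sh.splitAt("test.input", {_id: 5});
sh.splitAt("test.input", {_id: 15});
sh.moveChunk("test.input", {_id: 0}, "shard1");
sh.moveChunk("test.input", {_id: 5}, "shard2");
sh.moveChunk("test.input", {_id: 15}, "shard1");

// Lay out test.output as chunk [0, 10) on shard 1 and [10, 20) on shard 2.
sh.shardCollection("test.output", {_id: 1});
sh.splitAt("test.output", {_id: 10});
sh.moveChunk("test.output", {_id: 0}, "shard1");
sh.moveChunk("test.output", {_id: 10}, "shard2");

// Seed test.input with _ids [0, 20) and run the $merge described above.
for (let i = 0; i < 20; i++) {
    db.input.insert({_id: i});
}
db.input.aggregate([{$merge: {to: "output", mode: "insertDocuments"}}]);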
Then suppose that when performing the inserts from shard 1, some of them return with an error, perhaps a duplicate key. The aggregate on shard 1 will return that error to mongos. When the error gets back to mongos, mongos will send a killCursors to the aggregate (with the $merge) running on shard 2. Upon receiving that killCursors, the $merge stage on shard 2 may be in a state where it has scheduled both writes but hasn't heard back from one or both of them. In this scenario, the AsyncRequestsSender will cancel its callbacks as described above, the aggregate and its cursor will be destroyed, and we will respond to mongos and then to the user with the error without verifying that the writes have finished (successfully or not).
In some of our tests, we perform a $merge which is expected to fail after partially completing, then issue a subsequent db.target.remove({}) and expect that remove to get rid of everything in the collection. Because there can be lingering writes from the $merge, the remove may succeed but documents can be inserted into the collection very shortly afterwards, causing a subsequent assertion to fail.
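A hedged sketch of that test pattern using the example collections above (the colliding _id and the one-second wait are illustrative assumptions, not taken from an actual test):

// Seed a document that will collide with one of the $merge inserts so the
// stage partially completes and then fails with a duplicate key error.
db.output.insert({_id: 3, conflicting: true});
assert.throws(() => db.input.aggregate([{$merge: {to: "output", mode: "insertDocuments"}}]));

// Clean up the target collection and expect it to stay empty...
db.output.remove({});
assert.eq(0, db.output.find().itcount());

// ...but a write scheduled by the failed $merge may still be in flight on a
// shard, so a later check can find documents have reappeared.
sleep(1000);
assert.eq(0, db.output.find().itcount());  // can fail intermittently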
I've also attached repro_no_agg.js, which demonstrates that an unordered bulk insert can also leave an active insert in the cluster, since it also uses the AsyncRequestsSender, though this is less surprising and I think not a bug. In that case it is not possible to leave an active operation behind unless the operation itself is interrupted; triggering an interruption from an ordinary error like DuplicateKey is unique to the aggregation system.
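For reference, a hedged illustration of that non-aggregation case (this is not the attached repro_no_agg.js, whose contents are not reproduced here; using a deliberately tiny maxTimeMS to interrupt the mongos-side operation is an assumption for illustration):

// An unordered insert targeting both shards of test.output. If the mongos-side
// operation is interrupted (here via maxTimeMS) while the per-shard batches are
// still outstanding, the error can reach the client while those batches remain
// active on the shards.
const docs = [];
for (let i = 0; i < 20; i++) {
    docs.push({_id: i});
}
const res = db.runCommand({
    insert: "output",
    documents: docs,
    ordered: false,
    maxTimeMS: 1
});
printjson(res);  // may report MaxTimeMSExpired while shard-side inserts continue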
- is related to:
  - SERVER-80853 $out on secondary node can produce incorrect results if primary steps down (Backlog)
  - SERVER-43198 Zombie writes from failing $merge should not be able to re-create a collection (Backlog)
  - SERVER-43851 Work around zombie writes in $merge tests (Closed)
- related to:
  - SERVER-88154 Interrupting $out on sharded cluster can leave temp collections (Closed)