[SERVER-36737] Aggregating on a collection that is rebuilt using $out will sometimes raise an exception Created: 17/Aug/18  Updated: 27/Oct/23  Resolved: 20/Aug/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mark [X] Assignee: Nick Brewer
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-36918 Searching on a collection that is reb... Closed
Operating System: ALL
Steps To Reproduce:

Run this python code:

 

from threading import Thread

from pymongo import MongoClient

c = MongoClient('127.0.0.1:27017')


def bigdict():
    return {str(i): i ** 2 for i in range(10)}


col1 = c['test']['source']
col2 = c['test']['dest']

# Populate the source collection, then build the destination once via $out.
col1.insert_many([bigdict() for x in range(100000)])
col1.aggregate([{"$out": col2.name}])
print("written to db")


def write_to_2():
    # Rebuild the destination via $out while the main thread is reading it.
    col1.aggregate([{"$out": col2.name}])
    print("done writing")


Thread(target=write_to_2).start()

# Repeatedly aggregate over the destination; this races with the $out
# rebuild and eventually raises OperationFailure.
while True:
    list(col2.aggregate([
        {"$sort": {"1": 1}},
        {"$project": {"3": 1}},
        {"$limit": 100},
    ], allowDiskUse=True))
    print(".", end='')

 

Participants:

 Description   

Let us have two collections, COL1 and COL2, where COL1 has many documents.

If we aggregate COL1 and end the pipeline with an $out stage targeting COL2, while in parallel we repeatedly aggregate over COL2 (over and over again, for the purpose of this demonstration), then at the moment COL2 is rebuilt (the moment the $out aggregation over COL1 finishes), the aggregation over COL2 will fail.

 

The exception raised by the code given in the steps to reproduce is:

OperationFailure: Error in $cursor stage :: caused by :: all indexes on collection dropped

 

I have tested this with a MongoDB 4.0.0 instance running inside a Docker container on a Windows 10 machine.



 Comments   
Comment by Nick Brewer [ 20/Aug/18 ]

Segal, I have run the script, and the output it produces actually makes the issue much clearer:

written to db
.......Traceback (most recent call last):
  File "test.py", line 44, in <module>
    "$limit": 100
  File "/usr/local/lib/python3.6/site-packages/pymongo/collection.py", line 2185, in aggregate
    **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pymongo/collection.py", line 2092, in _aggregate
    client=self.__database.client)
  File "/usr/local/lib/python3.6/site-packages/pymongo/pool.py", line 517, in command
    collation=collation)
  File "/usr/local/lib/python3.6/site-packages/pymongo/network.py", line 125, in command
    parse_write_concern_error=parse_write_concern_error)
  File "/usr/local/lib/python3.6/site-packages/pymongo/helpers.py", line 145, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: query killed during yield: all indexes on collection dropped
done writing

From the placement of the "written to db" and "done writing" messages, we can see that the error in the COL2 aggregation occurs before the collection has been completely overwritten. This means you are attempting to aggregate a collection while its documents and indexes are being dropped; in that case, it is expected that the aggregation will fail.

As you mentioned, adding retry logic is a good way to work around this issue.
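For reference, a minimal sketch of such retry logic (the helper name aggregate_with_retry and the retry count are illustrative, not part of the original script):

from pymongo.errors import OperationFailure

def aggregate_with_retry(coll, pipeline, retries=5):
    # Retry a read that may race with an $out rebuild of the collection.
    for attempt in range(retries):
        try:
            return list(coll.aggregate(pipeline, allowDiskUse=True))
        except OperationFailure:
            # The collection was dropped and recreated mid-query; try again.
            if attempt == retries - 1:
                raise

With the script above, the inner list(col2.aggregate(...)) call would simply be replaced by a call to this helper.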

-Nick

Comment by Mark [X] [ 17/Aug/18 ]

I suggest you try to run the example code I've given and see for yourself.

Comment by Mark [X] [ 17/Aug/18 ]

More emphasis:

    list(col2.aggregate([
        {"$sort": {"1": 1}},
        {"$project": {"3": 1}},
        {"$limit": 100},
    ], allowDiskUse=True))

This is the aggregation that fails and raises an exception.

Comment by Mark [X] [ 17/Aug/18 ]

@Nick thanks for the quick reply!

 

Our actual use case is almost exactly as given in the example, and it has been thought out with great care.

I know and understand that $out preserves the indexes and options; that is expected and needed by our solution, so that is okay.

However, unless I am missing something, no index or option "changes" in the example I have given: both collections have no indexes and no non-default options, so this issue (as far as I can tell) has nothing to do with those.

Second, you said "if any index or option changes between the start and end of $out, it will fail", implying that the $out aggregation will fail. However, that is not the case: the $out aggregation does not fail; it is the second aggregation that does!

I expect that if our application runs an aggregation on COL2, that aggregation will not raise such an exception, regardless of whether another application runs an $out aggregation targeting COL2.

Given that the COL1->COL2 $out aggregation runs only once in a while, and that this exception is raised in a small time window right after it, my workaround was to add a retry mechanism that tries again whenever such an exception is raised.

Comment by Nick Brewer [ 17/Aug/18 ]

Segal, I'm curious what use case you're trying to accomplish here. Given that you're populating a collection and then immediately overwriting it: $out will remember the indexes and collection options from the original collection, and if any index or option changes between the start and end of $out, it will fail. This behavior is expected.
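A quick way to see the index-preserving behavior described above (a minimal sketch; the collection and index names are arbitrary, not from the original script):

from pymongo import MongoClient

c = MongoClient('127.0.0.1:27017')
src, dst = c['test']['source'], c['test']['dest']

dst.create_index('1')                  # pre-existing index on the target
src.aggregate([{'$out': dst.name}])    # rebuild the target via $out
print(dst.index_information())         # the index on '1' survives the rebuild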

Could you clarify what you're expecting this script to accomplish?

-Nick

Comment by Mark [X] [ 17/Aug/18 ]

Actually, this was found in production, so it also happens on a Linux host.
