I am running a Python script that inserts data into a GridFS collection in a database called "botnet". The script connects to the database through a local mongos process.
When I run the script I get the following error:
+ python etl_dat_file.py --all
Traceback (most recent call last):
File "etl_dat_file.py", line 150, in <module>
fs = GridFS(db)
File "build/bdist.linux-x86_64/egg/gridfs/__init__.py", line 61, in __init__
File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 813, in ensure_index
File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 729, in create_index
File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 315, in insert
File "build/bdist.linux-x86_64/egg/pymongo/connection.py", line 831, in _send_message
File "build/bdist.linux-x86_64/egg/pymongo/connection.py", line 778, in __check_response_to_last_error
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: harvest.links.$url_1 dup key: { : "http://www.wwe.com/shows/smackdown/sdspecial/fnsd/move" }
The index that is referenced in that stack trace is actually in another database on the same host called "harvest".
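As a diagnostic aid, the namespace in the E11000 message can be parsed and compared with the database the script is actually using. The following is only a sketch: the helper names `parse_duplicate_key_namespace` and `error_is_from_other_db` are hypothetical and not part of PyMongo, and the regular expression assumes the message format shown in the traceback above.

```python
import re

# Matches the message format seen in the traceback above (an assumption):
# "E11000 duplicate key error index: <db>.<collection>.$<index> dup key: ..."
_E11000_RE = re.compile(
    r"E11000 duplicate key error index: "
    r"(?P<db>[^.\s]+)\.(?P<coll>.+?)\.\$(?P<index>\S+)"
)


def parse_duplicate_key_namespace(message):
    """Return (database, collection, index) parsed from the message, or None."""
    match = _E11000_RE.search(message)
    if match is None:
        return None
    return match.group("db"), match.group("coll"), match.group("index")


def error_is_from_other_db(message, expected_db):
    """True if the message names a database other than expected_db."""
    parsed = parse_duplicate_key_namespace(message)
    return parsed is not None and parsed[0] != expected_db
```

Applied to the message above with "botnet" as the expected database, the check flags the error as belonging to a different database ("harvest"), which matches what the traceback shows.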
The line that appears to be breaking in the GridFS module is this:
if not hasattr(connection, 'is_primary') or connection.is_primary:
    self.__chunks.ensure_index([("files_id", ASCENDING), ("n", ASCENDING)], unique=True)
line 58 at https://github.com/mongodb/mongo-python-driver/blob/master/gridfs/__init__.py
I have run this script against my local MongoDB database that does not have the "harvest" database and the script executes as expected.
Related to: SERVER-4532 GetLastError on sharded cluster can report incorrect result (Closed)