- Type: Bug
- Resolution: Done
- Priority: Minor - P4
- Affects Version/s: 2.0.1
- Component/s: None
When calling collection.ensure_index(..., unique=True), the index information is cached immediately, before the index has actually been created. If a concurrent thread then makes the same ensure_index call, it returns immediately, even though the index might not exist yet.
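To make the timing concrete, here is a toy model of the behaviour described above. It is not PyMongo's actual implementation: `ToyCollection`, its events and `demo` are hypothetical names made up for illustration, and the "index build" is just a wait on an event.

```python
import threading


class ToyCollection:
    """Toy stand-in for a collection; NOT PyMongo's real code."""

    def __init__(self):
        self._cache = set()
        self._lock = threading.Lock()
        self.cached = threading.Event()      # set once the key is in the cache
        self.may_finish = threading.Event()  # lets the simulated build complete
        self.index_exists = False

    def ensure_index(self, key):
        with self._lock:
            if key in self._cache:
                return None                  # cache hit: return at once
            self._cache.add(key)             # cached BEFORE the index exists
        self.cached.set()
        self.may_finish.wait()               # stands in for a slow index build
        self.index_exists = True
        return key


def demo():
    coll = ToyCollection()
    first = threading.Thread(target=coll.ensure_index, args=("md5_sha1",))
    first.start()
    coll.cached.wait()                 # thread 1 is now in the middle of the build
    coll.ensure_index("md5_sha1")      # second caller returns immediately...
    raced = not coll.index_exists      # ...while the index is still missing
    coll.may_finish.set()              # let thread 1 finish
    first.join()
    return raced, coll.index_exists
```

In this model the second caller gets control back while the index is still missing, which is the window in which duplicate inserts can slip through.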
If both threads start inserting items right after calling ensure_index, and there is a very high chance of items having duplicate keys, those items might get inserted before the index is created. This leads not only to duplicates but also to index creation failure in the first thread.
One example where I'm hitting this is my file metadata store (there is a unique index on the md5-sha1 pair, so that there is only one entry per unique file). In production everything works great, but in tests (where I check that concurrent metadata addition in my wrappers has no races, etc.) there is always a small chance that, right after the database is dropped and another test is starting, two concurrent threads will call ensure_index simultaneously and only one of them will actually wait for the index to be built.
This is very minor and doesn't normally affect production, since index creation is too delicate to rely on ensure_index anyway, especially when you have GBs of data. But in testing it can be very annoying. It took me several days to figure out why my tests would so elusively fail sometimes (mostly because I couldn't reproduce the failure when I was actually looking, and when it did happen the output quickly scrolled past the console buffer on Windows, so I didn't even know what the exact failure was).
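A possible workaround for test suites, as a minimal sketch rather than a fix in the driver: serialize index creation behind a process-wide lock so that every caller returns only after the index call has completed. `ensure_unique_index` and `_index_lock` are hypothetical names, not part of PyMongo. The sketch assumes that `collection.create_index(keys, unique=True)` does not short-circuit on the client-side cache and blocks until the server has acknowledged the index.

```python
import threading

# Hypothetical helper, not part of PyMongo.
# One lock for the whole process: index creation is rare, so contention is cheap.
_index_lock = threading.Lock()


def ensure_unique_index(collection, keys):
    """Create a unique index; concurrent callers wait until it is done.

    Assumption: create_index() blocks until the index exists and is
    harmless to repeat when the same index is already present.
    """
    with _index_lock:
        collection.create_index(keys, unique=True)
```

Usage would look like `ensure_unique_index(db.files, [("md5", 1), ("sha1", 1)])` before the first insert in each thread, where `db.files` is a placeholder collection name. The lock only covers threads within one process. Separate processes would still race.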