- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.3
- Component/s: None
- None
- Environment: OS X 10.5, OS X 10.6, OS X 10.7
When writing to a sharded GridFS with multiple simultaneous writers, the client driver intermittently fails with "exception: chunks out of order" on the "filemd5" command. This does not occur in an unsharded environment. The failure occurs at random, usually after several hundred MB of data have been written. It appears to corrupt the contents of the target file but does not affect other files in the system.
The issue occurs in clusters both with and without replication. Using start_request, as was suggested for a similar issue with the C# driver, has no effect.
Attached are a script to simulate my sharded cluster setup and a script that should reproduce the failure. If the failure does not occur, increase the number of concurrent writers and/or the number of files written.
Adding the option "partialOk = 1" to the "filemd5" command in grid_file.py seems to resolve the issue. A patch is attached, although I have not tested it in anything other than my environment.
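For reference, a minimal sketch of the kind of change described above. This is not the attached patch. The helper names build_filemd5_command and file_md5 are hypothetical, and db is assumed to be a pymongo Database object whose command() method accepts an ordered command document. The sketch only shows where the extra "partialOk" key would go in the "filemd5" command.

```python
from collections import OrderedDict


def build_filemd5_command(file_id, root="fs", partial_ok=True):
    """Build a filemd5 command document (hypothetical helper).

    The command name must be the first key, so an ordered mapping is used.
    """
    cmd = OrderedDict()
    cmd["filemd5"] = file_id   # _id of the GridFS file to checksum
    cmd["root"] = root         # GridFS collection prefix, e.g. "fs"
    if partial_ok:
        # The option this report says works around "chunks out of order".
        cmd["partialOk"] = True
    return cmd


def file_md5(db, file_id, root="fs", partial_ok=True):
    """Run filemd5 and return the "md5" field of the reply.

    Assumes db is a pymongo Database (or any object with a compatible
    command() method).
    """
    reply = db.command(build_filemd5_command(file_id, root, partial_ok))
    return reply["md5"]
```

Calling file_md5(db, file_id) with partial_ok=False sends the command without the extra key, which makes it easy to compare the two behaviors against the reproduction script.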