[SERVER-16616] "chunks out of order" error during md5sum command Created: 20/Dec/14 Updated: 22/Dec/14 Resolved: 22/Dec/14
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency |
| Affects Version/s: | 2.8.0-rc3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Bernie Hackett | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Operating System: | ALL |
| Participants: | |
| Description |
|
We started seeing this on Jenkins with 2.8.0-rc3. The test that triggers it writes 100 single-chunk files to GridFS from 10 threads in Python, and a few of the threads end up throwing this error. All writes are done with w=1 write concern. The test code is here: An example of the failure in Jenkins can be seen here: I've attached the log for this failure. I can't reproduce the problem locally, only in Jenkins, where the failure is fairly consistent. Let me know how I can help debug. |
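For readers without access to the linked test, the workload described above can be sketched roughly as follows. This is not the actual test code: the names `write_files`, `put_one`, and `gridfs_put_factory`, the database name, the URI, and the 1 KB payload are all assumptions made for illustration. As far as I know, PyMongo releases of that period asked the server for the file's MD5 via `filemd5` when a GridFS file was closed, which would be where the error surfaces; treat that as an assumption too.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def write_files(put_one, n_files=100, n_threads=10, payload=b"x" * 1024):
    """Call put_one(payload) n_files times from a pool of n_threads workers.

    Returns (ids, errors): the results of successful puts and the
    exceptions raised by failed ones.
    """
    ids, errors = [], []
    lock = threading.Lock()

    def worker(_):
        try:
            result = put_one(payload)
        except Exception as exc:  # collect failures instead of aborting the run
            with lock:
                errors.append(exc)
        else:
            with lock:
                ids.append(result)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(worker, range(n_files)))
    return ids, errors


def gridfs_put_factory(uri="mongodb://localhost:27017", db_name="gridfs_test"):
    """Build a put_one callable backed by a real server (requires PyMongo).

    The URI and database name are hypothetical placeholders.
    """
    # Imported lazily so the sketch can be loaded without PyMongo installed.
    from pymongo import MongoClient
    import gridfs

    client = MongoClient(uri, w=1)  # w=1 write concern, as in the report
    fs = gridfs.GridFS(client[db_name])
    return fs.put  # a 1 KB payload fits in a single GridFS chunk
```

Against a live server this would be run as `ids, errors = write_files(gridfs_put_factory())`, after which `errors` holds whatever exceptions the worker threads hit.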
| Comments |
| Comment by J Rassi [ 22/Dec/14 ] |
|
Thanks for the report and repro help. I would guess from the attached test failure that f984b532 introduced a bug in IndexScan::saveState(), IndexScan::restoreState(), or IndexScan::invalidate(). Per discussion with schwerin, I'm going to revert the commit and have Dave or Mathias (who co-authored the commit) debug this issue next week when they're back from vacation, as I don't have any spare cycles to look at this until then. Re-opening |
| Comment by Daniel Pasette (Inactive) [ 21/Dec/14 ] |
|
Found it. Thanks guys. |
| Comment by Daniel Pasette (Inactive) [ 21/Dec/14 ] |
|
Definite regression in rc3. I'm bisecting with a slightly modified version of Jeff's script. |
| Comment by Jeffrey Yemin [ 21/Dec/14 ] |
|
I can reproduce this locally with the attached Java program. I get about 100 failed calls to filemd5 for every 25000 calls. |
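As background for the error text: to my understanding, `filemd5` walks a file's chunks in `n` order and fails with "chunks out of order" when the next chunk's `n` is not the one it expects. The following is a minimal in-memory model of that check, not the server implementation and not the attached Java program; the function name `file_md5` and the dict layout are illustrative assumptions.

```python
import hashlib


def file_md5(chunks):
    """Hash chunk payloads in the order given, verifying n = 0, 1, 2, ...

    `chunks` is an iterable of dicts with keys "n" (sequence number) and
    "data" (bytes), standing in for what a scan over the chunks would return.
    """
    digest = hashlib.md5()
    expected = 0
    for chunk in chunks:
        if chunk["n"] != expected:
            # Simplified stand-in for the server-side assertion.
            raise ValueError("chunks out of order")
        digest.update(chunk["data"])
        expected += 1
    return digest.hexdigest()
```

In this model, a one-chunk file can only fail if the scan yields something other than its single `n = 0` chunk first.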