[SERVER-28516] index corruption Created: 27/Mar/17 Updated: 18/Apr/17 Resolved: 29/Mar/17
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Brian Nelson | Assignee: | Eric Milkie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | uic |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | CentOS 6.8 |
| Issue Links: | |
| Participants: | |
| Description |
We're experiencing regular index corruption on a table that has an index with a partialFilterExpression, though it is not always that index that gets corrupted. The only relevant event I could find near the times when the corruption occurs is log rotation via SIGUSR1. Nothing else in the log file seemed suspect. Unfortunately our data is very sensitive and cannot be shared. I can tell you that we're using Ruby/Rails and the mongoid_paranoia gem, and that we set up a unique index with a partialFilterExpression with:
The corruption appears to have been exposed only after this index was added, but it may have happened before and gone unnoticed. We found it when we were unable to find documents through this index using
| Comments |
| Comment by Eric Milkie [ 29/Mar/17 ] |
Thanks for your report! Please follow the linked ticket for further updates.
| Comment by Eric Milkie [ 29/Mar/17 ] |
This bug does not look like it has an easy fix, so I expect it will take more than one week. After we have coded up a fix, we can expedite a minor version release and have that ready in two or three weeks after that. It's our top priority right now.
| Comment by Brian Nelson [ 29/Mar/17 ] |
Thanks! Looking forward to the fix. Is there an estimate of how long it typically takes before a release with a fix is available?
| Comment by Eric Milkie [ 29/Mar/17 ] |
We've discovered a problem in the code when you use WiredTiger and have a unique index with a partialFilterExpression. If you delete a document that is not matched by the partialFilterExpression, all documents with matching keys for that particular index will be removed from the index. Here is an example using your index spec provided in the Description above:
We are actively working on a solution for this.
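The failure mode described in this comment can be illustrated with a toy in-memory model. This is only a sketch for reasoning about the symptom, not the server's or WiredTiger's actual code: the `PartialUniqueIndex` class, its method names, and the `name` and `deleted_at` fields are all hypothetical and are not taken from the reporter's index spec.

```python
class PartialUniqueIndex:
    """Toy unique index that only covers documents matching a filter (hypothetical model)."""

    def __init__(self, key_field, matches_filter):
        self.key_field = key_field
        self.matches_filter = matches_filter
        self.entries = {}  # indexed key value -> _id of the document that owns it

    def insert(self, doc):
        if not self.matches_filter(doc):
            return  # document is outside the partial index, nothing to add
        key = doc[self.key_field]
        if key in self.entries:
            raise ValueError(f"duplicate key: {key!r}")
        self.entries[key] = doc["_id"]

    def remove_buggy(self, doc):
        # Models the reported symptom: the key is dropped without checking
        # whether the deleted document was ever covered by the filter.
        self.entries.pop(doc[self.key_field], None)

    def remove_checked(self, doc):
        # Only unindex when the deleted document is the one that owns the entry.
        key = doc[self.key_field]
        if self.matches_filter(doc) and self.entries.get(key) == doc["_id"]:
            del self.entries[key]


def demo():
    # Assumed filter for illustration: only documents without a deletion timestamp are indexed.
    def is_live(doc):
        return doc["deleted_at"] is None

    live = {"_id": 1, "name": "alice", "deleted_at": None}
    soft_deleted = {"_id": 2, "name": "alice", "deleted_at": "2017-03-01"}

    buggy = PartialUniqueIndex("name", is_live)
    checked = PartialUniqueIndex("name", is_live)
    for index in (buggy, checked):
        index.insert(live)
        index.insert(soft_deleted)  # skipped: not matched by the filter

    # Delete the document that was never in the index.
    buggy.remove_buggy(soft_deleted)
    checked.remove_checked(soft_deleted)

    return "alice" in buggy.entries, "alice" in checked.entries


if __name__ == "__main__":
    print(demo())
```

In this model, deleting the unindexed document makes the live document unreachable through the buggy index, which would be consistent with queries through the index missing documents that still exist in the collection.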
| Comment by Brian Nelson [ 28/Mar/17 ] |
No, it worked just fine with the reIndex. The names don't get updated very frequently and we check them elsewhere in the code; the index is a fallback for certain cases. But there were no failures on the reIndex.
| Comment by Eric Milkie [ 28/Mar/17 ] |
Do you run a rebuild by executing the "reIndex" command? If so, does it ever detect records that would violate the unique constraint? I would expect the reIndex command to fail at that point, if it did.
| Comment by Brian Nelson [ 27/Mar/17 ] |
Additionally, rebuilding the indexes seems to resolve the issue temporarily, though it keeps recurring. Also, this only seems to affect the primary.
| Comment by Brian Nelson [ 27/Mar/17 ] |
I'll work on getting all the information to you, but in the meantime:
Total and in index counts:
initandlisten / signalProcessingThread
| Comment by Eric Milkie [ 27/Mar/17 ] |
Hi Brian,
| Comment by Brian Nelson [ 27/Mar/17 ] |
Sorry, I meant "index corruption" and "collection", not "table". I cannot seem to edit this issue.