[SERVER-17735] index build failed, ! _progressMeter.isActive() Created: 25/Mar/15 Updated: 08/Apr/15 Resolved: 08/Apr/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.6.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Brewer | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Steps To Reproduce: | I'm not sure that you can repro this, as it probably relates to our specific DB layout. Here's the setup we're running on, though:
|
| Participants: |
| Description |
|
Most of this leadup shouldn't matter, but I'm adding it for completeness. We were running mongo 2.4.10 until recently. So, I removed it from the replica set. Now, when the replica tries to resync, the log has errors all over (similar to those at the bottom of this post) and it never finishes syncing. It might make it up to 26% of the 2T disk, but then drops to 6% again.

2015-03-22T06:57:19.526+0000 [FileAllocator] allocating new datafile /db/mongodb/worplay-for-waterlink_meteor_com.0, filling with zeroes...
, name: "id", ns: "worplay-for-waterlink_meteor_com.system.users" }
, name: "id", ns: "worplay-for-waterlink_meteor_com.system.users" }
error: 0 assertion src/mongo/db/curop.cpp:154

Thanks... happy to add more detail if it's needed. |
| Comments |
| Comment by Ramon Fernandez Marina [ 08/Apr/15 ] |
|
Thanks for the update mbrewer. Going forward you may want to consider fixing the long keys, as it's preferable to run with failIndexKeyTooLong set to true to make sure queries that use indices return complete results. |
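For reference, a minimal mongo shell sketch of how the over-long keys might be located, assuming placeholder database/collection names (`mydb`/`mycoll`) and the approximate 1024-byte index key limit in 2.6; `Object.bsonsize()` on a wrapper document is only a rough proxy for the actual index key size:

```js
// Hedged sketch: flag documents whose _id is likely to exceed the 2.6
// index key length limit (~1024 bytes). "mydb" and "mycoll" are placeholders.
var coll = db.getSiblingDB("mydb").getCollection("mycoll");
coll.find({}, { _id: 1 }).forEach(function (doc) {
    // Object.bsonsize() reports the BSON size of the wrapper document,
    // a rough upper bound on the corresponding _id index key size.
    var approxKeyBytes = Object.bsonsize({ _id: doc._id });
    if (approxKeyBytes > 1024) {
        print("long _id (~" + approxKeyBytes + " bytes): " + tojson(doc._id));
    }
});
```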
| Comment by Matthew Brewer [ 08/Apr/15 ] |
|
We've resolved this. Indeed it is the IDs... The way to confirm post-2.6 migration is to turn off ID length checking via the failIndexKeyTooLong flag. Thanks for the help, and sorry for the bug report for a non-bug. |
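A sketch of the workaround described in this comment, i.e. relaxing the index key length check; the runtime `setParameter` form and the startup flag are both standard 2.6 mechanisms, but the exact deployment details here are assumptions:

```js
// Relax the index key length check (2.6.x) so the resync can proceed
// despite over-long _id values. Run from the mongo shell:
db.adminCommand({ setParameter: 1, failIndexKeyTooLong: false });

// Or set it at mongod startup:
//   mongod --setParameter failIndexKeyTooLong=false
```

Note that this hides the long keys rather than fixing them, which is why the comment above recommends eventually running with the parameter set back to true.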
| Comment by Matthew Brewer [ 26/Mar/15 ] |
|
From my bug: Sorry. I'm asking if there is a way to check whether the id issue is the cause retroactively... meaning after we no longer have a 2.4.x instance in the cluster. We're now all on 2.6.8, with no 2.4 anymore. It would've been nice if I'd captured logs when I ran that command originally, but I did not, and I don't trust human memory enough to say what the real result was. Given that, I was hoping for a way to know if this "id" issue is actually the problem at this point. If I hadn't upgraded auth, I could add a 2.4.* machine to the replica set just to run this test, but I have upgraded it, so that won't work. Is there some other way to make progress on root-causing this? |
| Comment by Sam Kleinman (Inactive) [ 26/Mar/15 ] |
|
You can run db.upgradeCheckAllDBs() at any point on a copy of the data managed by 2.4.x to see if there are problems in the data that are still unresolved. Additionally, I'm sorry I missed this earlier, but have you run the auth schema upgrade process on this data? See the documentation here for more information on this process. |
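For reference, a sketch of the two steps mentioned in this comment, assuming a 2.6 mongo shell and omitting connection details:

```js
// 1. Scan every database for data that 2.6 will reject (long index keys,
//    invalid index specs, etc.); run against the copy of the 2.4-era data.
db.upgradeCheckAllDBs();

// 2. Upgrade the 2.4 auth schema to the 2.6 format; run once against the
//    admin database after the binaries are on 2.6.
db.getSiblingDB("admin").runCommand({ authSchemaUpgrade: 1 });
```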
| Comment by Matthew Brewer [ 26/Mar/15 ] |
|
Thanks for the help/response! We host users, so the answer to #1 is "I don't know". It's pretty likely that some of our users do generate their own IDs, though. I did run db.upgradeCheckAllDBs()... but honestly I don't remember exactly what it said. I vaguely recall that it did not return a "no" or a "yes" answer and instead did something nonsensical... but I could easily be wrong. I had previously run it on a test DB, where that call showed the same behavior, and it did work. Is there some way to check whether this is the issue retroactively? |
| Comment by Sam Kleinman (Inactive) [ 25/Mar/15 ] |
|
Thanks for this report. I have a couple of additional questions:
|