[SERVER-17735] index build failed, ! _progressMeter.isActive() Created: 25/Mar/15  Updated: 08/Apr/15  Resolved: 08/Apr/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matthew Brewer Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Steps To Reproduce:

I'm not sure you can reproduce this, as it probably relates to our specific DB layout.

Here's the setup we're running on though:

  • Mongo 2.6.8
  • ubuntu 14.04 LTS
  • uname -a -> Linux production-db-b1v0 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  • Running on AWS, m3.large, with two 1 TB SSD EBS volumes in RAID 0
Participants:

 Description   

Most of this leadup shouldn't matter, but I'm adding it for completeness.

We were running mongo 2.4.10 until recently.
We have 3 MongoDB data-bearing machines and 2 arbiters running in a replica set.
I upgraded each machine to 2.6.8.
When I went to upgrade the 3rd DB (after all the other machines were upgraded), it ran into some problems, likely related to ulimits and the like.

So, I removed it from the replica set.
I upgraded the auth data as per: http://docs.mongodb.org/manual/release-notes/2.6-upgrade-authorization/
I then got a fresh new machine, installed 2.6.8 on it, and tried to resync it.
We ran into some problems with various limits (ulimits and the total system memory-map limit), but eventually resolved those.
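For completeness, the auth upgrade step above boils down to one command from the linked release notes, run against the admin database on the primary (sketched here from the docs rather than my shell history):

// Sketch of the 2.6 auth schema upgrade from the linked release notes.
// Run in the mongo shell on the primary, authenticated against admin.
db.getSiblingDB('admin').runCommand({ authSchemaUpgrade: 1 });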

Now, when the replica tries to resync, the log is full of errors like the ones below and the sync never finishes. It may get up to 26% of the 2 TB disk, but then drops back to 6%.

2015-03-22T06:57:19.526+0000 [FileAllocator] allocating new datafile /db/mongodb/worplay-for-waterlink_meteor_com.0, filling with zeroes...
2015-03-22T06:57:19.529+0000 [FileAllocator] done allocating datafile /db/mongodb/worplay-for-waterlink_meteor_com.0, size: 16MB, took 0.002 secs
2015-03-22T06:57:19.535+0000 [rsSync] build index on: worplay-for-waterlink_meteor_com.system.users properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "worplay-for-waterlink_meteor_com.system.users" }
2015-03-22T06:57:19.535+0000 [rsSync] building index using bulk method
2015-03-22T06:57:19.535+0000 [rsSync] worplay-for-waterlink_meteor_com Assertion failure ! _progressMeter.isActive() src/mongo/db/curop.cpp 154
2015-03-22T06:57:19.542+0000 [rsSync] worplay-for-waterlink_meteor_com 0x1205431 0x11a7229 0x118b53e 0xa1a460 0x8eea05 0x8f13c2 0x8e70f2 0x8e8d47 0x910a62 0xea0d28 0xea1dad 0xea3177 0xec3844 0xec38f0 0xec3c0a 0x1249dc9 0x7f3e79b1a182 0x7f3e78e1efbd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x1205431]
/usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x11a7229]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x17e) [0x118b53e]
/usr/bin/mongod(_ZN5mongo5CurOp10setMessageEPKcSsyi+0x1c0) [0xa1a460]
/usr/bin/mongod(_ZN5mongo18addExistingToIndexEPNS_10CollectionEPKNS_15IndexDescriptorEPNS_17IndexAccessMethodEb+0x1a5) [0x8eea05]
/usr/bin/mongod(_ZN5mongo12buildAnIndexEPNS_10CollectionEPNS_17IndexCatalogEntryEb+0x8e2) [0x8f13c2]
/usr/bin/mongod(_ZN5mongo12IndexCatalog11createIndexENS_7BSONObjEbNS0_16ShutdownBehaviorE+0x922) [0x8e70f2]
/usr/bin/mongod(_ZN5mongo12IndexCatalog17ensureHaveIdIndexEv+0x427) [0x8e8d47]
/usr/bin/mongod(_ZN5mongo6Cloner2goERNS_6Client7ContextERKSsRKNS_12CloneOptionsEPSt3setISsSt4lessISsESaISsEERSsPi+0x1ad2) [0x910a62]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl17_initialSyncCloneERNS_6ClonerERKSsRKSt4listISsSaISsEEb+0x318) [0xea0d28]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl12_initialSyncEv+0x7ad) [0xea1dad]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x37) [0xea3177]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x124) [0xec3844]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x30) [0xec38f0]
/usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0xaa) [0xec3c0a]
/usr/bin/mongod() [0x1249dc9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f3e79b1a182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f3e78e1efbd]
2015-03-22T06:57:19.542+0000 [rsSync] index build failed. spec: { v: 1, key: { _id: 1 }, name: "_id_", ns: "worplay-for-waterlink_meteor_com.system.users" } error: 0 assertion src/mongo/db/curop.cpp:154

Thanks... happy to add more detail if it's needed.



 Comments   
Comment by Ramon Fernandez Marina [ 08/Apr/15 ]

Thanks for the update, mbrewer. Going forward, you may want to consider fixing the long keys: it's preferable to run with failIndexKeyTooLong set to true so that queries that use indexes return complete results.
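For example, something along these lines will flag _id values that would produce oversized index keys, after which you can re-enable the check (a rough sketch: the namespace is taken from your log, and Object.bsonsize only approximates the actual key size against the 1024-byte limit in 2.6):

// Rough sketch: flag _id values whose BSON size exceeds the 1024-byte
// index key limit enforced in 2.6. Namespace taken from the log above;
// Object.bsonsize approximates the real key size.
var coll = db.getSiblingDB('worplay-for-waterlink_meteor_com').system.users;
coll.find().forEach(function (doc) {
    if (Object.bsonsize({ _id: doc._id }) > 1024) {
        print('oversized _id: ' + tojson(doc._id));
    }
});

// Once the offending values are fixed, turn the check back on:
db.getSiblingDB('admin').runCommand({ setParameter: 1, failIndexKeyTooLong: true });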

Comment by Matthew Brewer [ 08/Apr/15 ]

We've resolved this. Indeed it is the IDs...

The way to confirm this after the 2.6 migration is to turn off index key length checking via the failIndexKeyTooLong parameter.
The command is: "db.getSiblingDB('admin').runCommand({ setParameter: 1, failIndexKeyTooLong: false })"
We ran this on every machine before a resync, and no longer got this failure (we're still having other issues, but those are unrelated).
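For anyone else hitting this, something like the following applies the parameter to every member from a single shell (hostnames are placeholders; the parameter resets when mongod restarts, so it either needs re-running or passing at startup via --setParameter failIndexKeyTooLong=false):

// Illustration only: set failIndexKeyTooLong=false on every data-bearing
// member. The hostnames below are placeholders.
['db-a.example.com:27017', 'db-b.example.com:27017', 'db-c.example.com:27017'].forEach(function (host) {
    var conn = new Mongo(host);   // mongo shell connection helper
    var res = conn.getDB('admin').runCommand({ setParameter: 1, failIndexKeyTooLong: false });
    printjson(res);               // expect { was: true, ok: 1 }
});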

Thanks for the help, and sorry for the bug-report for a non-bug.

Comment by Matthew Brewer [ 26/Mar/15 ]

From my bug:
"I upgraded the auth data as per: http://docs.mongodb.org/manual/release-notes/2.6-upgrade-authorization/"
So... yup, got that one.

Sorry. I'm asking if there is a way to check whether the id issue is the cause retroactively... meaning after we no longer have a 2.4.x instance in the cluster. Every node is now 2.6.8; there is no 2.4 left. It would've been nice if I'd captured logs when I ran that command originally, but I did not, and I don't trust human memory enough to say what the real result was. Given that, I was hoping for a way to know if this "id" issue is actually the problem at this point.

If I hadn't upgraded auth, I could add a 2.4.* machine to the replica set just to run this test, but I have upgraded it, so that won't work.

Is there some other way to make progress on root-causing this?

Comment by Sam Kleinman (Inactive) [ 26/Mar/15 ]

You can run db.upgradeCheckAllDBs() at any point on a copy of the data managed by 2.4.x to see if there are problems in the data that are still unresolved.
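For example (the hostname below is a placeholder):

// Connect a 2.6 mongo shell to a mongod still holding the 2.4-era data:
//   mongo copy-of-2.4-data.example.com:27017/admin
// Then run the helper; it reports documents and index keys that would
// violate 2.6 constraints (including oversized index keys).
db.upgradeCheckAllDBs();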

Additionally, I'm sorry I missed this earlier, but have you run the auth schema upgrade process on this data? See the documentation (http://docs.mongodb.org/manual/release-notes/2.6-upgrade-authorization/) for more information on this process.

Comment by Matthew Brewer [ 26/Mar/15 ]

Thanks for the help/response!

We host users, so the answer to #1 is "I don't know." It's pretty likely that some of our users do generate their own IDs, though.

I did run db.upgradeCheckAllDBs()... but honestly I don't remember exactly what it said. I vaguely recall that it did not return a clear "yes" or "no" answer and instead did something nonsensical... but I could easily be wrong. I had previously run it on a test DB, where that call behaved the same way and the upgrade itself still worked.
Given the slew of known 2.4.10 bugs I'm used to tripping over, I didn't bother recording or reporting that behavior; sad to say, but we're pretty used to mongo not behaving very well. Sorry.

Is there some way to check whether this is the issue retroactively?

Comment by Sam Kleinman (Inactive) [ 25/Mar/15 ]

Thanks for this report. I have a couple of additional questions:

  • Are the values stored in your collection's _id fields generated by your application, or do you use the default ObjectId values? If you use custom values, what values do you use and how do you generate them? (The sketch after this list shows one quick way to check.)
  • Did you run db.upgradeCheckAllDBs() (see the docs) before upgrading? What was the outcome of that operation?
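If it helps, here is one quick way to sample what your _id values look like (a sketch only: the namespace is taken from your log, and the sample size is arbitrary):

// Sample _id values to see whether they are default ObjectIds or
// application-generated values. Namespace and limit are examples.
db.getSiblingDB('worplay-for-waterlink_meteor_com').system.users.find({}, { _id: 1 }).limit(100).forEach(function (d) {
    var kind = (d._id instanceof ObjectId) ? 'ObjectId' : typeof d._id;
    print(kind + ': ' + tojson(d._id));
});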