[SERVER-20866] Race condition in oplog insert transaction rollback Created: 11/Oct/15 Updated: 08/Feb/16 Resolved: 23/Dec/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 3.0.2 |
| Fix Version/s: | 3.0.9 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Yoni Douek | Assignee: | Geert Bosch |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | QuInt E (01/11/16) |
| Description |
|
Issue Status as of Dec 03, 2015

ISSUE SUMMARY
A race condition in the rollback of oplog insert transactions can leave the oplog corrupt (validate() on the oplog returns an error), after which the affected mongod crashes.

USER IMPACT
This behavior has been observed when balancing a collection that has large objects and when the chunk migration "source" mongod is under heavy load.

WORKAROUNDS
Following a crash, execute
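(the exact command was lost in this export; below is a minimal check along the lines described in the comments, assuming the oplog lives at local.oplog.rs, the default for a replica-set member:)

```javascript
// From the mongo shell on the affected node: validate the oplog.
// Passing true requests a full validation that inspects record contents.
var res = db.getSiblingDB("local").oplog.rs.validate(true);
if (!res.valid) {
    // A corrupt oplog cannot be repaired in place: resync the member from a
    // healthy node or restore from backup, as discussed in the comments below.
    print("oplog is corrupt: " + tojson(res.errors));
}
```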
AFFECTED VERSIONS
MongoDB 3.0 releases prior to 3.0.9 are affected (3.2 is not affected; see the comments below).

FIX VERSION
The fix is included in the 3.0.9 production release.

Original Description
After investigation we found out that the oplog collection is corrupt (validate() returns an error). This is the crash:
|
| Comments |
| Comment by Daniel Pasette (Inactive) [ 30/Dec/15 ] |

jon@appboy.com, I've added a synopsis of the issue in the description. I can also add that this is a relatively rare issue (two known reports in a release that has been out for almost a year).
| Comment by Jon Hyman [ 29/Dec/15 ] |

Is there more of a synopsis that someone can provide here? This only happens on collections with "large objects", but how large is that? We're planning on upgrading from 2.6.11 to 3.0.8 next week, but I am now reluctant and wondering if I should wait for 3.0.9, given that we add shards all the time. Our max document sizes are around 500KB, but the average is around 1KB.
| Comment by Daniel Pasette (Inactive) [ 28/Dec/15 ] |

We're planning to cut a release candidate for v3.0.9, which will include this fix, in the first couple of weeks of January.
| Comment by Yoni Douek [ 28/Dec/15 ] |

We can confirm that this fix works, thanks for this! We noticed that you're working in roughly two-month release cycles, and we can't afford to wait that long for 3.0.9. Thanks!
| Comment by Daniel Pasette (Inactive) [ 24/Dec/15 ] |

That's a good build. There were a few test fixes checked in after the fix you're after, and the latest build now contains those fixes too. You can check which commit the latest build corresponds to in the version string; in this case, the git hash prefix is 7049bf7. The commit stream for the v3.0 branch is here: Let us know how this works for you.
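As a side note, a running node reports the same commit via the standard buildInfo server command; a minimal sketch (the printed values are whatever your build reports):

```javascript
// From the mongo shell: buildInfo includes the release string and the exact commit.
var info = db.runCommand({ buildInfo: 1 });
print("version:    " + info.version);     // e.g. "3.0.8"
print("gitVersion: " + info.gitVersion);  // full 40-character commit hash
```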
| Comment by Yoni Douek [ 24/Dec/15 ] |

Thanks Dan. None of the nightlies at https://www.mongodb.org/downloads#development return this value; the latest we could find is "v3.0.8-11-g7049bf7". Can you point me to the nightly build's address?
| Comment by Daniel Pasette (Inactive) [ 23/Dec/15 ] |

It should be done. If you run mongod --version you should see the version 3.0.9-pre- and git version 5d07580128edf5bb00f175db054f89c5fefc8b13.
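For reference, the command-line check looks roughly like this (output trimmed; the exact format varies between builds):

```
$ mongod --version
db version v3.0.9-pre-
git version: 5d07580128edf5bb00f175db054f89c5fefc8b13
```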
| Comment by Yoni Douek [ 23/Dec/15 ] |

BTW, we were constantly seeing this message (every few seconds), even hours before the crash:
Is this related? Any idea why this might happen?
| Comment by Yoni Douek [ 23/Dec/15 ] |

Amazon Linux
| Comment by Daniel Pasette (Inactive) [ 23/Dec/15 ] |

The nightly binaries are building now and should be available later today. Which variant are you using?
| Comment by Yoni Douek [ 23/Dec/15 ] |

Awesome, thanks for this! We'll test the nightly build (assuming the fix is included) and report back.
| Comment by Githook User [ 23/Dec/15 ] |

Author: Geert Bosch <geert@mongodb.com> (GeertBosch)
Message:
| Comment by Ryan Farisco [ 23/Dec/15 ] |

Why am I still being copied into this ticket?
| Comment by Geert Bosch [ 23/Dec/15 ] |

Yes, that is correct, 3.2 is not affected.
| Comment by Yoni Douek [ 23/Dec/15 ] |

Dan, this is interesting info. If we upgrade to 3.2, will this issue be resolved? Thanks
| Comment by Daniel Pasette (Inactive) [ 23/Dec/15 ] |

Hi ripusingla, we're currently working on a fix for the 3.0 branch. This issue does not impact the 3.2 branch.
| Comment by Ripu Singla [ 23/Dec/15 ] |

Hi @Geert Bosch, @Dan Pasette, can you guys help me find the issue and how to resolve it? Really appreciate any pointers. Thanks.
| Comment by Ripu Singla [ 23/Dec/15 ] |

Hi @Ryan Farisco, sorry for the late reply. I didn't add you; it looks like you were copied accidentally. Apologies.
| Comment by Ryan Farisco [ 18/Dec/15 ] |

Hi Ripu, have we been copied in accidentally?
| Comment by Ripu Singla [ 18/Dec/15 ] |

I see, thanks for the info.
| Comment by Yoni Douek [ 18/Dec/15 ] |

We're restoring the old primary from 12-hour-old disk backups (AWS snapshots), so the restored node only needs to sync the last 12 hours of changes (which takes a few hours).
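For anyone following along, an outline of that restore, with illustrative paths and device names (the key point is that the snapshot must be younger than the primary's oplog window, so the member can catch up by replaying the oplog rather than doing a full initial sync):

```bash
# On the crashed member (illustrative paths; adapt to your layout):
sudo service mongod stop

# Create a volume from a recent snapshot (via the AWS console or CLI) and
# mount it at the dbpath in place of the corrupt volume:
sudo mount /dev/xvdf /var/lib/mongo

# On restart the member rejoins as a secondary and replays only the
# oplog entries written since the snapshot was taken:
sudo service mongod start
```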
| Comment by Ripu Singla [ 18/Dec/15 ] |

Hi @Yoni Douek, thanks for the heads up. What I do is stop the primary using "sudo service mongod stop", and the secondary automatically takes the primary role. But the hard part is re-syncing the original primary with the new primary; it takes around 2 days, as we have roughly ~1TB of data on this replica. Any workaround for this? How do you restore data on the corrupted mongo instance?
| Comment by Yoni Douek [ 18/Dec/15 ] |

Ripu, happy to see we're not alone in this! The annoying part of this nasty bug is that the primary doesn't step down. My tip for you: when you see this, manually stop it using
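(the command itself was lost in this export; a sketch of the failover we force, assuming the same init-script setup Ripu describes below:)

```bash
# Writes fail but the node still answers heartbeats, so stop it to force an election:
sudo service mongod stop

# Then, from a mongo shell on a surviving member, confirm a new primary exists:
#   rs.status().members.forEach(function(m) { print(m.name + " " + m.stateStr); })
```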
| Comment by Ripu Singla [ 18/Dec/15 ] |

Hi @Geert Bosch, @Yoni Douek, we are facing the exact same issue; our production server crashed for the third time last night. Can you let us know the root cause or fix? We are using 3 replica sets (primary+secondary+arbiter), 3 config servers and 1 query server, a total of 13 machines. We are using Mongo version 3.0.5. Thanks.
| Comment by Yoni Douek [ 16/Dec/15 ] |

Thanks for the update Geert, looking forward to this! :)
| Comment by Geert Bosch [ 16/Dec/15 ] |

Hi Yoni, we believe we've found the root cause of this issue and are working on a fix. -Geert
| Comment by Yoni Douek [ 16/Dec/15 ] |

Hey, attached please find a log of the crash from 3.0.8-rc0 with debug symbols. Ryan, I didn't add you; maybe someone from MongoDB did.
| Comment by Ryan Farisco [ 15/Dec/15 ] |

Hi Yoni, I believe we have been copied into the incorrect ticket.
| Comment by Yoni Douek [ 15/Dec/15 ] |

Perfect. We're on it.
| Comment by Ramon Fernandez Marina [ 15/Dec/15 ] |

yonido, you can download 3.0.8-rc0 with debug symbols here (Linux 64-bit; other platforms and versions are also available). We can wait until you use these binaries to collect further information, but since it seems you're able to reproduce this issue easily, it may help if you send us a recipe to reproduce it on our end, if you have one. Thanks,
| Comment by Daniel Pasette (Inactive) [ 15/Dec/15 ] |

Reopening. We'll take a look, Yoni.
| Comment by Yoni Douek [ 15/Dec/15 ] |

Hey, we've been investigating this issue very deeply over the last several weeks. We are now 100% sure that this is a bug in MongoDB and not anything environmental. To repeat: after running the balancer, when balancing a collection that has large objects (and only in collections of this type), and when the chunk migration "source" mongod is under stress (heavy load), we get the following:
2. Either
or
3. Then either
or
or both, endlessly. We are using mmapv1. I've also attached another example, called "perfect_example"; this one is better than those previously attached. Please re-open the ticket and check this. Since we can now reproduce this quite easily, we'll be happy to help you debug it if you send us a build that contains debugging information. Our cluster crashes every other day, and unfortunately we'll have to move to another database if this is not solved.
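For readers trying to reproduce this, a rough sketch of the scenario described above, with illustrative database and collection names (not an exact recipe; the crash also requires heavy concurrent write load on the source shard):

```javascript
// From a mongo shell connected to a mongos: shard a collection of large
// documents, then let the balancer migrate its chunks under write load.
sh.enableSharding("testdb");
sh.shardCollection("testdb.bigdocs", { _id: 1 });

var big = new Array(512 * 1024).join("x");  // ~0.5 MB payload per document
for (var i = 0; i < 10000; i++) {
    db.getSiblingDB("testdb").bigdocs.insert({ _id: i, payload: big });
}
sh.startBalancer();  // chunk migrations of these large documents preceded the crashes
```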
| Comment by Ramon Fernandez Marina [ 17/Nov/15 ] |

Hi yonido, apologies for the delay in getting back to you. Some of the messages in the logs you uploaded point to corrupt BSON:
The most common reason for these errors is a failure at the storage layer. If a node is finding corrupt data then you may want to resync it from a healthy node, or restore its data from a backup. The failure to successfully validate() the oplog points in this direction as well. I'd recommend you check the health of the storage layer of the affected node, as well as search its logs for storage-related errors, before you restore this node. If storage in this node is flaky then this problem is likely to reoccur. Since I haven't found evidence of a bug in the server, and the SERVER project is reserved for reporting bugs (or feature suggestions) for the MongoDB server, I'm going to close this ticket. For MongoDB-related support discussion you can post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. See also our Technical Support page for additional support resources. Regards,
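For reference, the resync described above is roughly the following (service name and dbpath are illustrative; see the MongoDB documentation on resyncing a replica set member):

```bash
# On the affected member: stop mongod, set the possibly corrupt data aside,
# and restart with an empty dbpath to trigger an initial sync from a healthy node.
sudo service mongod stop
sudo mv /var/lib/mongo /var/lib/mongo.corrupt    # keep a copy for diagnosis
sudo mkdir /var/lib/mongo
sudo chown mongod:mongod /var/lib/mongo
sudo service mongod start
```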
| Comment by Yoni Douek [ 21/Oct/15 ] |

Hey, attaching another crash where this happened. Note the following:
| Comment by Yoni Douek [ 20/Oct/15 ] |

Bumping this, @ramon.fernandez: our cluster is crashing every other day, please help.
| Comment by Yoni Douek [ 12/Oct/15 ] |

Tara, I'm not sure I understand what you're trying to say ;)
| Comment by Tara Gokey [ 12/Oct/15 ] |

warning: problem while initially checking shard versions on shard0000 :: caused by :: 10276 DBClientBase::findN: transport error: mongoshard01.vixletinternal.com:27017 ns: admin.$cmd query: { setShardVersion: "", init: true, configdb: "mongocfg01.vixletinternal.com:27019,mongocfg02.vixletinternal.com:27019,mongocfg03.vixletinternal.com:27019", serverID: ObjectId('561b565a55b9ea01b3761314'), authoritative: true }
Mon Oct 12 07:46:09.878 [conn3499] warning: db exception when initializing on shard0000:mongoshard01.vixletinternal.com:27017, current connection state is { state: { conn: "", vinfo: "shard0000:mongoshard01.vixletinternal.com:27017", cursor: "(none)", count: 0, done: false }, retryNext: false, init: false, finish: false, errored: false } :: caused by :: 10276 DBClientBase::findN: transport error: mongoshard01.vixletinternal.com:27017 ns: admin.$cmd query: { setShardVersion: "", init: true, configdb: "mongocfg01.vixletinternal.com:27019,mongocfg02.vixletinternal.com:27019,mongocfg03.vixletinternal.com:27019", serverID: ObjectId('561b565a55b9ea01b3761314'), authoritative: true }
| Comment by Yoni Douek [ 11/Oct/15 ] |

5. For this situation, like the crash above, the oplog was also corrupt (validate() returned an error), whereas all other collections were fine.
| Comment by Yoni Douek [ 11/Oct/15 ] |

Attached are the logs from the primary and the secondary at the time of the crash. Notes:
1. The setup is primary+secondary+arbiter.
2. This log is from a scenario which is a bit different from the crash log above: this one shows "Assertion failure isOk() src/mongo/db/storage/mmap_v1/extent.h 77". In our current setup we have seen different manifestations of this crash, already 3 times, every day or two. The attached one is one of them; the other is the one above, which shows the crash trace many, many times for every write (unfortunately I don't have full logs for that one).
3. When the balancer is stopped, this doesn't happen (see the sketch after this comment).
4. When it happens, the process is still alive but all writes fail; maybe this is why the other nodes don't see it as a dead node. But you can see the log of the secondary.
Please help, our production servers are very unstable now.
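For others who need the same mitigation, stopping the balancer from a mongos looks like this (standard shell helpers; sh.stopBalancer() also waits for any in-progress migration round to finish):

```javascript
// From a mongo shell connected to a mongos:
sh.stopBalancer();       // disable balancing and wait for the current round to end
sh.getBalancerState();   // should now report false
// Re-enable with sh.startBalancer() once a fixed build is deployed.
```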
| Comment by Ramon Fernandez Marina [ 11/Oct/15 ] |

yonido, can you please send the full logs for the affected node? We'll also need to see the logs of the other members of this replica set to understand why a new primary was not elected. Thanks,