[SERVER-20531] Mongodb server crash: Invariant failure res.existing Created: 21/Sep/15 Updated: 13/Oct/15 Resolved: 25/Sep/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Write Ops |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | 3.0.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Peak Ji | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | crash |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | CentOS 6.5 x64 |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Sprint: | QuInt A (10/12/15) |
| Participants: | |
| Description |
|
We are running a test server using v3.0.6 with WiredTiger in standalone mode. The docs look like {"_id": "Article Title", "status": 0}. There are about 5 million docs in the collection, and from two other test servers we call findAndModify every second to fetch a doc and set its "status" field to 1. After running for about 2 hours, the mongod crashed and we got these errors:
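A minimal sketch of this workload in the mongo shell, with assumed database and collection names (the report does not specify them):

    // Assumed names: database "test", collection "articles".
    // Standalone 3.0.6 mongod started with --storageEngine=wiredTiger.
    var coll = db.getSiblingDB("test").articles;
    // Docs of the reported shape: {"_id": <article title>, "status": 0}
    coll.insert({_id: "Article Title", status: 0});
    // Each client repeatedly claims one doc and flips its status flag:
    coll.findAndModify({query: {status: 0}, update: {$set: {status: 1}}});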
|
| Comments |
| Comment by Githook User [ 25/Sep/15 ] |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}
Message: |
| Comment by Peak Ji [ 22/Sep/15 ] |
|
Great! I'll check out the dev branch. We manually filtered all fields larger than 1024 bytes, and everything is working well so far.

Peak |
| Comment by J Rassi [ 22/Sep/15 ] |
|
Thanks so much for the reproducible case. I can confirm that I am able to reproduce this issue by repeating the commands in your shell session against a 3.0.6 mongod instance started with the "--storageEngine=wiredTiger --setParameter failIndexKeyTooLong=false" options. I am also unable to reproduce this issue on the development branch, and suspect that it was fixed in 3.1.2 by

Please continue to watch this ticket for updates, and thanks again.

~ Jason Rassi |
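As an aside, failIndexKeyTooLong is a server parameter, so the same configuration can likely also be reached at runtime rather than at startup (a sketch, assuming a running 3.0 instance):

    // Runtime equivalent of --setParameter failIndexKeyTooLong=false:
    db.adminCommand({setParameter: 1, failIndexKeyTooLong: false});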
| Comment by Peak Ji [ 22/Sep/15 ] |
|
Seems like the findAndModify command has a limit of 1024 bytes on indexed fields, even when we are not using that index in the query? |
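For context, this is the index key length limit. A minimal demonstration of the default behavior (collection and field names here are illustrative):

    // With the default failIndexKeyTooLong=true, a 3.0 mongod rejects
    // inserts whose index key exceeds 1024 bytes ("key too large to index"):
    db.limittest.createIndex({title: 1});
    db.limittest.insert({title: new Array(1200).join("x")});  // rejected
    // With failIndexKeyTooLong=false, the insert succeeds but the document
    // is silently left out of that index.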
| Comment by Peak Ji [ 22/Sep/15 ] |
|
I think I found what's going wrong exactly: it's caused by _ids larger than 1024 bytes. Only two steps are needed to reproduce the error using the mongo shell, tested on both Mac OS X and Linux, with MongoDB 3.0.5/3.0.6 WiredTiger:
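The original shell session did not survive in this export; a plausible reconstruction of the two steps, under the options J Rassi lists above (--storageEngine=wiredTiger --setParameter failIndexKeyTooLong=false), would be:

    // Step 1: with failIndexKeyTooLong=false, insert a doc whose _id
    // exceeds the 1024-byte index key limit (it is left out of the _id index):
    db.test.insert({_id: new Array(1200).join("x"), status: 0});
    // Step 2: findAndModify the doc; on 3.0.5/3.0.6 WiredTiger this hit
    // "Invariant failure res.existing" and crashed the server:
    db.test.findAndModify({query: {status: 0}, update: {$set: {status: 1}}});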
|
| Comment by Peak Ji [ 22/Sep/15 ] |
|
Thanks Jason! We've simplified the data model and process now:

Database: crawldb
Index: {"f": 1}

The "f" field is a flag indicating the article status: 0 = unread, 1 = read.

We use findAndModify to get an "unread" doc (f=0) and mark it as "read" (f=1), as in the snippet below:

db.collection.findAndModify({query: {"f": 0}, update: {"$set": {"f": 1}}})

Tested again and got the same errors:
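The same workflow as a self-contained shell snippet (the collection name is assumed; the comment names only the database):

    var coll = db.getSiblingDB("crawldb").articles;  // collection name assumed
    coll.createIndex({f: 1});
    // Claim one unread doc (f=0) and mark it read (f=1):
    coll.findAndModify({query: {f: 0}, update: {$set: {f: 1}}});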
|
| Comment by J Rassi [ 21/Sep/15 ] |
|
Briefly discussed with david.storch. Our working theory is that findAndModify's call to update() is generating a WriteConflictException, and UpdateStage::work() is handling the WCE by dropping the current snapshot and returning an empty result set. Our recommended next action item on this ticket is to manually inspect the code in UpdateStage::work() in order to help build a reproducible case.

Note also that the relevant assertion does not exist in master, so it's possible that this issue affects the v3.0 branch only. Setting fixVersion to "3.1 Required" anyway; we can verify on the master branch once we can reproduce. I'll try to find an assignee at our planning meeting this afternoon. |
| Comment by J Rassi [ 21/Sep/15 ] |
|
Hi,

Sorry to hear that you're encountering this issue. I'd like to ask for additional information to help further diagnose the problem:
Thanks, |