Type: Question
Resolution: Done
Priority: Major - P3
Affects Version/s: 3.0.2
Component/s: MapReduce
Hi,
I'm running a map/reduce to collect a large set of log entries of user input. When we moved from 2.4 (or thereabouts) to 3.0.2, a lot of "key too large to index" problems appeared for overly long fields. We then cut the large values down to prefixes and that seemed to work. When running this map/reduce, however, I get:
2015-04-29T09:40:46.014+0200 E QUERY Error: map reduce failed:{
"errmsg" : "exception: Btree::insert: key too large to index, failing kostbevakningen.tmp.mr.logentrys_0_inc.$_temp_0 1057 { : \"2 ãƒæ’ã†â€™ãƒâ€ ã¢â‚¬â„¢ãƒæ’ã¢â‚¬â ãƒâ¢ã¢â€šâ¬ã¢â€žâ¢ãƒæ’ã†â€™ãƒâ¢ã¢�...\" }",
"code" : 17280,
"ok" : 0
}
But these fields are fewer than 100 characters long:
> db.logentrys.find({inputText: {$regex: '2 ãƒæ’ã.*'}}).count()
294
> db.logentrys.find({inputText: {$regex: '.
'}}).count()
0
We have a lot of these cases with weird encodings; I think this one is the beginning of the Swedish "2 ägg", which means "2 eggs".
My guess is that MongoDB does some internal encoding of index keys that makes these unusual characters take up a lot of space, so the overhead pushes values of fewer than 100 characters past the 1024-byte index key limit.
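To check whether that's at least plausible, something like the following in the mongo shell should show the difference between the character count and the UTF-8 byte count (just a sketch; the sample string is only the readable prefix from the error message above, and the unescape/encodeURIComponent trick is one common way to get the UTF-8 byte length in shell JavaScript):

var s = "2 ãƒæ’ã†â€™ãƒâ€ ã¢â‚¬â„¢";  // prefix copied from the error message
print("chars: " + s.length);  // character (UTF-16 code unit) count
print("bytes: " + unescape(encodeURIComponent(s)).length);  // UTF-8 byte length

Each of those mangled characters takes two or three bytes in UTF-8, so the byte length is several times the character count.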
What can I do? I could probably live with losing these log entries, but I don't even know how to identify them all.
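The closest thing I can think of is scanning for values whose UTF-8 byte length exceeds the limit, something like the sketch below (assuming the collection/field names above, the 1024-byte index key limit, and a plain full scan from the shell):

var LIMIT = 1024;  // documented index key size limit in bytes
db.logentrys.find({ inputText: { $type: 2 } }).forEach(function (doc) {
    // unescape(encodeURIComponent(...)) yields one shell character per UTF-8 byte,
    // so .length here is the UTF-8 byte length of the stored string.
    var bytes = unescape(encodeURIComponent(doc.inputText)).length;
    if (bytes > LIMIT) {
        print(doc._id + "  " + bytes + " bytes  (" + doc.inputText.length + " chars)");
    }
});

Documents found this way could then be trimmed or removed before the map/reduce runs, if that's the right approach at all.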