Type: Question
Resolution: Done
Priority: Major - P3
Affects Version/s: 3.0.2
Component/s: MapReduce
Hi,
I'm running a map/reduce to collect a large set of log entries of user input. When we moved from 2.4 (or thereabouts) to 3.0.2, a lot of "key too large to index" problems appeared for overly long fields. We then cut the large values down to prefixes and that seemed to work. When running this map/reduce, however, I get:
2015-04-29T09:40:46.014+0200 E QUERY Error: map reduce failed:{
"errmsg" : "exception: Btree::insert: key too large to index, failing kostbevakningen.tmp.mr.logentrys_0_inc.$_temp_0 1057 { : \"2 ãƒæ’ã†â€™ãƒâ€ ã¢â‚¬â„¢ãƒæ’ã¢â‚¬â ãƒâ¢ã¢â€šâ¬ã¢â€žâ¢ãƒæ’ã†â€™ãƒâ¢ã¢�...\" }",
"code" : 17280,
"ok" : 0
}
But these fields are fewer than 100 characters long:
> db.logentrys.find({inputText: {$regex: '2 ãƒæ’ã.*'}}).count()
294
> db.logentrys.find({inputText: {$regex: '.
'}}).count()
0
We have a lot of these cases with weird encodings; I think this one is the beginning of the Swedish "2 ägg", which means "2 eggs".
My guess is that MongoDB does some internal encoding of index keys that makes these unusual characters take up a lot of space, so the overhead pushes values of fewer than 100 characters past the 1024-byte index key limit.
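To check whether that's at least plausible, something like the following in the mongo shell should show the difference between the character count and the UTF-8 byte count (just a sketch; the sample string is only the readable prefix from the error message above, and the unescape/encodeURIComponent trick is one common way to get the UTF-8 byte length in shell JavaScript):

var s = "2 ãƒæ’ã†â€™ãƒâ€ ã¢â‚¬â„¢";  // prefix copied from the error message
print("chars: " + s.length);  // character (UTF-16 code unit) count
print("bytes: " + unescape(encodeURIComponent(s)).length);  // UTF-8 byte length

Each of those mangled characters takes two or three bytes in UTF-8, so the byte length is several times the character count.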
What can I do? I could probably live with losing these log entries, but I don't even know how to identify them all.
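The closest thing I can think of is scanning for values whose UTF-8 byte length exceeds the limit, something like the sketch below (assuming the collection/field names above, the 1024-byte index key limit, and a plain full scan from the shell):

var LIMIT = 1024;  // documented index key size limit in bytes
db.logentrys.find({ inputText: { $type: 2 } }).forEach(function (doc) {
    // unescape(encodeURIComponent(...)) yields one shell character per UTF-8 byte,
    // so .length here is the UTF-8 byte length of the stored string.
    var bytes = unescape(encodeURIComponent(doc.inputText)).length;
    if (bytes > LIMIT) {
        print(doc._id + "  " + bytes + " bytes  (" + doc.inputText.length + " chars)");
    }
});

Documents found this way could then be trimmed or removed before the map/reduce runs, if that's the right approach at all.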