Type: Question
Resolution: Done
Priority: Major - P3
Affects Version/s: 3.0.2
Component/s: MapReduce
Hi,
I'm doing a map/reduce to collect a large set of log entries of user input. When we moved from 2.4 (or thereabouts) to 3.0.2, a lot of "too large to index" problems appeared on long fields. We then cut the large values down to prefixes and that seemed to work. When running this particular map/reduce, however, I get:
2015-04-29T09:40:46.014+0200 E QUERY Error: map reduce failed:{
"errmsg" : "exception: Btree::insert: key too large to index, failing kostbevakningen.tmp.mr.logentrys_0_inc.$_temp_0 1057 { : \"2 ãƒæ’ã†â€™ãƒâ€ ã¢â‚¬â„¢ãƒæ’ã¢â‚¬â ãƒâ¢ã¢â€šâ¬ã¢â€žâ¢ãƒæ’ã†â€™ãƒâ¢ã¢�...\" }",
"code" : 17280,
"ok" : 0
}
But these fields are less than 100 chars:
> db.logentrys.find({inputText: {$regex: '2 ãƒæ’ã.*'}}).count()
294
> db.logentrys.find({inputText: {$regex: '.'}}).count()
0
We have a lot of these cases with weird encodings; I think this one is the beginning of the Swedish "2 ägg", which means "2 eggs".
My guess is that mongo does some internal B-tree key encoding that makes these unusual characters take up a lot of space, so the overhead pushes strings of fewer than 100 characters over the 1024-byte limit.
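If the limit is counted in bytes rather than characters, that would explain it: each of these mis-encoded characters is stored as multi-byte UTF-8, so a string under 100 characters can still end up well past 1024 bytes. A quick way to compare the two for one of the matching documents (a rough sketch; the encodeURIComponent trick only approximates the UTF-8 byte count, and s is just a throwaway variable):
> var s = db.logentrys.findOne({inputText: {$regex: '2 ãƒæ’ã.*'}}).inputText
> s.length                                           // characters, as the shell counts them
> encodeURIComponent(s).replace(/%../g, '.').length  // approximate UTF-8 byte count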
What can I do? I could probably live with losing these log entries, but I really don't even know how to identify them all.
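One possible way to list them, sketched assuming inputText is always a string; the 1000-byte cutoff is just an illustrative margin under the 1024-byte index entry limit, and $where forces a full collection scan, so it will be slow on a large collection:
> db.logentrys.find({$where: function() {
...     // rough UTF-8 byte length of inputText, same trick as above
...     return typeof this.inputText === 'string' &&
...            encodeURIComponent(this.inputText).replace(/%../g, '.').length > 1000;
... }}).count()
Dropping the count() (or projecting just _id) would return the actual documents to fix or delete.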