[SERVER-18248] < 100 chars are still too large to index if weird chars or messed up encoding…? Created: 29/Apr/15  Updated: 26/May/15  Resolved: 26/May/15

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 3.0.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Viktor Hedefalk Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

Hi,

I'm running a map/reduce to collect a large set of log entries of user input. When we moved from 2.4 (or thereabouts) to 3.0.2, a lot of "key too large to index" problems appeared. We then truncated the large values down to prefixes and that seemed to work. When running this map/reduce, however, I get:

2015-04-29T09:40:46.014+0200 E QUERY Error: map reduce failed:{
"errmsg" : "exception: Btree::insert: key too large to index, failing kostbevakningen.tmp.mr.logentrys_0_inc.$_temp_0 1057 { : \"2 ãƒæ’ã†â€™ãƒâ€ ã¢â‚¬â„¢ãƒæ’ã¢â‚¬â ãƒâ¢ã¢â€šâ¬ã¢â€žâ¢ãƒæ’ã†â€™ãƒâ¢ã¢�...\" }",
"code" : 17280,
"ok" : 0
}

But these fields are less than 100 chars:
> db.logentrys.find({inputText: {$regex: '2 ãƒæ’ã.*'}}).count()
294
> db.logentrys.find({inputText: {$regex: '.{100,}'}}).count()
0

We have a lot of these cases with weird encodings. I think this one is the beginning of the Swedish "2 ägg", which means "2 eggs".
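For what it's worth, each round of that kind of double-encoding roughly doubles the byte count of every non-ASCII character. A quick shell illustration (the unescape(encodeURIComponent(...)) trick simulates one round of "UTF-8 bytes read back as Latin-1"):

> var s = "2 ägg"
> unescape(encodeURIComponent(s))
2 Ã¤gg
> unescape(encodeURIComponent(unescape(encodeURIComponent(s))))
2 ÃƒÂ¤gg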

My guess is that mongo does some internal encoding of index keys which makes these unusual characters take up a lot of space, so the overhead pushes strings of fewer than 100 characters past the 1024-byte key limit.
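To test that guess, the character count and the UTF-8 byte count of one of these values can be compared in the shell (a sketch; the regex is just reused from above to grab one offending document):

> var s = db.logentrys.findOne({inputText: {$regex: '2 ãƒæ’ã.*'}}).inputText
> s.length                                // character count
> unescape(encodeURIComponent(s)).length  // UTF-8 byte count

If the second number exceeds 1024, the index key limit is hit even though the first is under 100.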

What can I do? I could probably live with losing these log entries, but I don't even know how to identify them all.
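One way to find them all, as a sketch: a $where query that checks the byte length rather than the character length. Note that $where evaluates JavaScript per document, so it's slow on large collections, and the 1024-byte limit includes some BSON key overhead, so a slightly lower threshold may be safer:

> db.logentrys.find({$where: 'this.inputText && unescape(encodeURIComponent(this.inputText)).length > 1000'}).count()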



 Comments   
Comment by Sam Kleinman (Inactive) [ 26/May/15 ]

I'm glad that you've been able to resolve this, and sorry for the confusion. I'm going to go ahead and close this ticket. Feel free to reopen if you run into this again or open a new ticket as needed.

Cheers,
sam

Comment by Viktor Hedefalk [ 21/May/15 ]

Hi @Ramon, I could get around it by wiping my mongo installation. It seems like some temporary collections stayed behind even though I had removed the failing data; "kostbevakningen.tmp.mr.logentrys_0_inc.$_temp_0 1057" sounds like something other than the "real" data.
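In case anyone else ends up here: leftover map/reduce temporaries can also be found and dropped by hand instead of wiping the installation (a sketch; the collection name is taken from the error message above, and getCollection is needed because the name contains dots, so double-check the names before dropping anything):

> db.getCollectionNames().filter(function(n) { return n.indexOf('tmp.mr.') === 0; })
> db.getCollection('tmp.mr.logentrys_0_inc').drop()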

Comment by Ramon Fernandez Marina [ 21/May/15 ]

Hi hedefalk, we haven't heard back from you for some time. If this is still an issue for you, can you please answer Sam's question above about a reproducer?

Thanks,
Ramón.

Comment by Sam Kleinman (Inactive) [ 07/May/15 ]

Hello,

Thanks for reporting this issue. Can you provide sample data and/or a small script that we could use to reproduce the issue? This will help us understand the problem much more clearly.

Regards,
sam
