[SERVER-5845] Cursor can get deleted because of timeout at the finish stage of a sharded map reduce Created: 15/May/12 Updated: 11/Jul/16 Resolved: 30/Aug/12 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 2.1.1 |
| Fix Version/s: | 2.3.0 |
| Type: | Improvement | Priority: | Critical - P2 |
| Reporter: | Azat Khuzhin | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
When I run one of my MR jobs, the second run fails with this error:
From the logs, I can see that in the final stage (when it iterates over "CollectionByWhatMapReduceRun_IntNumber"), the server does not serve new connections.
Afterwards, the number of connections decreases. Another type of MR job runs fine, though. Any ideas? |
| Comments |
| Comment by auto [ 30/Aug/12 ] | ||
|
Author: Randolph Tan (randolph@10gen.com), 2012-08-30T15:54:30-07:00. Message: | ||
| Comment by Azat Khuzhin [ 06/Jun/12 ] | ||
|
See my pull request. I've only added QueryOption_NoCursorTimeout, but yielding still needs to be added (maybe only if nonAtomic is set?). | ||
| Comment by Azat Khuzhin [ 06/Jun/12 ] | ||
|
And why are you using an upsert if you already check whether the document exists in "State::postProcessCollectionNonAtomic"? | ||
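To make the upsert question concrete, here is a hypothetical Python model of a non-atomic merge into the "out" collection, with dicts keyed by _id standing in for collections. merge_reduce_output and reduce_fn are illustrative names, not functions from mr.cpp. In this single-threaded model the lookup already decides which branch runs, so a plain insert or a plain update is enough.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]


def merge_reduce_output(temp: Dict[Any, Doc], out: Dict[Any, Doc],
                        reduce_fn: Callable[[Doc, Doc], Doc]) -> None:
    """Hypothetical model: merge temporary map/reduce results into the
    "out" collection in reduce mode, keyed by _id."""
    for key, doc in temp.items():
        existing = out.get(key)  # the existence check
        if existing is None:
            out[key] = doc  # absent: a plain insert is enough
        else:
            # present: reduce old and new values, then a plain update
            out[key] = reduce_fn(existing, doc)
```

An upsert would only add safety if another writer could insert the same _id between the check and the write, which could become possible once a non-atomic operation yields its lock.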
| Comment by Azat Khuzhin [ 06/Jun/12 ] | ||
|
The query at mongo/db/commands/mr.cpp:507 (State::postProcessCollectionNonAtomic) must use QueryOption_NoCursorTimeout. | ||
| Comment by Azat Khuzhin [ 06/Jun/12 ] | ||
|
It fails at the stage where the DB is locked ("lockType" : "W"), even though I use the "nonAtomic" flag when running mapreduce. New clients arrive but are not served, so connections accumulate. | ||
| Comment by Azat Khuzhin [ 05/Jun/12 ] | ||
|
Attached the log. | ||
| Comment by Randolph Tan [ 05/Jun/12 ] | ||
|
It should be fine, because the cursor has been initialized with QueryOption_NoCursorTimeout. It would be best to post the entire log, but the 3k lines before the failure might be enough. The grep output might be enough too. | ||
| Comment by Azat Khuzhin [ 05/Jun/12 ] | ||
|
Is it enough if I grep for "conn34700"? | ||
| Comment by Randolph Tan [ 05/Jun/12 ] | ||
|
Can you post the complete new log? | ||
| Comment by Azat Khuzhin [ 05/Jun/12 ] | ||
|
About the cursor no longer existing: maybe it is because of https://github.com/mongodb/mongo/blob/a105cf0bb3b8bc8aac662441f53deae5e4e83462/src/mongo/db/commands/mr.cpp#L760. And maybe it would be better to yield every 100 MB? | ||
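The suggestion of yielding periodically could look roughly like the following Python sketch, which yields once at least 100 MB of documents have been processed since the previous yield. yield_lock is a hypothetical callback standing in for releasing and reacquiring the database lock. A real implementation would also have to make sure the cursor survives the yield, which ties back to the cursor-deletion problem in this issue.

```python
from typing import Callable, Iterable

YIELD_EVERY_BYTES = 100 * 1024 * 1024  # the "every 100 MB" threshold suggested above


def process_with_yields(docs: Iterable[bytes],
                        handle: Callable[[bytes], None],
                        yield_lock: Callable[[], None],
                        threshold: int = YIELD_EVERY_BYTES) -> int:
    """Process documents, calling yield_lock() whenever at least `threshold`
    bytes have been processed since the last yield. Returns the yield count."""
    since_yield = 0
    yields = 0
    for doc in docs:
        handle(doc)
        since_yield += len(doc)
        if since_yield >= threshold:
            yield_lock()  # hypothetical: let other operations and new clients run
            since_yield = 0
            yields += 1
    return yields
```

Counting bytes rather than documents keeps the time between yields roughly bounded even when document sizes vary widely.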
| Comment by Azat Khuzhin [ 05/Jun/12 ] | ||
|
Now I get the same error again (maybe because the data set has grown).
And what about data inconsistency? | ||
| Comment by Azat Khuzhin [ 04/Jun/12 ] | ||
|
But in my case no sharding is used | ||
| Comment by Randolph Tan [ 30/May/12 ] | ||
|
I have confirmed that this is a bug and this happens because the cursor used by the shard doesn't use notimeout when querying over the map reduce results of individual shards (when mapreduce.shardedfinish is called). | ||
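The fix described here comes down to one bit in the query options. A minimal Python sketch of the legacy OP_QUERY flag word: the bit positions are those of the MongoDB wire protocol (NoCursorTimeout is bit 4, value 16, the same bit as QueryOption_NoCursorTimeout in the C++ client), while finish_query_options is a hypothetical helper, not the code that was actually changed in mr.cpp.

```python
# Legacy OP_QUERY flag bits (MongoDB wire protocol).
SLAVE_OK = 1 << 2
NO_CURSOR_TIMEOUT = 1 << 4  # same bit as QueryOption_NoCursorTimeout


def finish_query_options(slave_ok: bool = False) -> int:
    """Hypothetical helper: flags for the cursor that reads one shard's
    temporary map/reduce results during the sharded finish stage."""
    options = NO_CURSOR_TIMEOUT  # exempt the cursor from the idle timeout
    if slave_ok:
        options |= SLAVE_OK
    return options


def is_timeout_exempt(options: int) -> bool:
    """True if the no-timeout bit is set, so the server will not reap the
    cursor for being idle."""
    return bool(options & NO_CURSOR_TIMEOUT)
```

A cursor opened without the bit (for example with options equal to SLAVE_OK alone) stays subject to the idle timeout, which is the behavior confirmed as the bug.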
| Comment by Azat Khuzhin [ 28/May/12 ] | ||
|
BTW, when it iterates over "CollectionByWhatMapReduceRun_IntNumber", does it write to the "out" collection? | ||
| Comment by Azat Khuzhin [ 25/May/12 ] | ||
|
This cursor is created by mapreduce, so I can't set the noTimeout flag on it. | ||
| Comment by Randolph Tan [ 24/May/12 ] | ||
|
The server cleans up cursors that have been idle for 10 minutes unless the no-timeout flag is set. Assuming that you posted everything for conn178752 between 17:48:35 and 18:24:16, the cursor went untouched for more than 10 minutes. | ||
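The cleanup rule above can be modeled in a few lines. This is a simplified Python sketch, not the server's cursor-management code: the Cursor type and reap_idle_cursors are hypothetical, and the server's sweep interval and exact comparison are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List

IDLE_TIMEOUT_SECONDS = 10 * 60  # cursors idle for 10 minutes are cleaned up


@dataclass
class Cursor:
    cursor_id: int
    last_used: float  # seconds on the caller's clock
    no_timeout: bool = False  # set when the query used the no-timeout flag


def reap_idle_cursors(cursors: Dict[int, Cursor], now: float) -> List[int]:
    """Delete cursors idle for longer than the timeout, skipping those opened
    with the no-timeout flag. Returns the ids that were deleted."""
    expired = [
        cid
        for cid, cur in cursors.items()
        if not cur.no_timeout and now - cur.last_used > IDLE_TIMEOUT_SECONDS
    ]
    for cid in expired:
        del cursors[cid]
    return expired
```

With the times from the comment, 17:48:35 to 18:24:16 is 2141 seconds (35 min 41 s), well past the 600-second limit, so a cursor last touched at the start of that window would be reaped unless no_timeout were set.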
| Comment by Azat Khuzhin [ 17/May/12 ] | ||
|
I think it may be caused by too many connections. I have now added a replica set, and this MR job runs successfully. |