[SERVER-4911] Temporary mapreduce collections cause mongodb to exceed namespace limit and breaks replication in replica sets Created: 08/Feb/12  Updated: 08/Mar/13  Resolved: 29/Aug/12

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Shane Andrade Assignee: Kristina Chodorow (Inactive)
Resolution: Incomplete Votes: 0
Labels: mapreduce, namespace, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2008 R2


Attachments: Zip Archive log.zip    
Operating System: ALL
Participants:

 Description   

Temporary mapreduce collections are not dropped once they are no longer in use. Furthermore, it is not possible to drop these collections manually: the drop simply returns "false" without giving a reason.

This eventually leads to this error on the secondaries in a replica set environment:
error: hashtable namespace index max chain reached:1335

This eventually snowballs and the secondaries are put into recovery mode without warning, leaving only the primary node up and the entire shard inactive.

I realize the nssize argument can be increased on the command line, but that would only be a temporary workaround. I've attached the logs from one of the secondaries.
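For context on why increasing nssize only delays the problem: MMAPv1 stores namespace metadata in a fixed-size .ns file, and per the 2.0-era docs the default 16MB file holds roughly 24,000 namespaces, with every collection and every index consuming one entry. A back-of-envelope sketch (the 24,000 figure is the approximate number from the docs, not an exact constant):

```javascript
// Rough namespace capacity for a given --nssize (MB). Restarting mongod
// with e.g. --nssize 128 raises the ceiling, but leaked tmp.mr.*
// collections will still eventually exhaust it.
function maxNamespaces(nssizeMB) {
    var NAMESPACES_PER_16MB = 24000; // approximate figure from the docs
    return Math.floor((nssizeMB / 16) * NAMESPACES_PER_16MB);
}
```

So a 128MB .ns file buys roughly 8x the headroom, but a steady leak of temp collections consumes it all the same.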



 Comments   
Comment by Antoine Girbal [ 30/Apr/12 ]

Shane,
I have not been able to reproduce temp collections being left behind by MR, with either 2.0.2 or 2.1.0.
This may be due to some special event, like a setShardVersion error when establishing the cursor.
Is it still happening for you?
At what rate, e.g. after every MR, or only occasionally but still piling up?
I am assuming here that you use output mode "replace" and that the input collection is sharded.
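For reference, a "mode replace" map/reduce over a sharded input collection would be issued with a command document shaped roughly like the one below; "salaries" and "salary_summary" are placeholder names, not the reporter's actual collections:

```javascript
// Hedged sketch of the 2.0-era command shape for sharded "replace" output.
// Over the wire the map/reduce/finalize functions are sent as code strings.
function buildMapReduceCommand(mapFn, reduceFn, finalizeFn) {
    return {
        mapreduce: "salaries",           // input collection (sharded) - placeholder name
        map: mapFn.toString(),
        reduce: reduceFn.toString(),
        finalize: finalizeFn.toString(),
        out: { replace: "salary_summary", sharded: true }
    };
}
```

It is during this kind of run that the per-shard tmp.mr.* collections are created and, normally, dropped once the output collection is written.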

Comment by Antoine Girbal [ 18/Feb/12 ]

Any occurrence in log of: "Cannot cleanup shard results"?
Also can you give sample names of temporary collections that are not deleted?
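As a starting point for collecting those names: map/reduce temporaries use the "tmp.mr." prefix (e.g. "tmp.mr.salaries_1234"; exact suffixes vary by version). In the mongo shell the name list would come from db.getCollectionNames(); the filtering itself is plain JavaScript:

```javascript
// Pick out map/reduce temp collections from a list of collection names.
// In the shell: findTempMapReduceCollections(db.getCollectionNames())
function findTempMapReduceCollections(names) {
    return names.filter(function (n) {
        return n.indexOf("tmp.mr.") === 0;
    });
}
```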

Comment by Shane Andrade [ 11/Feb/12 ]

What sort of mapreduce errors would you be referring to? It's possible I've seen them before but I'm wondering if there's a particular type of error that would cause what we are seeing.

Comment by Shane Andrade [ 11/Feb/12 ]

The collections are sharded... Here's an example:

function Map() {
    emit(this._jobTitle, {
        Company: this.Company,
        JobTitle: this.JobTitle,
        Location: this.Location,
        AverageSalary: this.AverageSalary * this.SalaryCount,
        BottomSalary: this.BottomSalary,
        TopSalary: this.TopSalary,
        SalaryCount: this.SalaryCount,
        ResponseCount: this.ResponseCount,
        ReviewCount: this.ReviewCount,
        ReviewScore: this.ReviewScore * this.ReviewCount
    });
}

function Reduce(key, values) {
    var agg = {
        Company: null, JobTitle: null, Location: null,
        AverageSalary: 0.0, BottomSalary: null, TopSalary: null,
        SalaryCount: 0.0, ResponseCount: 0.0, ReviewCount: 0.0,
        ReviewScore: 0.0
    };

    values.forEach(function(val) {
        agg.Company = val.Company;
        agg.JobTitle = val.JobTitle;
        agg.Location = val.Location;
        agg.SalaryCount += val.SalaryCount;
        agg.ResponseCount += val.ResponseCount;
        agg.ReviewCount += val.ReviewCount;
        if (val.AverageSalary != null)
            agg.AverageSalary += val.AverageSalary;
        if (val.BottomSalary != null && (agg.BottomSalary == null || val.BottomSalary < agg.BottomSalary))
            agg.BottomSalary = val.BottomSalary;
        if (val.TopSalary != null && (agg.TopSalary == null || val.TopSalary > agg.TopSalary))
            agg.TopSalary = val.TopSalary;
        if (val.ReviewScore != null)
            agg.ReviewScore += val.ReviewScore;
    });

    return agg;
}

function Finalize(key, reduced) {
    if (reduced.SalaryCount)
        reduced.AverageSalary = parseInt(reduced.AverageSalary / parseFloat(reduced.SalaryCount));
    else
        reduced.AverageSalary = null;
    if (reduced.ReviewCount)
        reduced.ReviewScore = reduced.ReviewScore / parseFloat(reduced.ReviewCount);
    else
        reduced.ReviewScore = null;
    return reduced;
}

Comment by Antoine Girbal [ 09/Feb/12 ]

Most likely this is due to the code that restarts the entire mapreduce when one of the shards encounters an error.
A stale shard version, for example, can happen fairly frequently.
In that case the first MR's temp collections do not get cleaned up.

Comment by Antoine Girbal [ 09/Feb/12 ]

Could you give an example of the mapreduce you are running?
Is it applied to a sharded collection, and if so, could you give its stats?
Do you commonly see mapreduce errors in the logs?

Generated at Thu Feb 08 03:07:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.