[SERVER-1766] ERROR: splitIfShould failed: locking namespace failed Created: 09/Sep/10 Updated: 12/Jul/16 Resolved: 10/Sep/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.7.0 |
| Fix Version/s: | 1.7.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alvin Richards (Inactive) | Assignee: | Alvin Richards (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
running the nightly git version: d16ac9d54d9595710ad8288ccdd742d9242a6fc3 |
||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
Problem:
Running a bulk insert via a Java program into a 3 shard system. After about 30 minutes I see the following errors in the log file for the router node
Thu Sep 9 19:09:15 [conn8] autosplitting scaleout.blogs size: 125479578 shard: ns:scaleout.blogs at: replset0:replset0/10.204.33.94:27000 lastmod: 3|25 min: { ts: -539057490 }max: { ts: -2960867 }on: { ts: -271305523 }(splitThreshold 104857600)
max: { ts: -1076104534 }on: { ts: -1343209153 }(splitThreshold 104857600)
At the same time, I see my Java clients fail with
All mongod's are still running on all machines.
Reproduce:
Solution:
Business Case: |
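The autosplit log line above reports both the chunk size and the split threshold, so the two can be compared directly. The following is a minimal sketch, assuming the log format shown in this report; `parseAutosplit` is a hypothetical helper written for illustration and is not part of MongoDB or its tooling.

```javascript
// Hypothetical helper (not a MongoDB API): pulls the namespace, chunk size,
// and split threshold out of a mongos "autosplitting" log line.
function parseAutosplit(line) {
  const head = /autosplitting (\S+) size: (\d+)/.exec(line);
  const tail = /\(splitThreshold (\d+)\)/.exec(line);
  if (!head || !tail) {
    return null; // line does not match the format assumed here
  }
  const sizeBytes = Number(head[2]);
  const thresholdBytes = Number(tail[1]);
  return {
    ns: head[1],
    sizeBytes: sizeBytes,
    thresholdBytes: thresholdBytes,
    thresholdMiB: thresholdBytes / (1024 * 1024),
    overThreshold: sizeBytes > thresholdBytes
  };
}

// Log line copied from the description above.
const sample =
  "Thu Sep 9 19:09:15 [conn8] autosplitting scaleout.blogs size: 125479578 " +
  "shard: ns:scaleout.blogs at: replset0:replset0/10.204.33.94:27000 " +
  "lastmod: 3|25 min: { ts: -539057490 }max: { ts: -2960867 }" +
  "on: { ts: -271305523 }(splitThreshold 104857600)";

const info = parseAutosplit(sample);
console.log(info.ns, info.sizeBytes, info.thresholdMiB, info.overThreshold);
// prints: scaleout.blogs 125479578 100 true
```

For the line in this report, the chunk (125479578 bytes) is above the 100 MiB threshold (104857600 bytes), which is consistent with the router attempting a split at that moment.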
| Comments |
| Comment by Gilles Gagniard [ 24/Sep/10 ] |
|
Just occurred on my 1.6.2 test shard ... the mongos router has been completely stuck for several hours after this splitIfShould failed error message. Killing and restarting it put the shard back in working order. |
| Comment by Che-Ching Wu [ 13/Sep/10 ] |
|
We encountered this also in 1.6.1. |
| Comment by Alvin Richards (Inactive) [ 10/Sep/10 ] |
|
Not seeing the same blocks with the following: db version v1.7.1-pre-, pdfile version 4.5 |
| Comment by Eliot Horowitz (Inactive) [ 09/Sep/10 ] |
|
did the remove yield lock change - so let's see if it happens again with that in place |
| Comment by Alvin Richards (Inactive) [ 09/Sep/10 ] |
|
Matias seems to think it's stuck on the last changelog entry for moveChunk
> db.changelog.find().sort( {time:-1})
{ "_id" : "ip-10-204-33-94-2010-09-09T19:10:36-12", "server" : "ip-10-204-33-94", "time" : "Thu Sep 09 2010 12:10:36 GMT-0700 (PDT)", "what" : "moveChunk", "ns" : "scaleout.blogs", "details" : { "min" : { "ts" : -1610580558 }, "max" : { "ts" : -1076104534 }, "from" : "replset0", "to" : "replset1" } } }
No entries since this point |
| Comment by Alvin Richards (Inactive) [ 09/Sep/10 ] |
|
Looks like the router is not accepting connections
vero:10gen$ ./software/mongodb-osx-x86_64-1.6.0/bin/mongo --port 27500 --host ec2-184-72-193-160.compute-1.amazonaws.com |