[SERVER-7447] Sharded legacy-static build goes into uninterruptible sleep on kernel 2.6.16 Created: 23/Oct/12  Updated: 11/Jul/16  Resolved: 28/Oct/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Gregor Macadam Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/iKEK4ZISvSc

These days, I need to scale out my single-node mongodb service into a
mongodb cluster.
I use 2 machines to set up the cluster. It includes 2 shards (mongod
processes), 1 config process and 1 mongos process. The details are as
follows:

step1. start config process
bin/mongod --bind_ip 10.163.138.87 --configsvr --fork --dbpath /data/mongodb/config --logpath /data/mongodb/log/config.log --logappend --port 20000
step2. start shard services
bin/mongod --bind_ip 10.163.138.87 --fork --shardsvr --port 10000 --dbpath /data/mongodb/data --logpath /data/mongodb/log/mongod.log --logappend
bin/mongod --bind_ip 10.163.138.88 --fork --shardsvr --port 10000 --dbpath /data/mongodb/data --logpath /data/mongodb/log/mongod.log --logappend
step3. start mongos service
bin/mongos --bind_ip 10.163.138.87 --port 30000 --fork --configdb 10.163.138.87:20000 --logpath /data/mongodb/log/mongos.log --logappend
step4. log on to the mongos and configure the shards
bin/mongo 10.163.138.87:30000
mongos> use admin
mongos> db.runCommand({addshard:"10.163.138.87:10000"})
mongos> db.runCommand({addshard:"10.163.138.88:10000"})
mongos> db.runCommand({ "enablesharding" : "query"})
mongos> db.runCommand({ "shardcollection" : "query.DailyPcQueryStats", "key" :{ "d":1, "qpc":1, "ua" : 1}})
step5: use mongoimport to insert data into the collection
"query.DailyPcQueryStats".
But...
The problem occurs. When I insert data into the collection, the mongod
process goes into a state of 'uninterruptible sleep' and cannot recover.
Other processes accessing the same disk also get into the same state, and
I can't even reboot the system with commands. The ONLY thing I can do is to
press the reboot button.
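
For anyone trying to confirm the same symptom, a minimal sketch of how processes in uninterruptible sleep could be listed on Linux is given below. It reads the state field of /proc/<pid>/stat, where 'D' marks uninterruptible sleep. The function names `parse_stat` and `uninterruptible_processes` are hypothetical helpers written for this illustration; they are not part of MongoDB or of the original report.

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren].strip())
    command = stat_text[open_paren + 1:close_paren]
    # The first field after the command name is the one-letter state.
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state 'D'."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # no procfs here (for example, not a Linux host)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or its stat file is unreadable
        if state == "D":
            found.append((pid, command))
    return found
```

Calling `uninterruptible_processes()` on an affected host should return the stuck mongod along with any other process blocked on the same disk. A process may also sit in state 'D' briefly during normal I/O, so only entries that stay in the list across repeated calls point to a hang.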

I also tested the 2 machines as stand-alone mongodb services. I imported
over 300 million records, and it was ok.
I wonder why the mongodb cluster behaves like this. Where is the problem?

I also tested like this: I started all mongodb cluster processes on 1
machine (10.163.138.88), including 2 shards, 1 config and 1 mongos. I
inserted data into it, and it was ok!!! Too strange!
Single node is ok, but multi node has a problem?



 Comments   
Comment by Stennie Steneker (Inactive) [ 28/Oct/13 ]

Closing as per last comment on Google Groups:

This problem is solved, and I am sorry to reply so late; I was not able to access Google Groups these days.
In general, the cause of the problem was a version problem. We didn't use the regular build in the beginning because it couldn't run on our OS (Linux version 2.6.16.60-0.21). In the end, we chose the legacy-static build as an alternative.
We updated the Linux kernel from 2.6.16 to 2.6.32, and the regular build can run on the new OS.
Now, everything goes well.
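
As an illustration of the kernel versions mentioned in this comment, the sketch below compares a kernel release string against 2.6.32. That threshold is an assumption taken only from the reporter's statement that the regular build runs on 2.6.32; the actual minimum kernel required by the regular build is not stated in this ticket. The names `KNOWN_GOOD_KERNEL`, `kernel_tuple` and `at_least_known_good` are hypothetical.

```python
import re

# Assumption: 2.6.32 is used as the threshold only because the reporter
# confirmed the regular build runs there; the true minimum may differ.
KNOWN_GOOD_KERNEL = (2, 6, 32)


def kernel_tuple(release):
    """Parse a release string such as '2.6.16.60-0.21' into (2, 6, 16)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch = match.groups()
    # A missing third component (for example '5.15') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def at_least_known_good(release, minimum=KNOWN_GOOD_KERNEL):
    """Return True if the release is at or above the known-good version."""
    return kernel_tuple(release) >= minimum
```

With these definitions, `at_least_known_good("2.6.16.60-0.21")` is false for the reporter's original kernel and `at_least_known_good("2.6.32")` is true. On a Linux host, the running kernel's release string can be obtained from `platform.release()` in the standard library.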

Comment by Eliot Horowitz (Inactive) [ 10/Jan/13 ]

The legacy-static build is a bit odd because it's statically linked.
Is there a reason you have to run it over the regular build?

Generated at Thu Feb 08 03:14:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.