[SERVER-5894] sharding failed after restarting mongo Created: 22/May/12  Updated: 15/Aug/12  Resolved: 18/Jul/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.1.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Azat Khuzhin Assignee: Randolph Tan
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

db version v2.1.1-pre-, pdfile version 4.5
git version: a2d6f752d56aa446220b9f14c8ad3865c2fb5db8


Issue Links:
Duplicate
is duplicated by SERVER-6511 Handle errors gracefully when mongod ... Closed
Operating System: ALL

 Description   

Assertion: 15927:can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error

To reproduce, do the following:
0) all mongo* processes run as mongodb:mongodb (user mongodb, group mongodb)
1) mkdir /mongo/db/path/moveChunk && chmod 700 /mongo/db/path/moveChunk && sudo chown root:root /mongo/db/path/moveChunk
2) set up sharding (add the shards, enable sharding on the database, shard a collection)
it fails with the following message: ERROR: Uncaught std::exception: boost::filesystem::create_directory: Permission denied: "/mnt/ongo/db/moveChunk", terminating
3) sudo chown mongodb:mongodb /mongo/db/path/moveChunk
4) restart mongodb
The log now contains messages saying that the database can't be opened (the assertion above)
5) If you then run a query against any collection in this database (e.g. db.foo.findOne()), the message disappears and the migration proceeds
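Step 2 fails because mongod cannot create or write to the moveChunk directory under its dbpath. Below is a minimal pre-flight sketch, not part of MongoDB, assuming the directory lives directly under the dbpath (as the error path in step 2 suggests) and that the script is run as the same user as mongod, e.g. `sudo -u mongodb sh check_movechunk.sh /mongo/db/path`. The script name and the function check_movechunk_dir are hypothetical. As root, the `-w` test passes regardless of ownership, so the check is only meaningful when run as the mongod user. It only catches the permission problem from steps 1-2; it does not address the "can't open database" behavior after the restart.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of MongoDB): verify that the current
# user can create, or write into, <dbpath>/moveChunk before sharding starts.
# Run it as the mongod user; as root the -w test passes regardless of ownership.

check_movechunk_dir() {
    dbpath="$1"
    dir="$dbpath/moveChunk"

    if [ -e "$dir" ]; then
        # Already exists: it must be a directory we can write to and enter.
        if [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
            echo "ok: $dir is writable by $(id -un)"
            return 0
        fi
        echo "error: $dir is not a writable directory for $(id -un)" >&2
        return 1
    fi

    # Does not exist yet: the dbpath itself must allow creating entries.
    if [ -d "$dbpath" ] && [ -w "$dbpath" ] && [ -x "$dbpath" ]; then
        echo "ok: $dir can be created by $(id -un)"
        return 0
    fi
    echo "error: $(id -un) cannot create $dir" >&2
    return 1
}

# Usage: sh check_movechunk.sh /path/to/dbpath
if [ "$#" -gt 0 ]; then
    check_movechunk_dir "$1"
fi
```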



 Comments   
Comment by Randolph Tan [ 18/Jul/12 ]

Hi,

I created a new issue SERVER-6511 for this bug. Thanks for reporting.

Comment by Azat Khuzhin [ 06/Jun/12 ]

And if this directory is created only in "Helpers::removeRange", how is it possible that you get an error in the "addShard" command?
"addShard" doesn't do any migration.

Comment by Randolph Tan [ 05/Jun/12 ]

Search for Helpers::removeRange (defined in src/mongo/db/dbhelpers.cpp) in src/mongo/s/d_migrate.cpp to look for the source that creates the moveChunk directory. The moveChunk directory is created unless you passed --noMoveParanoia.

I misunderstood your procedure; I'll try it again.

Comment by Azat Khuzhin [ 05/Jun/12 ]

I mean the files and lines (e.g. src/mongo/foobar.cpp:100, src/mongo/foobar2.cpp:102) where the "moveChunk" folder is created.

In the original procedure, "addShard" was executed while the "moveChunk" folder did not have the correct permissions (0700, owner not mongodb).
Here, instead, I'm asking you to:
1) set the correct permissions on the "moveChunk" folder
2) run "addShard"
3) restore the previous permissions on the "moveChunk" folder
4) run the original steps 2-5

Comment by Randolph Tan [ 05/Jun/12 ]

What do you mean by "moveChunk create"? Are you asking where to find the source code? (It is actually a multi-stage process that involves a couple of classes.)

Isn't that the same as your original procedure? Am I missing something?

Comment by Azat Khuzhin [ 05/Jun/12 ]

I built from git commit a2d6f752d56aa446220b9f14c8ad3865c2fb5db8.
Maybe the behavior has changed?

BTW, I can't find where "moveChunk" is created; could you show me the file:line?

And could you try first setting the correct permissions on the "moveChunk" directory, then running the "addShard" command, then restoring the previous permissions on "moveChunk"?
And then run steps 2-5.

Comment by Randolph Tan [ 04/Jun/12 ]

The exact same error message you posted in step 2: ERROR: Uncaught std::exception: boost::filesystem::create_directory.

Comment by Azat Khuzhin [ 04/Jun/12 ]

Why does it fail at "addShard"?
What was the error message?

Comment by Randolph Tan [ 29/May/12 ]

When I tried your steps, it failed at addShard right away, and mongo does not retry that operation.

Comment by Azat Khuzhin [ 26/May/12 ]

I mean that if I enable sharding and stop the server, sharding is still enabled after the restart.
Maybe you mean that moveChunk is not retried on error?

Comment by Randolph Tan [ 25/May/12 ]

No. Mongo doesn't retry on error (only on very special cases, like StaleConfigException).

Comment by Azat Khuzhin [ 25/May/12 ]

No, I don't repeat step 2 after step 4; mongo automatically continues sharding after the restart (doesn't it?).
By "migration is in progress" I mean that the "moveChunk" command successfully started, finished, and so on.

Comment by Randolph Tan [ 24/May/12 ]

Hi,

I tried reproducing this without success. I have some questions about your steps. Did you repeat step 2 after step 4? What did you mean by "migration is in progress"?

Comment by Azat Khuzhin [ 22/May/12 ]

I can't; I've already shut down the EC2 instance (and all data is erased).

Comment by Randolph Tan [ 22/May/12 ]

Can you attach the mongos and mongod logs?

Comment by Azat Khuzhin [ 22/May/12 ]

As I understand it:

The problem is that "mongos" can't open the database by itself.
It needs some client to open it first.

Comment by Azat Khuzhin [ 22/May/12 ]

Hi,

0) means that mongodb runs as user mongodb, group mongodb (on Debian, started with /etc/init.d/mongodb start).
That is how it runs by default.
2) means:

sh.addShard(...);
....
sh.addShard(...);
 
sh.enableSharding("database");
sh.shardCollection("database.collection", { _id: 1 }); // shard key not stated in the report; { _id: 1 } is an assumed example

Comment by Randolph Tan [ 22/May/12 ]

Hi,

Would you mind clarifying some of the steps?

For example, I don't understand what steps 0 and 2 mean.
