[SERVER-12359] Compacting blocks "admin" db authentication Created: 14/Jan/14 Updated: 09/Jul/16 Resolved: 15/Jan/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 2.4.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kay Agahd | Assignee: | Matt Dannenberg |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
The MongoDB docs state that "In MongoDB 2.2, compact blocks activities only for its database." and "You may view the intermediate progress (...) by running the db.currentOp() in another shell instance." That is not true when authentication is enabled, because you can't authenticate against the admin database and you can't run db.currentOp() without being authenticated.
This is quite annoying because we can't check the server status while compact is running. |
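For illustration, the progress check that the docs describe could look roughly like the helper below, which picks compact commands out of a `db.currentOp()` result. This is only a sketch and not part of the original report: the helper name `findCompactOps` is hypothetical, the sample data is hand-written, and the `inprog`, `query`, `ns`, `secs_running`, and `msg` field names are assumed to match 2.4-era `currentOp` output (they may differ in other server versions).

```javascript
// Hypothetical helper: pick running compact commands out of a db.currentOp() result.
// Assumes each entry in `inprog` carries the command document in `query`
// (2.4-era output); field names may differ in other server versions.
function findCompactOps(currentOpResult) {
  var ops = (currentOpResult && currentOpResult.inprog) || [];
  return ops.filter(function (op) {
    // Keep only entries whose command document has a `compact` key.
    return Boolean(op.query) && typeof op.query === "object" && "compact" in op.query;
  }).map(function (op) {
    return {
      opid: op.opid,
      ns: op.ns,
      collection: op.query.compact,
      secsRunning: op.secs_running,
      msg: op.msg
    };
  });
}

// In a mongo shell one would call: findCompactOps(db.currentOp())
// Here a hand-written sample stands in for the server response.
var sample = {
  inprog: [
    { opid: 11, op: "query", ns: "offerStore.$cmd", query: { compact: "offer" }, secs_running: 120, msg: "compact extent" },
    { opid: 12, op: "query", ns: "foo.offer", query: { _id: 1 }, secs_running: 22 }
  ]
};
var found = findCompactOps(sample);
```

The helper only inspects the returned document, so it works the same whether the result comes from an interactive shell or from a script; it does not address the authentication problem described above.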
| Comments |
| Comment by Kay Agahd [ 22/May/14 ] |
|
Thank you, Eric, for your suggestions, which will be useful for our next maintenance activities. Thanks also for making the drivers more intelligent in the future. Looking forward to using them ASAP. |
| Comment by Eric Milkie [ 22/May/14 ] |
|
We're working on making the drivers more intelligent about handling maintenance mode so that after one query error, further queries would not go to that node until it became SECONDARY again. For the current version of MongoDB, it might work better for you to remove nodes from the cluster (but not the config) before running compact on them; the drivers currently handle this situation better and will not direct queries to the node while it's running compact. Be sure to switch the listening port when restarting a node in stand-alone mode while running compact if you choose to do this. |
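The planned driver behavior described above (stop sending queries to a node after one query error, and resume once it reports SECONDARY again) can be sketched as follows. This is an illustrative sketch only, not actual driver code: `NodeSelector`, its methods, and the host names are all hypothetical.

```javascript
// Hypothetical sketch of driver-side node selection; not actual driver code.
function NodeSelector(hosts) {
  this.hosts = hosts.slice();
  this.unavailable = Object.create(null); // host -> true once a query against it has failed
}

// Record a query error: stop routing queries to this host.
NodeSelector.prototype.reportQueryError = function (host) {
  this.unavailable[host] = true;
};

// Record a member state observed later; only SECONDARY re-enables the host.
NodeSelector.prototype.reportState = function (host, state) {
  if (state === "SECONDARY") {
    delete this.unavailable[host];
  }
};

// Hosts that queries may currently be routed to.
NodeSelector.prototype.eligibleHosts = function () {
  var self = this;
  return this.hosts.filter(function (h) { return !self.unavailable[h]; });
};

// Usage with made-up host names.
var selector = new NodeSelector(["a:27017", "b:27017"]);
selector.reportQueryError("b:27017");           // e.g. an error while the node runs compact
var during = selector.eligibleHosts();
selector.reportState("b:27017", "RECOVERING");  // still in maintenance, stays excluded
var stillDuring = selector.eligibleHosts();
selector.reportState("b:27017", "SECONDARY");   // compact finished, node is readable again
var after = selector.eligibleHosts();
```

The sketch keeps the exclusion sticky until an explicit SECONDARY report arrives, which mirrors the "until it became SECONDARY again" wording rather than retrying the node on a timer.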
| Comment by Kay Agahd [ 22/May/14 ] |
|
Eric, thank you for the detailed explanations. Also, the error message "not master or secondary; cannot currently read from this replSet member" is clear, at least in the shell where the status of the node is shown by "offerstorePL:OTHER>". Our databases are very write heavy. For performance reasons we try to hold all data in RAM. Because MongoDB fragments data more and more over time, we need to compact regularly. If we don't compact, MongoDB slows down noticeably. Can you suggest a workaround for this problem? |
| Comment by Eric Milkie [ 22/May/14 ] |
|
I see now; I was confused when you said "exactly the same blocking behavior". To me, the behavior is quite different (although still problematic for you). Let me see if I can help diagnose this new error message you are receiving. I believe the behavior you are experiencing is documented here: http://docs.mongodb.org/manual/reference/command/replSetMaintenance/ |
| Comment by Kay Agahd [ 22/May/14 ] |
|
Eric, thank you for your quick reply. I did not say that my attempt to authenticate is blocked. I'm aware that you have fixed this issue by caching the auth tokens. However, we are still unable to access database A while database B is being compacted. If I execute what I ran a few months ago to reproduce the error:
then I see the following in file id1_s230.txt:
This corresponds to the output in the shell which I posted a few minutes ago. So for me the problem is still present, since we can't access any other database from a node where a collection is being compacted. |
| Comment by Eric Milkie [ 22/May/14 ] |
|
Kay, we did test it four months ago and asked you to confirm. I apologize, but I don't see how your latest output illustrates that your attempt to authenticate is blocked by a compact running on another database; can you elaborate? It looks like nothing blocks (but you do receive an error). This could be an unrelated issue. |
| Comment by Kay Agahd [ 22/May/14 ] |
|
Did you ever test the "bugfix"? It still does NOT work!
When will you finally fix this bug? |
| Comment by Githook User [ 21/Jan/14 ] |
|
Author: matt dannenberg (username: dannenberg, email: matt.dannenberg@10gen.com)
Message: |
| Comment by Matt Dannenberg [ 17/Jan/14 ] |
|
A-ha. Sorry. I did miss that aspect of your repro. Now that I am reproducing it properly I saw (just as you did) that it is a problem in 2.5.4. However, at HEAD of master it seems to have been fixed. I believe this commit fixed the issue. Our nightly builds are only published if the build passes all the tests, so they are not always up to date with the master branch on github. I'm going to attach some more recent executables to this ticket so that you can try your repro again and confirm that the bad behavior has in fact gone away. Please do not use these executables for anything more than testing. |
| Comment by Kay Agahd [ 17/Jan/14 ] |
|
Matt, thank you for your detailed answer, but you misunderstood me. I'm aware that a collection (or even the whole db) is blocked during compact. However, my last example shows that I'm unable to access a different db (foo.offer) while compact is running on offerStore.offer! Please see my steps above. Can you reproduce it?
|
| Comment by Matt Dannenberg [ 16/Jan/14 ] |
|
It is true (and expected) that you are unable to do a find on a collection that is undergoing a compact(). Does the original repro you posted (the one that uses db.currentOp() instead of a find) give you trouble on 2.5.4 or the nightly you downloaded? I'm able to get responses to auth commands and currentOp() from a node undergoing compact in both the nightly and 2.5.4. Another potential source of problems is that you are echoing into the mongo shell instead of using the --eval flag. The difference between the two is that with --eval the shell doesn't do any of the replica set tracking or state initialization that it does when it is run normally. The interactive shell runs commands to print the prompt and calls getLastError() and other commands behind the scenes, which may be what you are seeing as blocking. For more on the differences, take a look at the Differences Between Interactive and Scripted Mongo page in the manual. Try running
rather than
|
| Comment by Kay Agahd [ 16/Jan/14 ] |
|
Please see the corresponding logs, where you can see that the compact command on db offer was started at 12:13:51. At 12:16:00 the number of offers had been doubled. The compact command was started at 12:16:36. Mongo executed the find on db foo only 22 seconds later, at 12:16:58. |
| Comment by Kay Agahd [ 16/Jan/14 ] |
|
Btw. I tested v2.5.4 and 2.5 Nightly (mongodb-linux-x86_64-adb2c9c330f67e24adbb5f1a306f8484befe93a9-2014-01-13), both with the same results as 2.4.6 and 2.4.7. I also doubled the number of documents in the offer db to increase its compact time. No surprise, the delay before db.offer.find({_id:1}) executed on db foo doubled too. It executed only after compacting the offer db had finished. |
| Comment by Kay Agahd [ 16/Jan/14 ] |
|
Matt, I've tested again with v2.5.4 (this will become v2.6, right?) to access another DB while compact is running (not admin-db since credentials are cached). It does not work!
So, please reopen and fix this issue. |
| Comment by Matt Dannenberg [ 15/Jan/14 ] |
|
I ran your repro against 2.4.9 to make sure I was doing it correctly and I saw the same problem you did. Then, I ran it against the HEAD of the master branch on github and the problem was no longer present. We are doing some major reworking of the authentication system between 2.4 and 2.6 and I believe this was fixed by some of that work. |
| Comment by Kay Agahd [ 15/Jan/14 ] |
|
Matt, even if credentials are cached in v2.6, this won't solve the issue because one still can't access other databases while the compact command is running! |
| Comment by Matt Dannenberg [ 15/Jan/14 ] |
|
This behavior has been corrected in current master and will be in the 2.6 release. |
| Comment by Kay Agahd [ 15/Jan/14 ] |
|
I used vvvvv to get more verbose logs, which I wanted to add here, but I'm unable to attach them, so I've uploaded them here: |
| Comment by Kay Agahd [ 15/Jan/14 ] |
|
Here are the steps I executed to reproduce:
|
| Comment by Scott Hernandez (Inactive) [ 15/Jan/14 ] |
|
Please provide the steps you took so we can try to reproduce this. |
| Comment by Kay Agahd [ 15/Jan/14 ] |
|
Scott, we did not compact the admin database! The current title "Compacting "admin" db blocks authentication" is just wrong and is not what I initially wrote, which was something like "compact blocks more than it should". Why did you change it? It does not matter which database is being compacted. Btw. we've NEVER compacted the admin database. This issue is much more dramatic than you thought because it takes a long time to compact our database. During the whole compact time (1-3 hours) we can't access this node. The admin database, however, is quite small and would be fast to compact, so this wouldn't be a big issue. Please reopen this issue! |
| Comment by Scott Hernandez (Inactive) [ 14/Jan/14 ] |
|
We will change the docs to include a note that the local and admin databases are special and should not be compacted like normal databases, and that if they are, the compact will affect more than just that database. |
| Comment by Scott Hernandez (Inactive) [ 14/Jan/14 ] |
|
In 2.6 we are going to cache that info, so it should be much less likely, but basically, by design, we need to be able to read the admin db to authenticate. |
| Comment by Kay Agahd [ 14/Jan/14 ] |