[SERVER-31546] Write lock on one database causes lock on ALL databases Created: 13/Oct/17 Updated: 16/Oct/17 Resolved: 13/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication, WiredTiger |
| Affects Version/s: | 3.2.17 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Mario | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Participants: | |||||||||
| Description |
|
Hi, We are using a MongoDB replica set (3 servers: 1 primary, 1 secondary, 1 arbiter). We had a problem on 10.10.2017 between 11:44:10 and 11:51:00. We know that we started a dropIndex on one collection and the command executed OK: After that, all operations on all databases were very slow (about 5~6 minutes), the number of connections started to get very high, and the logs are full of All our applications were slowed in query execution, and in inserts too. Based on the documentation https://docs.mongodb.com/manual/reference/command/dropIndexes/index.html only the affected database should be locked, not all the others, but we think that is what happened. The logs are attached. We can't explain to ourselves what happened, so we are asking for your help. |
| Comments |
| Comment by Ramon Fernandez Marina [ 16/Oct/17 ] | ||||||||
|
mnuic, Regards, | ||||||||
| Comment by Mario [ 13/Oct/17 ] | ||||||||
|
Do you have any estimated time for resolving ticket 21307? Thanks for everything Bruce, I will watch the ticket Best, | ||||||||
| Comment by Bruce Lucas (Inactive) [ 13/Oct/17 ] | ||||||||
|
Hi Mario, Thanks for uploading the additional information. This confirms that you've encountered On the primary we see an index build from 08:45 to 09:17, and then the problematic dropIndexes command on the same collection at 09:44:
On the secondary that index build begins at 09:17 when the index build has finished on the primary, and it is still running when the dropIndexes command occurs at 09:44, setting up the conditions for
Sorry you encountered this issue, and thanks for reporting it. I'll close this ticket as a duplicate of To avoid this problem until it is fixed, you can avoid doing dropIndexes commands until createIndexes commands on the same collection have finished on all nodes. Bruce | ||||||||
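The workaround above (hold off on dropIndexes until index builds have finished on all nodes) could be checked with something like the following minimal sketch. It assumes you have already collected the `inprog` list from `currentOp` on each data-bearing node, and that in-progress index builds are recognizable by a `msg` field starting with "Index Build"; that field shape is an assumption and may differ between versions. The helper names and the sample documents are hypothetical. The check is deliberately conservative: it treats any index build on any node as blocking, rather than trying to match the specific collection.

```python
import re

# Assumption: in-progress index builds report a progress message that
# starts with "Index Build" in currentOp output.
INDEX_BUILD_MSG = re.compile(r"^Index Build")


def index_build_in_progress(current_ops):
    """Return True if any operation in a node's currentOp list looks like an index build."""
    return any(INDEX_BUILD_MSG.match(op.get("msg", "")) for op in current_ops)


def safe_to_drop_index(ops_by_node):
    """Return True only if no node reports an index build.

    ops_by_node maps a node name to that node's list of currentOp documents.
    This is conservative: a build on any collection blocks the drop.
    """
    return not any(index_build_in_progress(ops) for ops in ops_by_node.values())


# Hypothetical sample: the primary has finished, the secondary is still building.
sample = {
    "primary": [{"op": "query", "ns": "app.orders"}],
    "mongo-0": [{"op": "none", "msg": "Index Build (background) 19%"}],
}
print(safe_to_drop_index(sample))
```

With the sample data, the check reports that it is not yet safe to drop, because the secondary still shows a build in progress even though the primary does not.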
| Comment by Mario [ 13/Oct/17 ] | ||||||||
|
Hi Bruce, Logs from mongo-0 are attached. Can you suggest what we should do to avoid this in the future? Thanks, | ||||||||
| Comment by Bruce Lucas (Inactive) [ 13/Oct/17 ] | ||||||||
|
Hi Mario, Thanks for the additional information. It looks like, as you suspected, the slowness is related to the dropIndexes commands, which we can see at A below:
However the mechanism for the slowdown was not locking, but rather replica lag and w:majority writes. We see member _id 0 (the mongo-0 node) stop replicating and its lag build over the course of almost 7 minutes until B, at which point it catches up very quickly. At that time a number of very long running requests finish reporting running times up to almost 7 minutes ("logged slowest query durations"), and they report an average wtime (replication wait time) of about the same. The log confirms that those writes specified w:majority, so could not complete until the secondary was caught up. I suspect that the lag on the secondary relates to the replicated dropIndex command, but I'm not sure why that occurred. If you can attach the mongod log file for that time period and the entire diagnostic.data directory for the mongo-0 node we may be able to see why this occurred. Thanks, | ||||||||
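To illustrate why a single lagging secondary can stall w:majority writes in this topology, here is a rough sketch. It assumes that the majority is computed over voting members and that an arbiter cannot acknowledge writes, so in a primary/secondary/arbiter set both data-bearing nodes must acknowledge. The status document mimics the `members` array of `replSetGetStatus` (`name`, `stateStr`, `optimeDate`); the host names other than mongo-0 and all timestamps are hypothetical sample values.

```python
from datetime import datetime


def majority_needed(voting_members):
    """Number of acknowledgements that make up a majority of voting members."""
    return voting_members // 2 + 1


def secondary_lag_seconds(status):
    """Map each secondary's name to its lag behind the primary, in seconds."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical snapshot: the secondary stopped applying ops at 09:44:10.
status = {
    "members": [
        {"name": "mongo-0:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2017, 10, 10, 9, 44, 10)},
        {"name": "mongo-1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2017, 10, 10, 9, 51, 0)},
        {"name": "mongo-2:27017", "stateStr": "ARBITER"},
    ]
}

needed = majority_needed(len(status["members"]))  # 2 of 3 voting members
data_bearing = sum(m["stateStr"] in ("PRIMARY", "SECONDARY") for m in status["members"])
print(needed, data_bearing, secondary_lag_seconds(status))
```

Because the required majority (2) equals the number of data-bearing members (2), every w:majority write has to wait for the secondary, so its lag of roughly 7 minutes shows up directly as write latency.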
| Comment by Mario [ 13/Oct/17 ] | ||||||||
|
Hi Bruce, Sorry for not mentioning the timezone; it's -2 hours. I have attached the entire diagnostic.data directory. Thanks, | ||||||||
| Comment by Bruce Lucas (Inactive) [ 13/Oct/17 ] | ||||||||
|
Hi Mario, One additional request - can you clarify what is the timezone for the following?
Thanks, | ||||||||
| Comment by Bruce Lucas (Inactive) [ 13/Oct/17 ] | ||||||||
|
Hi Mario, Thanks for attaching the logs and the metrics files. However the attached metrics files don't cover the time period you mention. Can you please archive and attach to this ticket the entire content of the diagnostic.data directory so that we can investigate? Note that this request is somewhat time critical as the data retention for diagnostic.data is typically a few days. Thanks, |