[SERVER-27164] Deadlock during oplog application when implicitly creating multiple collections on the same DB Created: 22/Nov/16 Updated: 06/Dec/22 Resolved: 29/Nov/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.0, 3.2.0, 3.4.0-rc4 |
| Fix Version/s: | 3.0.15, 3.2.12, 3.4.1, 3.5.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Robert Guo (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Replication |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Sprint: | Repl 2016-12-12 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
A node can hang during initial sync or steady-state replication when CRUD operations implicitly create collections on the same database (e.g. when inserting a document into a collection that does not exist). The likely root cause is that, when a collection needs to be created, each thread applying an operation tries to upgrade its IX lock on the database to mode X, so the threads deadlock on each other. This issue has probably been present since v3.0.0, and possibly in earlier versions as well. |
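To illustrate why concurrent upgrades block each other, here is a minimal sketch of the lock-compatibility reasoning. It is not the server's lock manager: `DbLockModel`, `canUpgradeToX`, and `allUpgradesBlocked` are hypothetical names, and the model assumes only that IX is compatible with IX while X is compatible with nothing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a single database lock.
// holdsIX[i] is true while applier thread i holds the lock in mode IX.
struct DbLockModel {
    std::vector<bool> holdsIX;

    // Mode X conflicts with every other mode, so thread `self` can be
    // upgraded only if no other thread currently holds IX.
    bool canUpgradeToX(std::size_t self) const {
        for (std::size_t i = 0; i < holdsIX.size(); ++i) {
            if (i != self && holdsIX[i]) {
                return false;
            }
        }
        return true;
    }
};

// Returns true when no IX holder can be upgraded to X. If every holder is
// waiting for its upgrade while keeping IX, none of them makes progress.
bool allUpgradesBlocked(const DbLockModel& db) {
    for (std::size_t i = 0; i < db.holdsIX.size(); ++i) {
        if (db.holdsIX[i] && db.canUpgradeToX(i)) {
            return false;
        }
    }
    return true;
}
```

With two appliers both holding IX, `allUpgradesBlocked` returns true: each upgrade waits for the other thread to release IX, and neither does. With a single IX holder the upgrade can proceed.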
| Comments |
| Comment by Githook User [ 16/Feb/17 ] |
|
Author: Eric Milkie (milkie, milkie@10gen.com)
Message: (cherry picked from commit e876419bebadd6468c402e85e1fcf6eff5a374d4) |
| Comment by Githook User [ 30/Nov/16 ] |
|
Author: Eric Milkie (milkie, milkie@10gen.com)
Message: (cherry picked from commit e876419bebadd6468c402e85e1fcf6eff5a374d4) |
| Comment by Githook User [ 30/Nov/16 ] |
|
Author: Eric Milkie (milkie, milkie@10gen.com)
Message: (cherry picked from commit e876419bebadd6468c402e85e1fcf6eff5a374d4) |
| Comment by Githook User [ 29/Nov/16 ] |
|
Author: Eric Milkie (milkie, milkie@10gen.com)
Message: |
| Comment by Eric Milkie [ 23/Nov/16 ] |
|
It is illegal to upgrade a lock via scoped_ptr reset, as this code does. Because of the way reset works, the new lock is acquired before the old lock is released, which is not deadlock-safe. |
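The ordering problem can be seen in a small sketch. It uses `std::unique_ptr` as a stand-in for `scoped_ptr` (both have a `reset` that takes an already-constructed pointer), and `TracedLock` is a hypothetical guard that only records acquire and release events; it is not the server's lock class.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII guard: "acquires" in the constructor and "releases" in
// the destructor, recording each event instead of taking a real lock.
struct TracedLock {
    TracedLock(std::string mode, std::vector<std::string>& log)
        : _mode(std::move(mode)), _log(log) {
        _log.push_back("acquire " + _mode);
    }
    ~TracedLock() {
        _log.push_back("release " + _mode);
    }
    TracedLock(const TracedLock&) = delete;
    TracedLock& operator=(const TracedLock&) = delete;

    std::string _mode;
    std::vector<std::string>& _log;
};

// Unsafe pattern: the argument to reset() is constructed first, so X is
// requested while IX is still held; the old guard is destroyed afterwards.
std::vector<std::string> upgradeViaReset() {
    std::vector<std::string> log;
    std::unique_ptr<TracedLock> lock(new TracedLock("IX", log));
    lock.reset(new TracedLock("X", log));
    lock.reset();  // drop the X guard so its release is recorded
    return log;
}

// One alternative ordering: release IX explicitly, then acquire X.
std::vector<std::string> releaseThenAcquire() {
    std::vector<std::string> log;
    std::unique_ptr<TracedLock> lock(new TracedLock("IX", log));
    lock.reset();
    assert(!lock);  // nothing is held at this point
    lock.reset(new TracedLock("X", log));
    lock.reset();
    return log;
}
```

`upgradeViaReset()` records acquire IX, acquire X, release IX, release X, whereas `releaseThenAcquire()` records acquire IX, release IX, acquire X, release X. Note that releasing first is not an atomic upgrade: anything checked under IX, such as whether the collection exists, may need to be rechecked once X is held.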
| Comment by Robert Guo (Inactive) [ 22/Nov/16 ] |
|
spencer, I updated the version numbers for clarification. This deadlock was observed directly on 3.4, but the code hasn't changed much since 3.0, so it should affect earlier versions as well. |
| Comment by Spencer Brody (Inactive) [ 22/Nov/16 ] |
|
robert.guo, does this also affect newer versions, or just 3.0? |