[SERVER-27164] Deadlock during oplog application when implicitly creating multiple collections on the same DB Created: 22/Nov/16  Updated: 06/Dec/22  Resolved: 29/Nov/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.0, 3.2.0, 3.4.0-rc4
Fix Version/s: 3.0.15, 3.2.12, 3.4.1, 3.5.1

Type: Bug Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-27205 Remove implicit collection creation f... Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2016-12-12
Linked BF Score: 0

 Description   

It is possible for a node to hang during initial sync or steady-state replication when multiple CRUD operations implicitly create collections in the same database (for example, inserting a document into a collection that does not yet exist).

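The likely root cause is a lock upgrade. When applying an operation requires creating a collection, the applier thread tries to upgrade its IX lock on the database to mode X while still holding the IX lock. X is incompatible with any other thread's IX, so if two applier threads do this concurrently, each waits for the other to release its IX lock and neither can proceed.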
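To illustrate the suspected cycle, here is a minimal C++ sketch of a hypothetical two-mode lock model. Mode, Holder, upgradeWaitsFor and hasMutualWait are names invented for this example; they are not MongoDB's lock manager API, which has more modes and real waiting. The sketch only computes who would wait for whom when every IX holder requests X without releasing IX first.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified lock modes (the real lock manager has more).
enum class Mode { IX, X };

// IX is compatible with IX (many writers may hold intent locks at once);
// X is compatible with nothing.
bool compatible(Mode held, Mode requested) {
    return held == Mode::IX && requested == Mode::IX;
}

// One thread currently holding the database lock in some mode.
struct Holder {
    std::string thread;
    Mode mode;
};

// Builds wait-for edges (waiter, blocker) produced when every holder
// requests `requested` while still holding its current lock, i.e. an
// in-place upgrade. A thread's own lock does not block its upgrade.
std::vector<std::pair<std::string, std::string>> upgradeWaitsFor(
        const std::vector<Holder>& holders, Mode requested) {
    std::vector<std::pair<std::string, std::string>> edges;
    for (const auto& waiter : holders) {
        for (const auto& other : holders) {
            if (waiter.thread != other.thread && !compatible(other.mode, requested)) {
                edges.emplace_back(waiter.thread, other.thread);
            }
        }
    }
    return edges;
}

// True if two threads wait on each other (a two-thread deadlock cycle).
bool hasMutualWait(const std::vector<std::pair<std::string, std::string>>& edges) {
    for (const auto& a : edges) {
        for (const auto& b : edges) {
            if (a.first == b.second && a.second == b.first) return true;
        }
    }
    return false;
}
```

For example, with holders {"applier-1", Mode::IX} and {"applier-2", Mode::IX}, upgradeWaitsFor(holders, Mode::X) yields applier-1 waiting for applier-2 and applier-2 waiting for applier-1, so hasMutualWait returns true. With a single holder there are no edges, which is consistent with the hang requiring multiple collection creations on the same database.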

This issue has likely been present since v3.0.0, and possibly in earlier versions as well.



 Comments   
Comment by Githook User [ 16/Feb/17 ]

Author: Eric Milkie (milkie@10gen.com)

Message: SERVER-27164 do not upgrade dblock for replication, to avoid deadlock

(cherry picked from commit e876419bebadd6468c402e85e1fcf6eff5a374d4)
Branch: v3.0
https://github.com/mongodb/mongo/commit/3105cafb4d712deb5cef0b6b12f137b14d2dee33

Comment by Githook User [ 30/Nov/16 ]

Author: Eric Milkie (milkie@10gen.com)

Message: SERVER-27164 do not upgrade dblock for replication, to avoid deadlock

(cherry picked from commit e876419bebadd6468c402e85e1fcf6eff5a374d4)
Branch: v3.2
https://github.com/mongodb/mongo/commit/e0c00e3eacceadeb3d08e5afaec3135615ac2f71

Comment by Githook User [ 30/Nov/16 ]

Author: Eric Milkie (milkie@10gen.com)

Message: SERVER-27164 do not upgrade dblock for replication, to avoid deadlock

(cherry picked from commit e876419bebadd6468c402e85e1fcf6eff5a374d4)
Branch: v3.4
https://github.com/mongodb/mongo/commit/b144701c0c567ba15e7261a762b66fdee641eaed

Comment by Githook User [ 29/Nov/16 ]

Author: Eric Milkie (milkie@10gen.com)

Message: SERVER-27164 do not upgrade dblock for replication, to avoid deadlock
Branch: master
https://github.com/mongodb/mongo/commit/e876419bebadd6468c402e85e1fcf6eff5a374d4

Comment by Eric Milkie [ 23/Nov/16 ]

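It is illegal to upgrade a lock via scoped_ptr reset, as this code does. Because of the way reset works, the new lock is acquired before the old lock is released: the argument (the newly constructed lock object) is created first, and only then does reset destroy the old one. Waiting for a new lock while still holding the old one is not deadlock-safe.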
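The ordering problem can be reproduced in isolation. The following is a hedged C++ sketch: it uses std::unique_ptr as a stand-in for the scoped_ptr in the original code (its reset has the same ordering, since the argument is evaluated before reset runs), and FakeDbLock, upgradeViaReset and releaseThenAcquire are hypothetical names. releaseThenAcquire shows one deadlock-safe ordering; it is not necessarily what the committed fix does.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Records lock/unlock events so the ordering can be inspected.
std::vector<std::string> events;

// Hypothetical RAII object standing in for a DB lock: it "acquires"
// in its constructor and "releases" in its destructor.
class FakeDbLock {
public:
    explicit FakeDbLock(std::string mode) : mode_(std::move(mode)) {
        events.push_back("acquire " + mode_);
    }
    ~FakeDbLock() { events.push_back("release " + mode_); }
    FakeDbLock(const FakeDbLock&) = delete;
    FakeDbLock& operator=(const FakeDbLock&) = delete;

private:
    std::string mode_;
};

// Unsafe: `new FakeDbLock("X")` is evaluated before reset() runs, so the
// X lock is requested while the old IX lock is still held.
void upgradeViaReset(std::unique_ptr<FakeDbLock>& lk) {
    lk.reset(new FakeDbLock("X"));
}

// Deadlock-safe ordering: release the old lock first, then acquire the
// new one. Another thread may run in between, so callers must re-check
// any state they observed under the old lock.
void releaseThenAcquire(std::unique_ptr<FakeDbLock>& lk) {
    lk.reset();
    lk.reset(new FakeDbLock("X"));
}
```

Starting from an IX lock, upgradeViaReset records "acquire IX", "acquire X", "release IX", while releaseThenAcquire records "acquire IX", "release IX", "acquire X". The second ordering never holds both locks at once, at the cost that another thread can take the lock in between, so a check such as whether the collection already exists has to be repeated after reacquiring.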

Comment by Robert Guo (Inactive) [ 22/Nov/16 ]

spencer, I updated the version numbers for clarification. This deadlock was observed directly on 3.4, but the code hasn't changed much since 3.0, so earlier versions should be affected as well.

Comment by Spencer Brody (Inactive) [ 22/Nov/16 ]

robert.guo, does this also affect newer versions, or just 3.0?

Generated at Thu Feb 08 04:14:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.