[SERVER-1870] Strange deadlock on replica set Created: 29/Sep/10  Updated: 29/May/12  Resolved: 02/Sep/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.6.1, 1.6.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Oleg Lobach Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongodb 1.6.3, php_mongo 1.0.7
servers:

  • srv1 = Debian Lenny. kernel 2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz RAM: 16GB
  • srv2 = Debian Lenny. kernel 2.6.26-2-openvz-amd64 #1 SMP x86_64 GNU/Linux under openvz 3.0.23-1dso1 with no limit. CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz RAM: 16GB

replicaset:

  • master on srv1:27017
  • arbiter on srv1:30001
  • slave on srv2

Attachments: File op.json    
Operating System: Linux
Participants:

 Description   

We have encountered strange behavior in MongoDB during a rapid increase in data-retrieval requests. The queue of pending operations and the number of connections begin to grow rapidly. The output of db.currentOp() is attached to the ticket. What else can we check, and how? What are the "?" collections in the list of operations, and what does the "?" in the first position of a namespace mean?
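
For anyone trying to inspect a similar situation, a snapshot like the attached op.json can be captured from the mongo shell; this is only a minimal sketch, and the exact fields in the output depend on the server version:

// dump all in-progress operations, e.g. to capture something like op.json
printjson(db.currentOp());

// or print just the opid and namespace of each in-progress operation
db.currentOp().inprog.forEach(function (op) {
    print(op.opid + "  " + op.ns);
});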



 Comments   
Comment by Eliot Horowitz (Inactive) [ 29/Sep/10 ]

You might just be driving the disk/memory crazy.

Can you try adding an index on:

{ inf : 1 , pos : 1 }
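
A minimal sketch of how such an index could be created from the mongo shell (ensureIndex is the index-creation helper in the 1.6 series; the background option is optional):

use iii_patterns
// build a compound index on { inf, pos } without blocking other operations
db.patterns.ensureIndex({ inf : 1, pos : 1 }, { background : true })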

Comment by Oleg Lobach [ 29/Sep/10 ]

> Why do you think this is a deadlock?

After the queue grows to several thousand elements and the connection limit (16000) is exhausted, we switched srv1 to "slave" mode (with rs.stepDown()). srv1 switches to state 3 (according to rs.status()) and stays there until the instance is restarted. srv2 becomes master and continues working normally. The queue on srv1 either does not shrink (we waited for more than half an hour) or shrinks by one element every few minutes.

Apparently there is some lock, but I can't find the reason why this happens.
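
For readers hitting something similar: one way to look for (and, if necessary, terminate) a long-running operation is db.currentOp() together with db.killOp(). This is only a sketch; the 30-second threshold and the opid below are made-up examples:

// print operations that have been running for more than 30 seconds
db.currentOp().inprog.forEach(function (op) {
    if (op.secs_running && op.secs_running > 30)
        print(op.opid + "  " + op.ns + "  " + tojson(op.query));
});

// kill a specific operation by its opid (12345 is a hypothetical value)
db.killOp(12345);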

> What indexes do you have on the iii_patterns.patterns collection?

db.patterns.stats()
{
	"ns" : "iii_patterns.patterns",
	"count" : 938573,
	"size" : 448177588,
	"avgObjSize" : 477.5095682488203,
	"storageSize" : 512674816,
	"numExtents" : 17,
	"nindexes" : 3,
	"lastExtentSize" : 95437056,
	"paddingFactor" : 1.0099999999996663,
	"flags" : 1,
	"totalIndexSize" : 142491648,
	"indexSizes" : {
		"_id_" : 35241984,
		"inf_1_id_1_anc_1_cat_1__ix.a_1" : 61456384,
		"pos_1" : 45793280
	},
	"ok" : 1
}
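
The index key patterns themselves (rather than only their sizes) can be listed with getIndexes(); a minimal sketch:

// print the name and key pattern of each index on the collection
db.patterns.getIndexes().forEach(function (idx) {
    print(idx.name + " : " + tojson(idx.key));
});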

Comment by Eliot Horowitz (Inactive) [ 29/Sep/10 ]

Why do you think this is a deadlock?
What indexes do you have on the iii_patterns.patterns collection?
