[SERVER-11796] renaming collection during background index build on replica set spews a lot of "can't unlock b/c of recursive lock" Created: 20/Nov/13  Updated: 11/Jul/16  Resolved: 14/Jan/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: 2.5.5

Type: Bug Priority: Major - P3
Reporter: Michael O'Brien Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Participants:

 Description   

This happened while I was testing background indexing on mixed-version replica sets, although I don't know whether the server versions have any relevance to this issue.

Repro script:

   setVerboseShell(false)

   // Start a two-node 2.4.1 replica set and insert enough documents that a
   // background index build will run for a noticeable amount of time.
   var rs1 = new ReplSetTest({nodes: 2, name: "test"})
   rs1.startSet({binVersion: "2.4.1"})
   rs1.initiate()
   var pri = rs1.getPrimary()
   for (var i = 0; i < 1000000; i++) {
       pri.getDB("poo").test.insert({"x": i})
   }

   // Add a 2.5.3 node via reconfig and wait for it to reach SECONDARY state.
   var new_secondary = rs1.start(3, {binVersion: "2.5.3"})
   var config = rs1.getPrimary().getDB("local").system.replset.findOne()
   config.members.push({_id: config.members.length, host: new_secondary.host})
   config.version++
   rs1.getPrimary().getDB("admin").runCommand({replSetReconfig: config})
   rs1.waitForState(new_secondary, rs1.SECONDARY, 60000)

   // Start a background index build, then rename the collection out from
   // under it while the build is still in progress.
   print("MAKING BG INDEX")
   rs1.getPrimary().getDB("poo").test.createIndex({x: 1}, {background: true})
   sleep(10)  // give the index build a moment to get going
   print("Renaming collection.")
   rs1.getPrimary().getDB("poo").adminCommand({renameCollection: "poo.test", to: "poo2.test2", dropTarget: false})
   printjson(rs1.getPrimary().getDB("poo").test.getIndexes())
   printjson(rs1.getPrimary().getDB("poo2").test2.getIndexes())

   // Poll the index lists so the final state of both namespaces is visible.
   while (true) {
       sleep(1000)
       print("end of test, chillin")
       printjson(rs1.getPrimary().getDB("poo").test.getIndexes())
       printjson(rs1.getPrimary().getDB("poo2").test2.getIndexes())
   }
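
For what it's worth, the "top:" document embedded in each warning below is just the currentOp entry for the rename, so the lock state can also be inspected directly while the repro runs. A minimal sketch, run from a second shell connected to the primary (this uses only the stock shell API; it is not part of the original repro):

   // The rename should show locks: { ^: "W" } -- the global write lock --
   // which is what keeps the yield attempts from succeeding.
   db.currentOp().inprog.forEach(function(op) {
       if (op.query && op.query.renameCollection) {
           printjson({opid: op.opid, locks: op.locks, msg: op.msg})
       }
   })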

By the end of the test run, the collection is correctly renamed and the indexes also appear to have been created properly, so I don't think there is any correctness issue here.
However, during the execution of the test there are hundreds of logged lines like the following (the op shown in top: is the rename itself, which is holding the global write lock, locks: { ^: "W" }, so each yield attempt fails and logs a warning):

 m31000| Wed Nov 20 16:05:32.394 [conn1] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: poo.test top: { opid: 1377416, active: true, secs_running: 10, op: "query", ns: "poo2.test2", query: { renameCollection: "poo.test", to: "poo2.test2", dropTarget: false }, client: "127.0.0.1:57663", desc: "conn1", threadId: "0x10e6d5000", connectionId: 1, locks: { ^: "W" }, waitingForLock: false, msg: "index: (3/3) btree-middle", numYields: 0, lockStats: { timeLockedMicros: {}, timeAcquiringMicros: { R: 0, W: 2 } } }
 [the same warning line repeats hundreds of times with near-identical timestamps]



 Comments   
Comment by Asya Kamsky [ 30/Nov/13 ]

I opened SERVER-11906, which I think would resolve the numerous warnings about the recursive lock.

Comment by Michael O'Brien [ 20/Nov/13 ]

I tried a similar test again, but this time using convertToCapped instead of renameCollection.
It does the same kind of thing for a while (lots of logging about the recursive lock), but then the 2.5.3 secondary declares itself too stale (RS102) and dies.
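
The only change to the script above was swapping the rename step for a convertToCapped command, roughly as follows (reconstructed from the log below; 200 * 1024 * 1024 matches the size: 209715200.0 shown there):

   // Variant of the repro: convert the collection to capped instead of
   // renaming it, while the background index build is still in progress.
   rs1.getPrimary().getDB("poo").runCommand({convertToCapped: "test", size: 200 * 1024 * 1024})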

Snip of just the end of the log (after lots of the "can't unlock" lines):

m31000| Wed Nov 20 16:28:10.396 [conn1] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: poo.test top: { opid: 1373272, active: true, secs_running: 27, op: "query", ns: "poo", query: { convertToCapped: "test", size: 209715200.0 }, client: "127.0.0.1:59515", desc: "conn1", threadId: "0x10ddd1000", connectionId: 1, locks: { ^: "W" }, waitingForLock: false, numYields: 0, lockStats: { timeLockedMicros: {}, timeAcquiringMicros: { R: 0, W: 2911 } } }
m31000| Wed Nov 20 16:28:10.396 [conn1] getmore poo.test cursorid:764630585680412 ntoreturn:0 keyUpdates:0  nreturned:110300 reslen:3639920 895ms
m31001| Wed Nov 20 16:28:10.744 [conn9] end connection 127.0.0.1:59728 (2 connections now open)
m31001| Wed Nov 20 16:28:10.745 [initandlisten] connection accepted from 127.0.0.1:59756 #11 (3 connections now open)
m31001| Wed Nov 20 16:28:12.793 [rsBackgroundSync] Socket recv() timeout  127.0.0.1:31000
m31001| Wed Nov 20 16:28:12.793 [rsBackgroundSync] SocketException: remote: 127.0.0.1:31000 error: 9001 socket exception [3] server [127.0.0.1:31000]
m27000| 2013-11-20T16:28:12.793-0500 [rsSyncNotifier] Socket recv() timeout  127.0.0.1:31000
m27000| 2013-11-20T16:28:12.793-0500 [rsSyncNotifier] SocketException: remote: 127.0.0.1:31000 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:31000]
m31001| Wed Nov 20 16:28:12.793 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: mikes-MacBook-Pro.local:31000
m27000| 2013-11-20T16:28:12.794-0500 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: mikes-MacBook-Pro.local:31000
m27000| 2013-11-20T16:28:12.796-0500 [rsBackgroundSync] Socket recv() timeout  127.0.0.1:31000
m27000| 2013-11-20T16:28:12.796-0500 [rsBackgroundSync] SocketException: remote: 127.0.0.1:31000 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.0.1:31000]
m27000| 2013-11-20T16:28:12.796-0500 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: mikes-MacBook-Pro.local:31000
m27000| 2013-11-20T16:28:12.796-0500 [rsBackgroundSync] replSet syncing to: mikes-MacBook-Pro.local:31000
m31000| Wed Nov 20 16:28:12.797 [initandlisten] connection accepted from 127.0.0.1:59757 #22 (8 connections now open)
m31000| Wed Nov 20 16:28:13.601 [conn1] command poo.$cmd command: { cloneCollectionAsCapped: "test", toCollection: "tmp.convertToCapped.test", size: 209715200.0, temp: true } ntoreturn:1 keyUpdates:0  reslen:37 30802ms
m31000| Wed Nov 20 16:28:13.601 [conn1] CMD: drop poo.test
m31000| Wed Nov 20 16:28:13.616 [conn1] command poo.$cmd command: { drop: "test" } ntoreturn:1 keyUpdates:0  reslen:115 15ms
m31000| Wed Nov 20 16:28:13.620 [conn1] command admin.$cmd command: { renameCollection: "poo.tmp.convertToCapped.test", to: "poo.test", stayTemp: false } ntoreturn:1 keyUpdates:0  reslen:37 3ms
m31000| Wed Nov 20 16:28:13.620 [conn1] command poo.$cmd command: { convertToCapped: "test", size: 209715200.0 } ntoreturn:1 keyUpdates:0 locks(micros) W:30822439 reslen:37 30822ms
m31000| Wed Nov 20 16:28:13.620 [conn15] getMore: cursorid not found local.oplog.rs 732322785552598
m31000| Wed Nov 20 16:28:13.620 [conn4] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1384982676000|1 } } cursorid:109625080036338 ntoreturn:0 keyUpdates:0 locks(micros) r:25 nreturned:0 reslen:20 30826ms
m31000| Wed Nov 20 16:28:13.620 [conn19] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1384982829000|3341 } } cursorid:732890635741505 ntoreturn:0 keyUpdates:0 locks(micros) r:25 nreturned:0 reslen:20 30826ms
m31001| Wed Nov 20 16:28:13.620 [rsSyncNotifier] repl: old cursor isDead, will initiate a new one
m27000| 2013-11-20T16:28:13.620-0500 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from mikes-MacBook-Pro.local:31000
m31000| Wed Nov 20 16:28:13.620 [conn22] end connection 127.0.0.1:59757 (7 connections now open)
m31000| Wed Nov 20 16:28:13.620 [conn15] killcursors: found 0 of 1
m31000| Wed Nov 20 16:28:13.620 [conn15] end connection 127.0.0.1:59717 (6 connections now open)
m31000| Wed Nov 20 16:28:13.623 [conn4] killcursors: found 0 of 1
m31000| Wed Nov 20 16:28:13.623 [conn4] end connection 127.0.0.1:59549 (5 connections now open)
m27000| 2013-11-20T16:28:13.620-0500 [rsBackgroundSync] replSet our last optime : Nov 20 16:27:42 528d294e:1
m27000| 2013-11-20T16:28:13.620-0500 [rsBackgroundSync] replSet oldest at mikes-MacBook-Pro.local:31000 : Nov 20 16:28:04 528d2964:37f1
m27000| 2013-11-20T16:28:13.620-0500 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
ReplSetTest Could not call ismaster on node 2: TypeError: Cannot call method 'getDB' of undefined
m27000| 2013-11-20T16:28:13.620-0500 [rsBackgroundSync] replSet error RS102 too stale to catch up
m27000| 2013-11-20T16:28:13.620-0500 [rsBackgroundSync] replSet RECOVERING
