Core Server / SERVER-17681

Secondary freezes while syncing


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Component/s: Replication, WiredTiger
    • Operating System: ALL

    Description

      When there are a lot of inserts (about 4K ops/s), this happens randomly; I think there is a deadlock.
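
      For context, here is a rough sketch of the kind of write load involved; the document shape and the single bulk loop are placeholders for illustration, not the actual client workload, which drives roughly 4K ops/s against device.labels from the application:

      // Hypothetical load generator run from the mongo shell against the primary.
      // The real workload uses the application's own documents and many
      // concurrent clients; this loop only approximates the insert pressure.
      var coll = db.getSiblingDB("device").labels;
      var bulk = coll.initializeUnorderedBulkOp();
      for (var i = 0; i < 1000; i++) {
          bulk.insert({ deviceId: i, label: "test-" + i, ts: new Date() });
      }
      bulk.execute();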

      Here's the state:

      The CPU usage of member _id:1 in rs1 is 0, and it resumes syncing as expected if I restart it.

      rs1:SECONDARY> rs.status()
      {
      	"set" : "rs1",
      	"date" : ISODate("2015-03-21T02:16:27.794Z"),
      	"myState" : 2,
      	"syncingTo" : "172.16.1.1:39017",
      	"members" : [
      		{
      			"_id" : 0,
      			"name" : "172.16.1.1:39017",
      			"health" : 1,
      			"state" : 1,
      			"stateStr" : "PRIMARY",
      			"uptime" : 247160,
      			"optime" : Timestamp(1426904187, 172),
      			"optimeDate" : ISODate("2015-03-21T02:16:27Z"),
      			"lastHeartbeat" : ISODate("2015-03-21T02:16:27.382Z"),
      			"lastHeartbeatRecv" : ISODate("2015-03-21T02:16:26.859Z"),
      			"pingMs" : 0,
      			"electionTime" : Timestamp(1426657027, 1),
      			"electionDate" : ISODate("2015-03-18T05:37:07Z"),
      			"configVersion" : 1
      		},
      		{
      			"_id" : 1,
      			"name" : "172.16.1.2:39017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 247288,
      			"optime" : Timestamp(1426892223, 518),
      			"optimeDate" : ISODate("2015-03-20T22:57:03Z"),
      			"syncingTo" : "172.16.1.1:39017",
      			"configVersion" : 1,
      			"self" : true
      		},
      		{
      			"_id" : 2,
      			"name" : "172.16.1.3:39017",
      			"health" : 1,
      			"state" : 2,
      			"stateStr" : "SECONDARY",
      			"uptime" : 247040,
      			"optime" : Timestamp(1426904187, 169),
      			"optimeDate" : ISODate("2015-03-21T02:16:27Z"),
      			"lastHeartbeat" : ISODate("2015-03-21T02:16:27.375Z"),
      			"lastHeartbeatRecv" : ISODate("2015-03-21T02:16:26.357Z"),
      			"pingMs" : 0,
      			"syncingTo" : "172.16.1.1:39017",
      			"configVersion" : 1
      		}
      	],
      	"ok" : 1
      }
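
      The secondary above is stuck roughly 3.3 hours behind the primary. A quick way to compute the lag from the shell, sketched here using only the fields shown in the rs.status() output (rs.printSlaveReplicationInfo() reports similar information):

      var s = rs.status();
      var primary = s.members.filter(function (m) { return m.state === 1; })[0];
      var me = s.members.filter(function (m) { return m.self; })[0];
      // Date subtraction yields milliseconds; divide by 1000 for seconds of lag.
      print("lag (s): " + (primary.optimeDate - me.optimeDate) / 1000);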
       
      rs1:SECONDARY> db.currentOp()
      {
      	"inprog" : [
      		{
      			"desc" : "repl writer worker 9",
      			"threadId" : "0x3b79480",
      			"opid" : 16975002,
      			"active" : true,
      			"secs_running" : 11995,
      			"microsecs_running" : NumberLong("11995253803"),
      			"op" : "none",
      			"ns" : "device.labels",
      			"query" : {
      				
      			},
      			"numYields" : 0,
      			"locks" : {
      				"Global" : "w",
      				"Database" : "w",
      				"Collection" : "w"
      			},
      			"waitingForLock" : false,
      			"lockStats" : {
      				"Global" : {
      					"acquireCount" : {
      						"w" : NumberLong(1)
      					}
      				},
      				"Database" : {
      					"acquireCount" : {
      						"w" : NumberLong(1)
      					}
      				},
      				"Collection" : {
      					"acquireCount" : {
      						"w" : NumberLong(1)
      					}
      				}
      			}
      		}
      	]
      }
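
      The repl writer worker above has held its collection write lock for 11995 seconds without yielding. A filter such as the following (a sketch; the 10-minute threshold is arbitrary) narrows currentOp to long-running operations:

      // Show only operations that have been running longer than 10 minutes.
      db.currentOp({ "secs_running": { $gt: 600 } })
      // Pass true to also include idle connections and system operations.
      db.currentOp(true)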
      

      mongod is started with:
      numactl --interleave=all mongod --fork --dbpath /data01/dmp-data/ --logpath /home/hadoop/dmp-data/logs/data1.log --storageEngine wiredTiger --wiredTigerCacheSizeGB 8 --port 39017 --replSet rs1 --shardsvr
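
      With --wiredTigerCacheSizeGB 8, one thing worth checking when the node stalls is how full the WiredTiger cache is; a sketch (statistic names may differ slightly between versions):

      var cache = db.serverStatus().wiredTiger.cache;
      print("in cache (bytes):   " + cache["bytes currently in the cache"]);
      print("configured (bytes): " + cache["maximum bytes configured"]);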

          People

            Assignee: Unassigned
            Reporter: talrasha007 (Tal Rasha)
            Votes: 0
            Watchers: 8
