[SERVER-16492] thread gets stuck doing update for a long time Created: 10/Dec/14  Updated: 24/Jan/15  Resolved: 18/Dec/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.8.0-rc2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Alexander Gorrod
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-16269 WiredTiger blocks queries and updates... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Running the YCSB 50/50 workload against a replica set with zlib compression seems to be the easiest way to reproduce this scenario.

While progress is being made, one of the writing threads gets stuck on a single update for a very long time:

db.currentOp({secs_running:{$gt:50}})
{
	"inprog" : [
		{
			"desc" : "conn28",
			"threadId" : "0x469a820",
			"connectionId" : 28,
			"opid" : 3896403,
			"active" : true,
			"secs_running" : 408,
			"microsecs_running" : NumberLong(408562540),
			"op" : "update",
			"ns" : "ycsb.usertable",
			"query" : {
				"_id" : "user6628373086844447656"
			},
			"client" : "74.86.233.3:34012",
			"locks" : {
				"Global" : "w",
				"Database" : "w",
				"local" : "w",
				"Collection" : "w",
				"Collection" : "w"
			},
			"waitingForLock" : false,
			"lockStats" : {
 
			}
		}
	]
}

The stack traces (will attach) seem to indicate that they are compressing large pages.



 Comments   
Comment by Michael Cahill (Inactive) [ 18/Dec/14 ]

duplicate of SERVER-16269, focus discussion there.

Comment by Asya Kamsky [ 17/Dec/14 ]

I think this could be related to, or partly the same as, SERVER-16269.

Comment by Asya Kamsky [ 10/Dec/14 ]

Here is the latest startup that reproduced at least one thread that ran for over 60 seconds in an update:

{
	"argv" : [
		"mongodb-linux-x86_64-2.8.0-rc2/bin/mongod",
		"-f",
		"configs/myconfig",
		"--wiredTigerCollectionConfig=memory_page_max=10M"
	],
	"parsed" : {
		"config" : "configs/myconfig",
		"processManagement" : {
			"fork" : true
		},
		"replication" : {
			"oplogSizeMB" : 100,
			"replSetName" : "ycsbTest"
		},
		"storage" : {
			"dbPath" : "/disk1/db/wt",
			"engine" : "wiredTiger",
			"journal" : {
				"enabled" : false
			},
			"wiredTiger" : {
				"collectionConfig" : "memory_page_max=10M"
			}
		},
		"systemLog" : {
			"destination" : "file",
			"path" : "/home/asya/logs/wt_zlib.log"
		}
	},
	"ok" : 1
}

I started up a new mongod with a new db directory and populated it from another host using:

YCSB/bin/ycsb load mongodb -s -P ../workloads/workloada -p fieldcount=1 -p recordcount=30000000 -p mongodb.url=<hostip>:27017 -threads 200 

Then I ran the same workload file (it's 50/50) with the equivalent of:

YCSB/bin/ycsb  run mongodb -s -P ../workloads/workloada -p fieldcount=1 -p recordcount=30000000 -p operationcount=10000000 -p mongodb.url=<hostip>:27017 -threads 200 

I ran db.currentOp() periodically and caught one that had been running for longer than 65 seconds. (I did grab the full currentOp() output, so if something in the DB was blocking it, I presume it would be seen there.) I've also got sar logs and mongostat output for that time period, if that will help.
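
For anyone repeating this check, the filtering step can be sketched as a small helper applied to a saved currentOp() document. This is only an illustration: findLongRunningOps is a hypothetical name, not a mongo shell function, and the second sample entry is invented for contrast. The first sample entry is trimmed from the output in the description.

```javascript
// Hypothetical helper (not part of the mongo shell): given the document
// returned by db.currentOp(), return the active operations that have been
// running for longer than minSecs seconds, optionally limited to one op type.
function findLongRunningOps(currentOpResult, minSecs, opType) {
  var inprog = (currentOpResult && currentOpResult.inprog) || [];
  return inprog.filter(function (op) {
    return op.active === true &&
      typeof op.secs_running === "number" &&
      op.secs_running > minSecs &&
      (opType === undefined || op.op === opType);
  });
}

// Sample input: the first entry is trimmed from the currentOp output in this
// ticket; the second entry is made up to show an op that is filtered out.
var sample = {
  inprog: [
    { opid: 3896403, active: true, secs_running: 408, op: "update", ns: "ycsb.usertable" },
    { opid: 1, active: true, secs_running: 2, op: "query", ns: "ycsb.usertable" }
  ]
};

var stuck = findLongRunningOps(sample, 65, "update");
```

In a mongo shell session the same helper could be fed live data, for example findLongRunningOps(db.currentOp(), 65, "update").forEach(printjson), and re-run at whatever interval suits the test.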

Comment by Daniel Pasette (Inactive) [ 10/Dec/14 ]

asya, can you include mongod startup options, machine details (which aws instance and storage setup), and precise ycsb cmd for repro?
