[SERVER-17471] WiredTiger Mutex on Windows can block the server Created: 05/Mar/15  Updated: 14/Apr/15  Resolved: 27/Mar/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0
Fix Version/s: 3.0.2, 3.1.1

Type: Bug Priority: Major - P3
Reporter: Laurent Dupuis Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

Occurs on very large (>1 billion rows) bulk inserts.

It seems to happen when cursors check whether cache eviction is needed. However, it is difficult to obtain a correct stack trace from a Release build.

Participants:

 Description   

The function __wt_cond_wait waits on a condition variable for usecs microseconds, or forever if usecs is zero. However, because the code below only tests usecs > 0, a negative usecs falls through to the else branch, and the thread waits forever (INFINITE) instead of for a bounded time.

	if (usecs > 0) {
		milliseconds = usecs / 1000;
		/*
		 * 0 would mean the CV sleep becomes a TryCV which we do not
		 * want
		 */
		if (milliseconds == 0)
			milliseconds = 1;
		ret = SleepConditionVariableCS(
		    &cond->cond, &cond->mtx, milliseconds);
	} else
		/* Reached for usecs == 0, but also for any negative usecs. */
		ret = SleepConditionVariableCS(
		    &cond->cond, &cond->mtx, INFINITE);
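To reproduce the problem without Windows, the timeout selection of the snippet above can be isolated into a pure function. This is a minimal sketch, assuming a hypothetical helper original_timeout_ms and a hypothetical WAIT_INFINITE constant standing in for the Windows INFINITE macro (0xFFFFFFFF); usecs is modeled as int32_t because long is 32 bits on Windows. It only mirrors the branch structure and does not call SleepConditionVariableCS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the Windows INFINITE macro (0xFFFFFFFF). */
#define WAIT_INFINITE UINT32_C(0xFFFFFFFF)

/*
 * Hypothetical helper: returns the timeout (in ms) that the original
 * snippet would pass to SleepConditionVariableCS for a given usecs.
 * int32_t models the 32-bit Windows long.
 */
static uint32_t
original_timeout_ms(int32_t usecs)
{
	int32_t milliseconds;

	if (usecs > 0) {
		milliseconds = usecs / 1000;
		/* A zero timeout would turn the wait into a poll. */
		if (milliseconds == 0)
			milliseconds = 1;
		return ((uint32_t)milliseconds);
	}
	/* Taken for usecs == 0 and, unintentionally, for usecs < 0. */
	return (WAIT_INFINITE);
}
```

With this helper, original_timeout_ms(5000) yields 5 and original_timeout_ms(500) yields 1, but original_timeout_ms(-1) yields WAIT_INFINITE, exactly like original_timeout_ms(0).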

This can be fixed by testing explicitly for usecs == 0 and using an unsigned type for milliseconds (it should be a DWORD, cf. the MSDN documentation of SleepConditionVariableCS):

	DWORD milliseconds;

	/* .... code removed for clarity .... */

	if (usecs == 0)
		ret = SleepConditionVariableCS(
		    &cond->cond, &cond->mtx, INFINITE);
	else {
		/*
		 * Any non-zero usecs, including a negative one, takes the timed
		 * path. unsigned long is 32 bits on Windows, so the quotient is
		 * at most 4294967 and can never equal INFINITE.
		 */
		milliseconds = ((unsigned long)usecs) / 1000UL;
		/*
		 * 0 would mean the CV sleep becomes a TryCV which we do not
		 * want
		 */
		if (milliseconds == 0)
			milliseconds = 1;
		ret = SleepConditionVariableCS(
		    &cond->cond, &cond->mtx, milliseconds);
	}
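The effect of the proposed change can be checked the same way. This is a minimal sketch, assuming a hypothetical helper proposed_timeout_ms and a hypothetical WAIT_INFINITE constant for the Windows INFINITE value; uint32_t models both DWORD and the 32-bit Windows unsigned long, and int32_t models the 32-bit long. Only usecs == 0 now selects an infinite wait; a negative usecs wraps to a large unsigned value and becomes a long but finite timeout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the Windows INFINITE macro (0xFFFFFFFF). */
#define WAIT_INFINITE UINT32_C(0xFFFFFFFF)

/*
 * Hypothetical helper mirroring the proposed fix: returns the timeout
 * (in ms) that would be passed to SleepConditionVariableCS.
 */
static uint32_t
proposed_timeout_ms(int32_t usecs)
{
	uint32_t milliseconds;

	/* Only an explicit zero means "wait forever". */
	if (usecs == 0)
		return (WAIT_INFINITE);

	/*
	 * Converting a negative int32_t to uint32_t wraps modulo 2^32;
	 * dividing by 1000 keeps the result at or below 4294967 ms.
	 */
	milliseconds = (uint32_t)usecs / 1000u;

	/* A zero timeout would turn the wait into a poll. */
	if (milliseconds == 0)
		milliseconds = 1;
	return (milliseconds);
}
```

For example, proposed_timeout_ms(-1) yields 4294967 ms (roughly 71.6 minutes), which is bounded rather than infinite. If such a long wait were also undesirable, clamping negative values to the 1 ms minimum would be a simple variant; that variant is not part of this report.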



 Comments   
Comment by Michael Cahill (Inactive) [ 27/Mar/15 ]

Resolved with the latest drop from WiredTiger.

Comment by Laurent Dupuis [ 10/Mar/15 ]

I tested your change and it works on my test project. I no longer get any deadlocks. Thanks!

Comment by Mark Benvenuto [ 09/Mar/15 ]

Fix is pending in WiredTiger pull request:
https://github.com/wiredtiger/wiredtiger/pull/1740

Generated at Thu Feb 08 03:44:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.