[CDRIVER-2657] Coverity analysis defect 103440: LOCK Created: 21/May/18 Updated: 28/Oct/23 Resolved: 05/Jul/18
| Status: | Closed |
| Project: | C Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 1.12.0 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Coverity Collector User | Assignee: | Roberto Sanchez |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Epic Link: | Resolve Coverity Warnings |
| Description |
LOCK Defect 103440 (STATIC_C)
| Comments |
| Comment by Roberto Sanchez [ 08/Jul/18 ] |
Thanks for the confirmation. I went ahead and marked it as a false positive.
| Comment by A. Jesse Jiryu Davis [ 08/Jul/18 ] |
That's exactly right: Coverity is worried about a sleep() instigated by test_topology_scanner_blocking_initiator to simulate a slow connection. We've marked this warning as a false positive before. It seems your code changes altered the scenario enough that Coverity doesn't recognize that we've already suppressed this warning.
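As a rough illustration of what the checker objects to (a standalone sketch with hypothetical names, not the actual test code; pthread calls and sleep stand in for the driver's wrappers and the test's simulated delay):

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t scan_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for the test's blocking initiator: the sleep () simulates a
 * server that is slow to accept a connection during a topology scan. */
static void
blocking_initiator (void)
{
   sleep (1);
}

static void
scan_with_slow_server (void)
{
   pthread_mutex_lock (&scan_mutex);
   blocking_initiator (); /* Coverity's LOCK checker sees a sleep under the lock */
   pthread_mutex_unlock (&scan_mutex);
}
```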
| Comment by Roberto Sanchez [ 08/Jul/18 ] |
jesse, kevin.albertson, it looks like the change I made resulted in a new Coverity defect: http://coverity.mongodb.com/query/defects.htm?project=C+Driver&cid=103668. The real culprit appears to be test_topology_scanner_blocking_initiator() in test-mongoc-topology-scanner.c. The question I have is whether this should be fixed (e.g., by changing the mechanics of the lock/unlock behavior), or whether we would consider it an artifact of the test (i.e., the sleep call simulates a delay that would be encountered in the real world).
| Comment by Githook User [ 05/Jul/18 ] |
Author: {'username': 'rcsanchez97', 'name': 'Roberto C. Sánchez', 'email': 'roberto@connexer.com'} Message:
| Comment by Roberto Sanchez [ 29/Jun/18 ] |
And after further investigation I found that Coverity first detected this defect on March 24th, or about six weeks before the change I identified, not in mid-May as I had originally thought. That change is clearly not the cause.
| Comment by Roberto Sanchez [ 29/Jun/18 ] |
My apologies for not communicating more clearly. Prior to your change, if the mutex was locked on entry to mongoc_topology_select_server_id, then the call to _mongoc_topology_do_blocking_scan (which in turn calls mongoc_topology_scan_once) does not look like it would cause the double unlock. After your change, however, a mutex that was locked on entry would already be unlocked before the call to _mongoc_topology_do_blocking_scan, so the unconditional unlock in mongoc_topology_scan_once would then unlock it a second time. It seems the question I should have asked was: is it assumed (or possible) to enter mongoc_topology_select_server_id with the mutex locked? Either way, I will take another, closer look, as it is entirely likely that I am just confused about the proper flow.
| Comment by Kevin Albertson [ 29/Jun/18 ] |
The change you're referring to adds both a lock and an unlock:
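Roughly, the shape of that addition (a hedged standalone sketch, not the actual diff; pthread_mutex_* stands in for the driver's mongoc_mutex_* wrappers, and the function names only mirror the driver's):

```c
#include <pthread.h>

static pthread_mutex_t topology_mutex = PTHREAD_MUTEX_INITIALIZER;

static void
do_blocking_scan (void)
{
   /* ... eventually calls scan_once, which unlocks unconditionally ... */
}

static void
select_server_id (void)
{
   pthread_mutex_lock (&topology_mutex);   /* the added lock */
   /* ... inspect topology state under the lock ... */
   pthread_mutex_unlock (&topology_mutex); /* the added unlock */

   do_blocking_scan (); /* scan_once then unlocks again: a double unlock */
}
```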
It didn't lock here before, so I don't think this is where the double unlock was introduced. But then again, if Coverity only found this recently, maybe I'm missing something. I think we should lock the mutex in _mongoc_topology_do_blocking_scan, as Jesse suggested. I also agree with Jesse that mongoc_topology_scan_once shouldn't assume the mutex is locked on entry and then leave it unlocked, unless there's a really good reason.
| Comment by A. Jesse Jiryu Davis [ 28/Jun/18 ] |
I agree. Let's document that assumption in a comment block. Additionally, scan_once looks buggy to me: it assumes the mutex is locked on entry, and then unlocks it. I think that function should restore the mutex to its locked state before returning. Let's do that and check whether scan_once's callers need to change in response. Finally, to obey scan_once's assumption that the mutex is locked, I think we should lock around the scan_once call in _mongoc_topology_do_blocking_scan: that's single-threaded mode, so locking there is unnecessary but harmless, for the sake of a consistent state when entering scan_once.
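A minimal sketch of that proposal, assuming standalone pthreads rather than the driver's mongoc_mutex_* wrappers (function names only mirror the driver's):

```c
#include <pthread.h>

static pthread_mutex_t topology_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Precondition: topology_mutex is locked. Postcondition (the proposed
 * fix): topology_mutex is locked again, so callers see a consistent
 * state on return. */
static void
scan_once (void)
{
   pthread_mutex_unlock (&topology_mutex); /* release around slow I/O */
   /* ... do the blocking scan work without the lock ... */
   pthread_mutex_lock (&topology_mutex);   /* restore the entry state */
}

static void
do_blocking_scan (void)
{
   /* Single-threaded mode: taking the lock is unnecessary but harmless,
    * and it satisfies scan_once's locked-on-entry assumption. */
   pthread_mutex_lock (&topology_mutex);
   scan_once ();
   pthread_mutex_unlock (&topology_mutex); /* still locked after scan_once */
}
```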
| Comment by Roberto Sanchez [ 27/Jun/18 ] |
jesse, I have done some analysis of this ticket and here are my findings. First, the issue appears to be that mongoc_topology_scan_once makes an unconditional call to mongoc_mutex_unlock:
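The shape of the problem, as a hedged standalone sketch rather than the actual source (pthread_mutex_* stands in for the driver's mongoc_mutex_* wrappers):

```c
#include <pthread.h>

static pthread_mutex_t topology_mutex = PTHREAD_MUTEX_INITIALIZER;

static void
scan_once (void)
{
   /* ... scan work, including a conditional lock/unlock pairing ... */

   /* The unconditional unlock on the way out: if the caller did not
    * enter with the mutex locked (or has already released it), this is
    * the double unlock Coverity reports. */
   pthread_mutex_unlock (&topology_mutex);
}
```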
That function, mongoc_topology_scan_once, is called from only two places: _mongoc_topology_do_blocking_scan and _mongoc_topology_run_background. There does not appear to be any locking in _mongoc_topology_do_blocking_scan, while there is locking in _mongoc_topology_run_background. The relevant change appears to be commit bf80fd809c.
It would appear that the addition of mongoc_mutex_unlock in that change resulted in the double-unlock behavior. Back over in _mongoc_topology_run_background, the mutex is locked near the top of the function, and the end of the loop and of the function are structured around scan_once consuming that lock.
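A rough standalone sketch of that flow, under the same caveats (not the actual source; pthread_* stands in for the driver's wrappers, and the wait logic is simplified):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t topology_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool shutdown_requested = false;

/* Assumes the mutex is locked on entry and leaves it unlocked,
 * mirroring the unconditional unlock described above. */
static void
scan_once (void)
{
   /* ... scan the servers ... */
   pthread_mutex_unlock (&topology_mutex);
}

static void *
run_background (void *arg)
{
   (void) arg;

   /* near the top: take the lock before entering the loop */
   pthread_mutex_lock (&topology_mutex);

   while (!shutdown_requested) {
      /* ... wait out the scan interval (e.g. a timed cond wait) ... */
      scan_once ();                         /* consumes the lock */
      pthread_mutex_lock (&topology_mutex); /* re-acquire for the next pass */
   }

   /* end of function: release the lock held at loop exit */
   pthread_mutex_unlock (&topology_mutex);
   return NULL;
}
```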
It appears to me, based on the code in _mongoc_topology_run_background being older, that the expectation is for the mutex to be locked on entry to mongoc_topology_scan_once. However, the change to mongoc_topology_select_server_id in commit bf80fd809c breaks that assumption. Would it be possible to remove the call to mongoc_mutex_unlock outside of the if statement? kevin.albertson, do you think that would break the change you made?

As an additional note, it appears that I require a Synopsys login to view the Coverity defect report. Also, is there a documented process for reproducing the checks locally or via patch builds? That would let me experiment with tweaks to determine whether they fix the problem, and also catch any regressions that might arise.