[SERVER-58263] Fix absl map erase loop in TenantMigrationAccessBlockerRegistry Created: 03/Jul/21 Updated: 29/Oct/23 Resolved: 06/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.4, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Billy Donahue | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Requested: |
v5.0
|
||||||||||||
| Sprint: | Service Arch 2021-07-12 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
absl hash maps can't ranged-for while erasing elements. |
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] | ||||||||||||||||
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! | ||||||||||||||||
| Comment by Githook User [ 29/Sep/21 ] | ||||||||||||||||
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: (cherry picked from commit efdd9cd309c2e1f6a10762b7dff3913490ea71a3) | ||||||||||||||||
| Comment by Billy Donahue [ 06/Jul/21 ] | ||||||||||||||||
|
Every container documents the iterator invalidation semantics of erase and they're all a little different. In this case, erase is said to invaidate any iterators pointing to the erased element. Other iterators are unaffected. The m.erase(it++) or it=m.erase(it) is old school. It's tricky but it's been around a long time and it's one of those things you just have to read Scott Meyers (or similar) and know about. I wouldn't trust sanitizers to find these problems. I'm not sure a sanitizer can tell when it's violated as it's not necessarily a use of invalid memory. We are relying on absl internal asserts. Maybe we need to invest in C++ training? This isn't a new kind of bug. What's new is Abseil's former tolerance of it, but this would have happened in any container. I'm not sure how to keep it from recurring other than having stronger code reviews. | ||||||||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 06/Jul/21 ] | ||||||||||||||||
|
Thanks for engaging me on this, Billy.
Ahh. So would the following be true then: The proper code to perform this relatively common technique seems trickier than I was expecting. I have an interest in this because a previous case of iterating + removing that I debugged resulted in not actually deleting all the things. We solved that issue by making vector of items to delete, then passing over the vector to actually remove from the map. I'm curious what you think might be a good set of guard rails to more proactively catch these issues in the future? Or do you think that the {A,UB}SAN with the newer versions of abseil will be sufficient to identify these bugs before they hit customers? | ||||||||||||||||
| Comment by Billy Donahue [ 06/Jul/21 ] | ||||||||||||||||
|
The key to this fix is the (it++), so the iterator variable it local to the loop has already advanced when the erase happens. Ranged-for has no way to do this.
| ||||||||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 06/Jul/21 ] | ||||||||||||||||
|
Thanks for the response.
I'm surprised by that. From the patch you made, we're still just removing based on the key (as opposed to calling a remove on the iterator/position). This means to me that a ranged-for loop is not using iterators under the hood? Based on your knowledge, it's not obvious to me why the new code is correct. My experience is with Java where this sort of bug is deterministically caught with a ConcurrentModificationException. Only calling remove on the iterator will succeed, forcing the programmer to forgo the for-each sugar. | ||||||||||||||||
| Comment by Githook User [ 06/Jul/21 ] | ||||||||||||||||
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: | ||||||||||||||||
| Comment by Billy Donahue [ 06/Jul/21 ] | ||||||||||||||||
I wouldn't phrase it as an either/or, as the behavior is technically undefined whether the container can reliably error out or not I haven't looked at the code up close and I don't know what the salient code difference is between the old and new absl code. I should do that one of these days so we understand what we're really dealing with here. So far I'm just patching in the new Abseil and chasing down stacktraces. A ranged-for loop that does an erase is always a bug in just about any container. I think previous implementations of absl just weren't catching it and were perhaps structured in such a way that it worked anyway. | ||||||||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 06/Jul/21 ] | ||||||||||||||||
|
billy.donahue, the change from the wrong code to the right code is subtle. I'm curious, will the new version of abseil deterministically error out at runtime when a programmer makes this mistake or is the behavior undefined? | ||||||||||||||||
| Comment by Billy Donahue [ 03/Jul/21 ] | ||||||||||||||||
|
Code review: https://mongodbcr.appspot.com/814140004 |