[SERVER-79075] Generated StringData fields from IDL structs may refer to invalid bytes after command returns Created: 18/Jul/23 Updated: 27/Oct/23 Resolved: 19/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Aiden Szeto (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Service Arch
|
| Participants: |
| Description |
|
When type: string is specified in a struct field in IDL, the generated type is StringData. StringData does not own the data it points to, and as a result the data it points to may outlive the StringData object itself. This can be problematic if the data is freed, which would cause the StringData object to point to garbage memory and may lead to a segfault. This particularly happens when an IDL struct is used for a command's parameters. When the command returns, the parameters are destructed causing the StringData object to point to garbage. Here's an example of this happening. In this case, a workaround was made by creating an object that holds a std::string in IDL. However, a more direct fix could be achieved if the type of a struct field in IDL could be a std::string rather than StringData. |
| Comments |
| Comment by Billy Donahue [ 19/Jul/23 ] | |
|
Tiny thing to follow up on George's last sentence. I'd recommend exploiting the explicit conversion operator from StringData to std::string instead of using the name function StringData.toString().
This will work whether str is a StringData, a char*, a std::string_view, or a std::string. | |
| Comment by Aiden Szeto (Inactive) [ 19/Jul/23 ] | |
|
Thanks for the clarification George! Yes, the issue is that another variable is saving StringData that it gets from the IDL structure getters that outlives the IDL structure itself. I do have a workaround that copies the string out of the IDL structure. Just to be clear, the IDL structure owns a std::string type but the generated getters return a StringData type in order to preserve the nature of IDL types being "just a view"? Is there any way to access the owned string type itself or is it best practice to use the StringData that views into it? | |
| Comment by George Wangensteen [ 19/Jul/23 ] | |
|
Actually, max.hirschhorn@mongodb.com reminded me that IDL structs do take ownership of things of type string in particular. That is, they do copy strings out of the underlying BSONObj at parse-time into the data of the IDL-structure itself. However, the getters they produce have a return value of type StringData. StringData is a view-type, which means that it doesn't own the underlying memory. So even though the IDL-structure itself owns the string, when you do:
the variable "view" will point to garbage memory after "myIDLStruct" is destroyed, because "myIDLStruct" is the object that owns the underlying memory. So in this particular case of type "string", because the struct actually does own the memory, my suggestsions above about using parseSharingOwnership or parseOwned will not help. Instead, the error is likely that another variable you are using (or object) is saving a StringData that it gets from the IDL structure getters that outlives the IDL structure itself. To resolve this, you'll need to either copy the string out of the IDL structure (using StringData::toString or similar), or extend the IDL structures lifetime. Again, feel free to reach out on slack if you have questions! | |
| Comment by George Wangensteen [ 19/Jul/23 ] | |
|
You can make your IDL struct share or take-over ownership of the underlying data by using parseSharingOwnership or parseOwned static member functions of your struct. Have you tried that, aiden.szeto@mongodb.com ? Generally/historically speaking, IDL structs are intended to provide a well-typed view over a BSONObj, not copy data out of the BSONObj. The choice to act as a 'view' and not a copy is to avoid the performance penalty of copying the data out of the BSONObj, and the programming-model/expectation is that the programmer ensure that the underlying BSONObj has it's lifetime extended as needed/outlives the IDL structure. If you need to destroy the BSONObj before you are done using the IDL struct, you need to either copy the data out of the underlying BSONObj or extend the lifetime of the underlying buffer where the data is stored. To do this, you can use one of the functions I suggested above. If this doesn't work for you, let me know - maybe you need some sort of "string_owned" IDL type that copies just the string out of the underlying object but lets the rest get freed? But generally speaking, I would prefer to avoid doing things like this - I think that it's easier to understand an IDL type as either "just a view" over a BSONObj that doesn't own the memory, or as "participating in ownership" and keeping all of the underlying data alive. But maybe we do need a third option, which is "fully copy all of the data", although I'm not sure I understand the purpose or use-case of this third option at this time unless you have unique needs / require some new mutability control or something.
Anyway, given this context, I'm tempted to say this is "working as designed" as well. Feel free to let me know if I've misunderstood what you need or reach out on slack as needed! | |
| Comment by Aiden Szeto (Inactive) [ 19/Jul/23 ] | |
|
In this case, the IDL struct is used for both the command parameters and an oplog entry. The StringData is passed from the command parameter struct to the oplog entry struct which persists after the command is finished. The IDL changes are here and the garbage memory error can be seen here on line 1046: https://parsley.mongodb.com/resmoke/3329a41345d8a995d89207425ea5d255/test/1771cf5e56a412a851fad78164ac5c42?bookmarks=0,1046,1450&shareLine=0.
| |
| Comment by Billy Donahue [ 19/Jul/23 ] | |
|
you linked to asan finding a runtime problem. what was the coding error? Nothing in a BSONObj survives after the command is finished, so I'm not sure what the expectation is here. I'm tempted to say this is working as intended. There are performance reasons why BSONObj strings are exposed as StringData. |