Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Assigned Teams: Storage Execution
Confidence Status: None
Work Order: 3

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Record stores used to store non-BSON data back in the MMAPv1 days. Admittedly, that use was internal to the MMAPv1 implementation, but it still went through the public APIs to store its btree buckets in an MmapV1RecordStore. That code is of course long gone, but the idea that record stores could store anything other than BSON remains by way of the RecordData type. I believe we now only store BSON in RecordStores, with the possible exception of some old unit tests, which should be easy to convert to storing BSON.

I think this would be a nice cleanup of the storage engine concepts because it makes it clear that a RecordStore is logically a map<RecordId, BSON> and not a map to arbitrary blobs of unknown format. Of course, this may not be ideal if we are planning to put non-BSON data in a RecordStore (at the boundary of the storage engine API). But even then, it may still be a good idea to do this, to force us to use a separate type (possibly with a common base) for data structures that map from RecordIds to other formats.
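To make the "separate type with a common base" idea concrete, here is a minimal sketch. All names in it (`KeyedStore`, `InMemoryRecordStore`, `BsonDoc`, and the `RecordId` alias) are hypothetical stand-ins invented for illustration, not the real server types or interfaces.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real server types.
using RecordId = std::int64_t;
struct BsonDoc {
    std::vector<unsigned char> bytes;  // would be a BSONObj in practice
};

// Assumed common base: a store keyed by RecordId, parameterized on the
// value format, so non-BSON structures get their own instantiation.
template <typename Value>
class KeyedStore {
public:
    virtual ~KeyedStore() = default;
    virtual void insert(RecordId id, Value value) = 0;
    virtual std::optional<Value> find(RecordId id) const = 0;
};

// A "record store" is then specifically the BSON-valued case:
// logically a map<RecordId, BSON>.
class InMemoryRecordStore final : public KeyedStore<BsonDoc> {
public:
    void insert(RecordId id, BsonDoc value) override {
        _data[id] = std::move(value);
    }

    std::optional<BsonDoc> find(RecordId id) const override {
        auto it = _data.find(id);
        if (it == _data.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<RecordId, BsonDoc> _data;
};
```

With this shape, code that needs a map from RecordIds to some other format would have to name a different instantiation of the base rather than reuse the record store type.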

While IMO it is worth doing this for the conceptual clarity alone, there are at least a few practical benefits I can see that this unlocks:

- BSONObj is already set up to hold owned views into slices of ConstSharedBuffers, while RecordData isn't (not that it would be too hard to make it so). While we currently never take advantage of the owned data in the storage engine and always copy, we've identified a few places that would benefit from doing so.
- RecordData probably should use ConstSharedBuffer rather than (mutable) SharedBuffer, since the receivers of those buffers should never modify them (at least not without first checking isShared()).
- Storage engines should be able to take advantage of knowing that the values will be in BSON format. For example, they could either omit the size and trailing 0 byte, or (more likely) they could omit their own storage of the size and just use the first 4 bytes of the BSON. Additionally, they could use BSON-specific compression of values without first validating that the data was actually BSON.
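As a rough illustration of the "use the first 4 bytes of the BSON" point: a BSON document begins with a little-endian int32 giving its total length (including those 4 bytes and the trailing 0 byte), so the length can be recovered from the value itself. The helper below is a hypothetical sketch, not an existing storage engine function, and it only performs the minimal framing checks shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper: recover a document's total size from its own
// little-endian int32 length prefix instead of a separately stored size.
// Returns nullopt if the bytes cannot frame a BSON document.
inline std::optional<std::uint32_t> bsonSizeFromPrefix(const unsigned char* data,
                                                       std::size_t available) {
    // The smallest BSON document is 5 bytes: 4-byte length + trailing 0.
    if (data == nullptr || available < 5) {
        return std::nullopt;
    }

    const std::uint32_t size = static_cast<std::uint32_t>(data[0]) |
        (static_cast<std::uint32_t>(data[1]) << 8) |
        (static_cast<std::uint32_t>(data[2]) << 16) |
        (static_cast<std::uint32_t>(data[3]) << 24);

    // The prefix is a signed int32 in BSON, so reject values above INT32_MAX,
    // anything below the minimum, and anything longer than the buffer.
    if (size < 5 || size > 0x7fffffffu || size > available) {
        return std::nullopt;
    }

    // The declared length includes the terminating 0 byte.
    if (data[size - 1] != 0) {
        return std::nullopt;
    }
    return size;
}
```

For the empty document `05 00 00 00 00`, the helper returns 5. A prefix that claims more bytes than are available is rejected rather than trusted.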

Assignee: Unassigned
Reporter: Mathias Stearn
Participants: Louis Williams, Mathias Stearn
Votes: 0
Watchers: 7

Created: Nov 13 2023 12:32:31 PM UTC
Updated: Dec 02 2024 07:10:55 PM UTC
Resolved: Dec 02 2024 07:10:55 PM UTC
