
Type: New Feature
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Storage
Labels: None
Assigned Teams: Storage Execution
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We've seen many people use abbreviated attribute names in documents, such as "rcv_uid" for "receiving_user_id" or "pw" for "password", to save space where those attributes appear repeatedly across a large collection. This is somewhat ironic, because one of the greatest strengths of a document-oriented database is its flexible and "intuitive" document structure.

While we all like the idea of schema-free design, in reality we actually NEED a schema for good performance: documents should be structured in a consistent way, and indexed attributes are critical.

Here's a big question: what if we had a global symbol table for all attribute names in the database?

Possible values for attributes are unlimited, but possible "keys" are practically limited.
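A global symbol table of this kind can be sketched as a two-way map that hands out a new integer ID the first time a field name is seen. The following is a minimal illustration, not how any storage engine implements it: `SymbolTable`, `intern`, `name_of`, `encode` and `decode` are hypothetical names, documents are modeled as plain Python dicts, and IDs are assigned sequentially starting at 1 and capped at the 32-bit range.

```python
from typing import Any, Dict


class SymbolTable:
    """Hypothetical database-wide table mapping field names to 32-bit IDs."""

    MAX_ID = 0xFFFFFFFF  # largest value an unsigned 32-bit ID can hold

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}    # field name -> ID
        self._names: Dict[int, str] = {}  # ID -> field name

    def intern(self, name: str) -> int:
        """Return the ID for `name`, assigning the next free ID on first use."""
        sid = self._ids.get(name)
        if sid is None:
            sid = len(self._ids) + 1  # IDs start at 1 (0x0001)
            if sid > self.MAX_ID:
                raise OverflowError("symbol table exhausted")
            self._ids[name] = sid
            self._names[sid] = name
        return sid

    def name_of(self, sid: int) -> str:
        """Return the field name stored under `sid` (KeyError if unknown)."""
        return self._names[sid]


def encode(value: Any, table: SymbolTable) -> Any:
    """Recursively replace field names with symbol IDs; values are untouched."""
    if isinstance(value, dict):
        return {table.intern(k): encode(v, table) for k, v in value.items()}
    if isinstance(value, list):
        return [encode(v, table) for v in value]
    return value


def decode(value: Any, table: SymbolTable) -> Any:
    """Inverse of encode: restore the original field names."""
    if isinstance(value, dict):
        return {table.name_of(k): decode(v, table) for k, v in value.items()}
    if isinstance(value, list):
        return [decode(v, table) for v in value]
    return value


table = SymbolTable()
doc = {"receiving_user_id": 42, "password": "secret"}
stored = encode(doc, table)  # {1: 42, 2: 'secret'}
assert decode(stored, table) == doc
```

Because the set of distinct keys is small, the table itself stays small even when the collection is huge. A real implementation would also have to persist the table and make ID assignment safe under concurrent writes; this sketch ignores both.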

If we map the keys using a 32-bit symbol table as follows:

0x0001 => receiving_user_id (17 bytes -> 4 bytes)
0x0002 => password (8 bytes -> 4 bytes)

then the persisted representation of a document could take up less space. The median key length in my past projects has been around 10-12 bytes (e.g. "achievement_id", "leaderboard_id", "max_version_id", "icon_content_type"), so this is a big deal. In pathological cases where 1-3 byte keys are used, it would of course mean a slight increase in size, but in practice it will almost always be a win.
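The trade-off can be checked with simple arithmetic: each occurrence of a key saves its length minus the 4 bytes of a 32-bit ID. The sketch below follows the ticket's counting (UTF-8 length of the name only) and ignores format-specific overhead such as BSON's null terminator after each field name; the function names are hypothetical.

```python
from typing import List

ID_SIZE = 4  # bytes per symbol ID in a 32-bit symbol table


def key_bytes(key: str) -> int:
    """Size of a field name as counted in the ticket: its UTF-8 length only."""
    return len(key.encode("utf-8"))


def bytes_saved(key: str, id_size: int = ID_SIZE) -> int:
    """Bytes saved per occurrence of `key`; negative if the ID is larger."""
    return key_bytes(key) - id_size


def bytes_saved_per_document(keys: List[str], id_size: int = ID_SIZE) -> int:
    """Total saving for one document that uses each key in `keys` once."""
    return sum(bytes_saved(k, id_size) for k in keys)


for key in ["receiving_user_id", "password", "achievement_id",
            "leaderboard_id", "max_version_id", "icon_content_type", "pw"]:
    print(f"{key:<18} {key_bytes(key):>2} -> {ID_SIZE} bytes, saves {bytes_saved(key):>3}")
```

The 14-17 byte example keys save 10-13 bytes per occurrence, while a 2-byte key like "pw" costs 2 extra bytes. A document using the four example keys once each would save 10 + 10 + 10 + 13 = 43 bytes, and that saving is repeated for every document in the collection.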

But the best part of this feature is the change in mentality: we could stop worrying about keys taking up too much space and start using clear, descriptive key names in the document schema. As Phil Karlton said, "There are only two hard things in Computer Science: cache invalidation and naming things." Let's keep naming things unrestricted.

duplicates: SERVER-863 Tokenize the field names (Closed)

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: Kenn Ejima
Participants: [DO NOT USE] Backlog - Storage Execution Team, Ben McCann, Glenn Maynard, James Gray, Kenn Ejima
Votes: 8
Watchers: 14

Created: Jun 17 2011 09:07:02 PM UTC
Updated: Dec 06 2022 05:42:59 AM UTC
Resolved: Dec 18 2018 08:07:36 PM UTC
