<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:02:38 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-3288] Symbol table for attribute names</title>
                <link>https://jira.mongodb.org/browse/SERVER-3288</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We&apos;ve seen lots of people abbreviate attribute names, like &quot;rcv_uid&quot; for &quot;receiving_user_id&quot; or &quot;pw&quot; for &quot;password&quot;, to save space, because each key name is stored again in every document of a large collection. This is kind of ironic, because one of the greatest aspects of a document-oriented database is its flexible and &quot;intuitive&quot; document structure.&lt;/p&gt;

&lt;p&gt;While we all like the idea of schema-free design, in reality we actually NEED a schema for better performance. Documents should be structured in a certain way, and indexed attributes are critical.&lt;/p&gt;

&lt;p&gt;Here&apos;s a big question: what if we had a global symbol table for all attribute names in the database?&lt;/p&gt;

&lt;p&gt;The possible values of attributes are unlimited, but the set of distinct keys is limited in practice.&lt;/p&gt;

&lt;p&gt;If we map the keys to IDs using a 32-bit symbol table as follows:&lt;/p&gt;

&lt;p&gt;0x0001 =&amp;gt; receiving_user_id (17 bytes -&amp;gt; 4 bytes)&lt;br/&gt;
0x0002 =&amp;gt; password (8 bytes -&amp;gt; 4 bytes)&lt;/p&gt;

&lt;p&gt;then the persisted representation of each document could take up less space. The median key length in my past projects is around 10-12 bytes (e.g. &quot;achievement_id&quot;, &quot;leaderboard_id&quot;, &quot;max_version_id&quot;, &quot;icon_content_type&quot;), so it&apos;s a big deal. In pathological cases where 1-3 byte keys are used, it would of course mean a slight increase in size, but in practice it will almost always be a win.&lt;/p&gt;
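&lt;p&gt;A minimal in-memory sketch of the idea, assuming a fixed 4-byte ID per key as in the mapping above. The class SymbolTable and the helpers key_bytes_plain and key_bytes_with_table are hypothetical names for illustration, not MongoDB APIs, and the sketch does not persist the table or share it across a database.&lt;/p&gt;

```python
class SymbolTable:
    """Hypothetical global table mapping attribute names to 32-bit IDs."""

    def __init__(self):
        self.name_to_id = {}
        self.id_to_name = {}

    def intern(self, name):
        # Assign the next free ID, starting at 0x0001, the first time a name is seen.
        if name not in self.name_to_id:
            new_id = len(self.name_to_id) + 1
            self.name_to_id[name] = new_id
            self.id_to_name[new_id] = name
        return self.name_to_id[name]

    def encode_key(self, name):
        # Every key is stored as a fixed 4-byte little-endian ID.
        return self.intern(name).to_bytes(4, "little")

    def decode_key(self, raw):
        # Map a stored 4-byte ID back to the original attribute name.
        return self.id_to_name[int.from_bytes(raw, "little")]


def key_bytes_plain(keys):
    # Bytes spent on key names when each document stores them in full
    # (counted as in the example above, ignoring BSON's NUL terminator).
    return sum(len(k.encode("utf-8")) for k in keys)


def key_bytes_with_table(table, keys):
    # Bytes spent on key names when each document stores only the IDs.
    return sum(len(table.encode_key(k)) for k in keys)


table = SymbolTable()
keys = ["receiving_user_id", "password"]
print(table.encode_key("receiving_user_id").hex())
print(key_bytes_plain(keys), key_bytes_with_table(table, keys))
```

&lt;p&gt;It prints 01000000 (ID 0x0001 in little-endian order) and then 25 8: the two example keys shrink from 25 bytes of names to 8 bytes of IDs.&lt;/p&gt;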

&lt;p&gt;But the best part of this feature is the change in mentality: we could stop worrying about keys taking up too much space and start using clear, descriptive key names in the document schema. As Phil Karlton said, &quot;There are only two hard things in Computer Science: cache invalidation and naming things.&quot; Let&apos;s keep naming things unrestricted.&lt;/p&gt;</description>
                <environment></environment>
        <key id="18422">SERVER-3288</key>
            <summary>Symbol table for attribute names</summary>
                <type id="2" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14711&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="backlog-server-execution">Backlog - Storage Execution Team</assignee>
                                    <reporter username="kenn">Kenn Ejima</reporter>
                        <labels>
                    </labels>
                <created>Fri, 17 Jun 2011 21:07:02 +0000</created>
                <updated>Tue, 6 Dec 2022 05:42:59 +0000</updated>
                            <resolved>Tue, 18 Dec 2018 20:07:36 +0000</resolved>
                                                                    <component>Storage</component>
                                        <votes>8</votes>
                                    <watches>14</watches>
                                                                                                                <comments>
                            <comment id="112226" author="james4k" created="Sat, 21 Apr 2012 04:26:24 +0000"  >&lt;p&gt;It is the same basic issue, but the proposal given is a little absurd. No matter the implementation, it should be completely transparent to the user AND the driver.&lt;/p&gt;</comment>
                            <comment id="112051" author="chengas123" created="Fri, 20 Apr 2012 18:04:58 +0000"  >&lt;p&gt;Totally agree this is needed, or some type of compression that would notice the same keys being used and compress the docs efficiently. Anyway, this seems to be a duplicate of another issue that you should probably follow and vote for instead: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-863&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/SERVER-863&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="109441" author="glenn" created="Fri, 13 Apr 2012 04:55:36 +0000"  >&lt;p&gt;Since the large majority of documents have fewer than 128 unique keys, a variable-length encoding could be used to store the symbol offset for each key string. The encoding algorithm from UTF-8 is a good one. That way, the key names in most documents, which have a reasonably small number (up to 128) of static keys, would each be stored as a single byte, and tables of up to 2048 keys would need at most two bytes per key. (This would require that each collection have its own table, though.)&lt;/p&gt;
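&lt;p&gt;A minimal sketch of this encoding, assuming each collection&apos;s table hands out non-negative integer offsets. encode_offset and decode_offset are hypothetical helper names; they reuse the UTF-8 bit layout, so offsets 0-127 take one byte and offsets up to 2047 take two. Capping the encoding at four bytes (offsets below 0x200000) is an assumption of this sketch.&lt;/p&gt;

```python
def encode_offset(n):
    """Encode a symbol-table offset using the UTF-8 bit layout (1 to 4 bytes)."""
    if n in range(0x80):        # 7 bits:  0xxxxxxx
        return bytes([n])
    if n in range(0x800):       # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 + n // 64, 0x80 + n % 64])
    if n in range(0x10000):     # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 + n // 4096, 0x80 + n // 64 % 64, 0x80 + n % 64])
    if n in range(0x200000):    # 21 bits: 11110xxx followed by 3 continuation bytes
        return bytes([0xF0 + n // 262144, 0x80 + n // 4096 % 64,
                      0x80 + n // 64 % 64, 0x80 + n % 64])
    raise ValueError("offset out of range: %r" % (n,))


def decode_offset(data):
    """Decode one offset from the start of data; return (offset, bytes_used)."""
    lead = data[0]
    if lead in range(0x80):
        return lead, 1
    if lead in range(0xC0, 0xE0):
        length, value = 2, lead - 0xC0
    elif lead in range(0xE0, 0xF0):
        length, value = 3, lead - 0xE0
    elif lead in range(0xF0, 0xF8):
        length, value = 4, lead - 0xF0
    else:
        raise ValueError("invalid lead byte 0x%02X" % lead)
    if length > len(data):
        raise ValueError("truncated offset")
    for b in data[1:length]:
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        if b not in range(0x80, 0xC0):
            raise ValueError("invalid continuation byte 0x%02X" % b)
        value = value * 64 + (b - 0x80)
    return value, length
```

&lt;p&gt;For example, [encode_offset(n).hex() for n in (5, 127, 128, 2047, 2048)] gives [&apos;05&apos;, &apos;7f&apos;, &apos;c280&apos;, &apos;dfbf&apos;, &apos;e0a080&apos;], the same bytes as UTF-8 for those code points. No encoded byte is ever 0xFF, which leaves that value free as a marker.&lt;/p&gt;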

&lt;p&gt;The byte 0xFF, which never occurs in UTF-8 (and so is never the first byte of a sequence), could be used to mean &quot;not in the symbol table&quot;, with the key name stored inline immediately after it. This would add one byte of overhead per document key in the worst case.&lt;/p&gt;</comment>
                            <comment id="109188" author="james4k" created="Thu, 12 Apr 2012 17:27:25 +0000"  >&lt;p&gt;This would be such a great improvement. At the moment I am having to move most of my data to another database. Frankly, that is because I had assumed that keys were pooled in some way. Now that I know that is not the case, it&apos;s enough for me to switch back to a more traditional DB. When you are dealing with millions of documents, it is a bit of a deal breaker. I hope MongoDB makes space efficiency a higher priority in the near future, so that I can use it for more of my projects. It really is a joy to use.&lt;/p&gt;</comment>
                            <comment id="106648" author="glenn" created="Wed, 4 Apr 2012 19:15:45 +0000"  >&lt;p&gt;It&apos;s bad to get people in the habit of obfuscating their keys; storing data efficiently is the database&apos;s job. It&apos;s pretty surprising that string pooling wasn&apos;t done from day one.&lt;/p&gt;

&lt;p&gt;Don&apos;t assume that the set of keys is finite, though; you can definitely have dynamic strings (e.g. UUIDs) as keys. The string table needs to fit in memory, so it needs some limits on what&apos;s stored there (e.g. a maximum string length).&lt;/p&gt;

&lt;p&gt;Also, please note &lt;a href=&quot;https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/EO6RDk-ATfU&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/EO6RDk-ATfU&lt;/a&gt;. I&apos;m manually padding objects within arrays to a fixed BSON length to ensure that later updates can happen in place. If this feature is added, the stored BSON size will no longer match the size we see on the client end. A prerequisite for this feature should be an array-size hint in BSON, which would allow clients to request that array items be padded to a specific size when stored on disk. That would avoid this problem.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="11679">SERVER-863</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25136"><![CDATA[Storage Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 4 Apr 2012 19:15:45 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        11 years, 43 weeks, 4 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            11 years, 43 weeks, 4 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-execution</customfieldvalue>
            <customfieldvalue>chengas123</customfieldvalue>
            <customfieldvalue>glenn</customfieldvalue>
            <customfieldvalue>james4k</customfieldvalue>
            <customfieldvalue>kenn</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hroxi7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr8fm7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6118</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsous7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>