<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:30:17 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-12984] Real document size in collection</title>
                <link>https://jira.mongodb.org/browse/SERVER-12984</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Hello, I am not able to find out the real size of a document stored in a MongoDB collection. From searching online, I found that the document size can be obtained by two methods:&lt;br/&gt;
Object.bsonsize - a JavaScript method in the shell that should return the size in bytes&lt;br/&gt;
db.collection.stats() - its output includes a line &apos;avgObjSize&apos;, which gives an &quot;aggregated&quot; (average) view of the data: the average size of a single document.&lt;/p&gt;

&lt;p&gt;I have a really simple document with this structure:&lt;br/&gt;
{ &lt;br/&gt;
  test: &quot;test&quot;,&lt;br/&gt;
  ids: [id1, id2, id3, ..., id500000]&lt;br/&gt;
}&lt;br/&gt;
Each id consists of 10 characters.&lt;/p&gt;

&lt;p&gt;By simple computation, the overall size should be more than 5 MB. &lt;br/&gt;
But when I run the &apos;Object.bsonsize&apos; command, it returns 499. &lt;br/&gt;
The stats command returns &apos;size&apos; = 10747888.&lt;/p&gt;

&lt;p&gt;I am really confused. Is there a reliable way to find out the size of a particular document? Are the steps I have taken so far completely wrong?&lt;/p&gt;

&lt;p&gt;Thank you for your support, any help will be appreciated.&lt;/p&gt;</description>
                <environment></environment>
        <key id="116255">SERVER-12984</key>
            <summary>Real document size in collection</summary>
                <type id="6" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14720&amp;avatarType=issuetype">Question</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="stephen.steneker@mongodb.com">Stennie Steneker</assignee>
                                    <reporter username="botajzorek">Jan Botorek</reporter>
                        <labels>
                    </labels>
                <created>Sat, 1 Mar 2014 18:43:09 +0000</created>
                <updated>Sat, 18 Sep 2021 09:01:11 +0000</updated>
                            <resolved>Tue, 18 Mar 2014 10:25:31 +0000</resolved>
                                                                    <component>Diagnostics</component>
                    <component>Shell</component>
                    <component>Tools</component>
                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="4069957" author="stennie" created="Sat, 18 Sep 2021 09:01:11 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=saaitha%40cisco.com&quot; class=&quot;user-hover&quot; rel=&quot;saaitha@cisco.com&quot;&gt;saaitha@cisco.com&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Per earlier comments on this issue you can use something like &lt;tt&gt;Object.bsonsize()&lt;/tt&gt;&#160;in the MongoDB shell or the equivalent in your MongoDB driver.&lt;/p&gt;

&lt;p&gt;However, if your documents are likely to approach the maximum document size limit you may be using a schema design anti-pattern like &lt;a href=&quot;https://www.mongodb.com/developer/article/schema-design-anti-pattern-massive-arrays/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Massive Arrays&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I highly recommend reviewing the articles in these two schema design series:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;https://www.mongodb.com/blog/post/building-with-patterns-a-summary&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Building with Patterns: A Summary&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://www.mongodb.com/developer/article/schema-design-anti-pattern-summary/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;A Summary of Schema Design Anti-Patterns and How to Spot Them&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;There is also a free online course at MongoDB University: &lt;a href=&quot;https://university.mongodb.com/courses/M320/about&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;M320: Data Modeling&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Lastly: please note that the SERVER project is for reporting bugs and potential improvements for the MongoDB server.&lt;/p&gt;

&lt;p&gt;For general discussion please start a new topic in the &lt;a href=&quot;https://community.mongodb.com/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MongoDB Developer Community Forums&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Stennie&lt;/p&gt;</comment>
                            <comment id="4059105" author="JIRAUSER1260950" created="Tue, 14 Sep 2021 16:32:36 +0000"  >&lt;p&gt;Hi Stennie,&lt;/p&gt;

&lt;p&gt;What is the best way to find the document size before inserting into MongoDB, to prevent a &quot;document too large&quot; error?&lt;/p&gt;</comment>
                            <comment id="508208" author="botajzorek" created="Mon, 3 Mar 2014 18:44:48 +0000"  >&lt;p&gt;Hello, thank you very much for your help! Yes, the result now looks right:&lt;/p&gt;

&lt;p&gt;&amp;gt; Object.bsonsize(db.test.findOne({test:&quot;test&quot;}))&lt;br/&gt;
&amp;gt; 10492282&lt;/p&gt;

&lt;p&gt;It did not occur to me that I was querying a cursor instead of the actual document object. &lt;br/&gt;
Best regards&lt;/p&gt;</comment>
                            <comment id="507963" author="stennie" created="Mon, 3 Mar 2014 14:56:13 +0000"  >&lt;p&gt;Hi Jan,&lt;/p&gt;

&lt;p&gt;In your bsonsize() example you are measuring the size of a find(), which returns a cursor rather than a document.&lt;/p&gt;

&lt;p&gt;To find the size of a single document you should instead be using a findOne():&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt; Object.bsonsize(db.test.findOne( {test:&quot;test&quot;}))&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;Can you confirm this shows the expected document size?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Stephen&lt;/p&gt;</comment>
                            <comment id="507899" author="botajzorek" created="Mon, 3 Mar 2014 09:07:38 +0000"  >&lt;p&gt;Hello,&lt;br/&gt;
I will try to demonstrate the steps I performed:&lt;/p&gt;

&lt;p&gt;1) Create document:&lt;/p&gt;
  { 
    test:&quot;test&quot;,
    ids:[
    &quot;1111111111&quot;,
    &quot;2222222222&quot;,
    &quot;3333333333&quot;,
    ... 
    ]
  }
&lt;p&gt;  There are 990000 randomly generated strings, each representing a virtual ID. Every ID is 10 characters long.&lt;/p&gt;

&lt;p&gt;2) Insert the document into the database:&lt;br/&gt;
db.test.insert(document created in step 1)&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;The collection is empty before this document is inserted.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;3.a) db.test.stats():&lt;br/&gt;
{&lt;br/&gt;
        &quot;ns&quot; : &quot;home.test&quot;,&lt;br/&gt;
        &quot;count&quot; : 1,&lt;br/&gt;
        &quot;size&quot; : 10747888,&lt;br/&gt;
        &quot;avgObjSize&quot; : 10747888,&lt;br/&gt;
        &quot;storageSize&quot; : 167882752,&lt;br/&gt;
        &quot;numExtents&quot; : 2,&lt;br/&gt;
        &quot;nindexes&quot; : 1,&lt;br/&gt;
        &quot;lastExtentSize&quot; : 167878656,&lt;br/&gt;
        &quot;paddingFactor&quot; : 1,&lt;br/&gt;
        &quot;systemFlags&quot; : 1,&lt;br/&gt;
        &quot;userFlags&quot; : 0,&lt;br/&gt;
        &quot;totalIndexSize&quot; : 8176,&lt;br/&gt;
        &quot;indexSizes&quot; : {&lt;br/&gt;
                &quot;_id_&quot; : 8176&lt;br/&gt;
        },&lt;br/&gt;
        &quot;ok&quot; : 1&lt;br/&gt;
}&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;According to the &quot;size&quot; value, the single document is about 10.7 MB. I believe this may be the real size of the document. Unfortunately, when there are thousands of documents in the collection, this method cannot be used to find the size of a particular document.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;3.b) Object.bsonsize(db.test.find({test:&quot;test&quot;}))&lt;br/&gt;
  returns: 460&lt;br/&gt;
  I found this command by searching online. According to the comments and documentation I could find, it should return the size of a single document in bytes. As you can see, it does not: there is a huge difference between these two values.&lt;/p&gt;


&lt;p&gt;Is there any other way to find the real size of stored documents?&lt;/p&gt;

&lt;p&gt;Thank you for your support&lt;/p&gt;</comment>
                            <comment id="507857" author="dan@10gen.com" created="Mon, 3 Mar 2014 04:57:20 +0000"  >&lt;p&gt;Can you describe the exact commands you are running and run the collection stats command?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 3 Mar 2014 04:57:20 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 20 weeks, 4 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>stephen.steneker@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 20 weeks, 4 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>dan@mongodb.com</customfieldvalue>
            <customfieldvalue>botajzorek</customfieldvalue>
            <customfieldvalue>saaitha@cisco.com</customfieldvalue>
            <customfieldvalue>stephen.steneker@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrl1m7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrwlkv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>104231</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hs9wa7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>