<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:36:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-15069] Better validation of stored geometries without failing index builds</title>
                <link>https://jira.mongodb.org/browse/SERVER-15069</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Currently, when building a geo index on a collection with existing data, the build fails if any document being indexed contains an invalid geometry.&lt;/p&gt;

&lt;p&gt;This means that the only way to build the index is to brute-force it: run the build, resolve the errors reported by each failed build, and try again. This can be a costly and time-consuming operation, especially when a large amount of data is involved.&lt;/p&gt;

&lt;p&gt;We should provide some mechanism to allow users to validate geometries without waiting for index builds to fail.&lt;/p&gt;
</description>
                <environment></environment>
        <key id="155234">SERVER-15069</key>
            <summary>Better validation of stored geometries without failing index builds</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-integration">Backlog - Query Integration</assignee>
                                    <reporter username="david.hows">David Hows</reporter>
                        <labels>
                            <label>qi-geo</label>
                    </labels>
                <created>Thu, 28 Aug 2014 04:05:26 +0000</created>
                <updated>Thu, 28 Dec 2023 18:36:31 +0000</updated>
                                                                            <component>Geo</component>
                    <component>Index Maintenance</component>
                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="727021" author="robert.jobson@dominionenterprises.com" created="Thu, 25 Sep 2014 15:12:44 +0000"  >&lt;p&gt;The usual workaround we use, which was included in the original commercial support ticket that spawned this feature request, is to create a second collection with the index and copy records over in batches, marking the ones that succeed with a flag. Once you have run through them all, you move the bad ones out. Something like:&lt;/p&gt;

&lt;p&gt;for (i = 0; i &amp;lt; 139; i++) {&lt;br/&gt;
db.collection_clean.ensureIndex({&quot;geoJson&quot;:&quot;2dsphere&quot;});&lt;br/&gt;
// Copy the next batch of unflagged documents; addOption(16) sets the noTimeout cursor flag.&lt;br/&gt;
db.collection.find({c:{$exists:0}},{&quot;geoJson&quot;:1}).addOption(16).limit(500000).forEach(function(doc){ db.collection_clean.insert(doc); });&lt;br/&gt;
// Flag every document that made it into the clean collection.&lt;br/&gt;
db.collection_clean.find({},{}).addOption(16).forEach(function(doc){&lt;br/&gt;
db.collection.update({_id:doc._id},{$set:{c:1}});&lt;br/&gt;
});&lt;br/&gt;
db.collection_clean.drop();&lt;br/&gt;
}&lt;br/&gt;
// Move the geometry of every document that is still unflagged into malgeo.&lt;br/&gt;
db.collection.find({c:{$exists:0}}).forEach(function(doc){&lt;br/&gt;
db.collection.update({_id:doc._id},{$set:{malgeo:doc.geoJson},$unset:{geoJson:1}});&lt;br/&gt;
});&lt;/p&gt;

&lt;p&gt;The request there was actually that a parameter be added to the index build to provide a field name for where to move the bad geometries. So rather than dying on finding a malformed geometry, the build would move it aside and continue. This would greatly reduce the amount of work required.&lt;/p&gt;

&lt;p&gt;Bear in mind that we are dealing with data sets of a cumbersome size. The set that prompted this request took weeks to load and more weeks to clean.&lt;/p&gt;</comment>
                            <comment id="706698" author="greg_10gen" created="Thu, 28 Aug 2014 14:09:43 +0000"  >&lt;p&gt;Another option is to run one or more $geoWithin/$geoIntersects queries on the data in the unindexed collection - documents which are not valid GeoJSON will not be returned and can then be fixed.&lt;/p&gt;</comment>
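<!--
Editorial sketch for the query-based check described in comment 706698. It assumes the _id values returned by each $geoWithin/$geoIntersects query have already been collected, for instance with db.collection.find({geoJson: {$geoIntersects: {$geometry: region}}}, {_id: 1}), where `region` is a hypothetical GeoJSON polygon. The helper name idsNeverMatched is an assumption for illustration and is not part of MongoDB. Documents that were never returned are only candidates: a valid geometry lying outside every queried region is reported too, so the regions need to cover the data.

```javascript
// Sketch only: returns the _id values that none of the geo queries returned.
// allIds: every _id in the collection.
// matchedIdLists: one array of _id values per geo query that was run.
function idsNeverMatched(allIds, matchedIdLists) {
  const seen = new Set();
  for (const ids of matchedIdLists) {
    // Compare by string form so that non-primitive ids with the same
    // string representation are treated as equal.
    for (const id of ids) seen.add(String(id));
  }
  return allIds.filter((id) => !seen.has(String(id)));
}
```

The returned _id values are the documents to inspect and repair before attempting the index build.
-->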
                            <comment id="706595" author="milkie" created="Thu, 28 Aug 2014 12:05:31 +0000"  >&lt;p&gt;Can you create the new geo index on an empty collection and then write a script that attempts to copy each document?  In this way, you could build up a list of which documents fail the index validation.&lt;/p&gt;</comment>
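<!--
Editorial sketch of the copy-and-record approach suggested in comment 706595: try to insert each document into a target that already carries the geo index, and keep a list of the ones that are rejected. The names copyAndCollectFailures and makeFakeIndexedTarget are assumptions for illustration. The fake target only exists so the sketch runs without a server; against a real deployment the callback might wrap db.collection_clean.insertOne(doc), assuming that call throws on rejection in the shell version in use.

```javascript
// Sketch only. `docs` is any iterable of documents and `insertInto` is a
// caller-supplied function assumed to throw when the indexed target
// collection rejects a document.
function copyAndCollectFailures(docs, insertInto) {
  const failures = [];
  let copied = 0;
  for (const doc of docs) {
    try {
      insertInto(doc);
      copied += 1;
    } catch (err) {
      // Remember which document failed and why, then keep copying.
      failures.push({ _id: doc._id, reason: String(err && err.message) });
    }
  }
  return { copied, failures };
}

// In-memory stand-in for the indexed target, so the sketch runs without a
// server. Its Point-only range check is a placeholder and is much weaker
// than the server's real 2dsphere validation.
function makeFakeIndexedTarget() {
  const stored = [];
  function insert(doc) {
    const g = doc.geoJson;
    const isPoint = Boolean(g) && g.type === "Point" &&
      Array.isArray(g.coordinates) && g.coordinates.length === 2;
    if (!isPoint) throw new Error("not a GeoJSON Point");
    const [lng, lat] = g.coordinates;
    if (typeof lng !== "number" || typeof lat !== "number" ||
        Math.abs(lng) > 180 || Math.abs(lat) > 90) {
      throw new Error("coordinates out of range");
    }
    stored.push(doc);
  }
  return { stored, insert };
}
```

Because a failure is recorded instead of aborting the run, one pass over the source yields the full list of documents to fix.
-->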
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25467"><![CDATA[Query Integration]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 28 Aug 2014 12:05:31 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        9 years, 20 weeks, 6 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ted.tuckman@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            9 years, 20 weeks, 6 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-integration</customfieldvalue>
            <customfieldvalue>david.hows</customfieldvalue>
            <customfieldvalue>milkie@mongodb.com</customfieldvalue>
            <customfieldvalue>greg_10gen</customfieldvalue>
            <customfieldvalue>robert.jobson@dominionenterprises.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrloz3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2dnr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>134895</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsgowf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>