<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:00:53 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-2681] replica member never syncs up and starts from scratch multiple times.</title>
                <link>https://jira.mongodb.org/browse/SERVER-2681</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The member seems unable to initialize fully and keeps restarting from scratch during the initial sync process.&lt;br/&gt;
The oplog should be large enough to cover the process: it takes about 12h to copy data and create indexes on the new member, and the oplog holds 51h.&lt;/p&gt;

&lt;p&gt;log0x:PRIMARY&amp;gt; db.printReplicationInfo()&lt;br/&gt;
configured oplog size:   79892.4875MB&lt;br/&gt;
log length start to end: 184394secs (51.22hrs)&lt;br/&gt;
oplog first event time:  Wed Mar 02 2011 06:38:52 GMT-0800 (PST)&lt;br/&gt;
oplog last event time:   Fri Mar 04 2011 09:52:06 GMT-0800 (PST)&lt;br/&gt;
now:                     Fri Mar 04 2011 09:52:06 GMT-0800 (PST)&lt;/p&gt;


&lt;p&gt;Attached is the log from log12, the member trying to sync up.&lt;/p&gt;</description>
                <environment>ubuntu, &lt;br/&gt;
mongodb 1.7.5, &lt;br/&gt;
two replica members + 1 arbiter&lt;br/&gt;
java driver connecting to log11 and log12</environment>
        <key id="14997">SERVER-2681</key>
            <summary>replica member never syncs up and starts from scratch multiple times.</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="kristina">Kristina Chodorow</assignee>
                                    <reporter username="rgiudici">Reinaldo Giudici</reporter>
                        <labels>
                    </labels>
                <created>Fri, 4 Mar 2011 17:54:37 +0000</created>
                <updated>Tue, 12 Jul 2016 00:19:23 +0000</updated>
                            <resolved>Mon, 14 Mar 2011 22:04:12 +0000</resolved>
                                    <version>1.7.5</version>
                                    <fixVersion>1.9.0</fixVersion>
                                                        <votes>0</votes>
                                    <watches>4</watches>
                    <comments>
                            <comment id="25877" author="kristina" created="Mon, 14 Mar 2011 18:34:51 +0000"  >&lt;p&gt;Yay!  Looks like it finally made it.  The fix I made should be able to make it a bit more robust (so it doesn&apos;t have to keep starting over) in the future.  Glad it finally worked out.&lt;/p&gt;</comment>
                            <comment id="25855" author="rgiudici" created="Mon, 14 Mar 2011 16:40:24 +0000"  >&lt;p&gt;I cleaned up the local database, shut down mongod on log11, and ran rm local.*,&lt;br/&gt;
then started the sync from mongo12 again from scratch. It failed at least 50 times, but today it seems log12 is a secondary!&lt;/p&gt;


&lt;p&gt;cat /var/log/mongodb/mongod.node1.log | grep 13127  | wc -l&lt;br/&gt;
49&lt;br/&gt;
(but the log only goes back to Saturday)&lt;/p&gt;



&lt;p&gt;Attached is the log.&lt;br/&gt;
This section of the log never appeared before (only a partial log is shown here), and it seems there is some delay in the first part of the oplog sync: according to the log, it took about 24 minutes to apply the first 100k ops and roughly 8 minutes to apply the last 1.9MM.&lt;/p&gt;

&lt;p&gt;In all previous attempts, initialSyncOplogApplication never appeared in the log at all.&lt;/p&gt;


&lt;p&gt;Mon Mar 14 04:57:46 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync initial oplog application&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;......&amp;#93;&lt;/span&gt;&lt;br/&gt;
Mon Mar 14 05:21:31 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 100000&lt;br/&gt;
Mon Mar 14 05:23:26 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 200000&lt;br/&gt;
Mon Mar 14 05:23:34 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 300000&lt;br/&gt;
Mon Mar 14 05:26:38 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 400000&lt;br/&gt;
Mon Mar 14 05:27:21 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 500000&lt;br/&gt;
Mon Mar 14 05:27:29 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 600000&lt;br/&gt;
Mon Mar 14 05:27:38 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 700000&lt;br/&gt;
Mon Mar 14 05:27:45 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 800000&lt;br/&gt;
Mon Mar 14 05:27:52 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 900000&lt;br/&gt;
Mon Mar 14 05:28:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1000000&lt;br/&gt;
Mon Mar 14 05:28:07 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1100000&lt;br/&gt;
Mon Mar 14 05:28:14 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1200000&lt;br/&gt;
Mon Mar 14 05:28:21 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1300000&lt;br/&gt;
Mon Mar 14 05:28:28 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1400000&lt;br/&gt;
Mon Mar 14 05:28:36 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1500000&lt;br/&gt;
Mon Mar 14 05:28:43 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1600000&lt;br/&gt;
Mon Mar 14 05:28:50 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1700000&lt;br/&gt;
Mon Mar 14 05:28:57 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1800000&lt;br/&gt;
Mon Mar 14 05:29:04 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 1900000&lt;br/&gt;
Mon Mar 14 05:29:11 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initialSyncOplogApplication 2000000&lt;/p&gt;
</comment>
                            <comment id="25779" author="auto" created="Fri, 11 Mar 2011 18:49:05 +0000"  >&lt;p&gt;Author:&lt;/p&gt;
{u&apos;login&apos;: u&apos;kchodorow&apos;, u&apos;name&apos;: u&apos;Kristina&apos;, u&apos;email&apos;: u&apos;kristina@10gen.com&apos;}
&lt;p&gt;Message: handle cursor timeouts during initial sync &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-2681&quot; title=&quot;replica member never syncs up and starts from scratch multiple times.&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-2681&quot;&gt;&lt;del&gt;SERVER-2681&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/commit/7e9c5f3dcaff349489f3cd342afa42c2830113e9&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/mongodb/mongo/commit/7e9c5f3dcaff349489f3cd342afa42c2830113e9&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="25705" author="kristina" created="Thu, 10 Mar 2011 20:50:35 +0000"  >&lt;p&gt;That&apos;s an error code.  Initial sync should handle that; I&apos;ll work on fixing it so that it doesn&apos;t get stopped by that error.&lt;/p&gt;</comment>
                            <comment id="25704" author="rgiudici" created="Thu, 10 Mar 2011 19:51:22 +0000"  >&lt;p&gt;No killing of connections that I know of...&lt;br/&gt;
The strange thing is that it seems to move along until this line:&lt;/p&gt;

&lt;p&gt;Wed Mar  9 19:45:37 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync initial oplog application&lt;/p&gt;

&lt;p&gt;And the timeout call seems to show the same error every time. Is that an oplog position or an error code?&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;13:49:16 rgiudici@cloud-log12:~&amp;#93;&lt;/span&gt;$ cat mongod.log | grep oplog | grep failing &lt;br/&gt;
Wed Mar  9 20:33:33 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Wed Mar  9 21:25:16 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Wed Mar  9 22:46:44 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Wed Mar  9 23:33:27 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 02:12:27 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 04:26:33 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 07:00:30 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 07:53:20 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 08:34:50 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 09:04:09 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 11:28:44 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;br/&gt;
Thu Mar 10 12:05:01 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync failing, error applying oplog 13127 getMore: cursor didn&apos;t exist on server, possible restart or timeout?&lt;/p&gt;</comment>
                            <comment id="25692" author="kristina" created="Thu, 10 Mar 2011 18:06:32 +0000"  >&lt;p&gt;Could you have some sort of router or switch that is killing connections after some length of time?&lt;/p&gt;</comment>
                            <comment id="25689" author="kristina" created="Thu, 10 Mar 2011 18:04:29 +0000"  >&lt;p&gt;The &quot;no user in local.system.users to use for authentication&quot; is harmless.&lt;/p&gt;</comment>
                            <comment id="25688" author="kristina" created="Thu, 10 Mar 2011 18:03:58 +0000"  >&lt;p&gt;The problem is that you keep having network blips:&lt;/p&gt;

&lt;p&gt;Sun Mar  6 05:23:04 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; MessagingPort recv() errno:104 Connection reset by peer 10.17.100.71:27017&lt;br/&gt;
Sun Mar  6 05:23:04 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; SocketException: remote: 10.17.100.71:27017 error: 9001 socket exception &lt;span class=&quot;error&quot;&gt;&amp;#91;1&amp;#93;&lt;/span&gt;&lt;br/&gt;
Sun Mar  6 05:23:04 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; DBClientCursor::init call() failed&lt;br/&gt;
Sun Mar  6 05:23:04 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync exception 10276 DBClientBase::findOne: transport error: cloud-log11:27017 query: { query: {}, orderby: { $natural: -1 } }&lt;br/&gt;
Sun Mar  6 05:23:34 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync pending&lt;br/&gt;
Sun Mar  6 05:23:34 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet syncing to: cloud-log11:27017&lt;br/&gt;
Sun Mar  6 05:23:34 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync drop all databases&lt;/p&gt;

&lt;p&gt;It can&apos;t reach the server it&apos;s trying to sync to.  &lt;/p&gt;

&lt;p&gt;As I said above, you could try doing a fastsync.  You could also see if you could do anything to fix your network or sync to a &quot;closer&quot; machine, if you&apos;re syncing over a WAN at the moment.  &lt;/p&gt;

&lt;p&gt;Initial sync will be getting more tolerant of network failures, but not until 1.8.1 (at least).&lt;/p&gt;</comment>
                            <comment id="25685" author="rgiudici" created="Thu, 10 Mar 2011 17:54:46 +0000"  >&lt;p&gt;Attached is a new log. This one was taken after repairing all databases on the primary (except local and admin) and after dropping some data from the primary.&lt;br/&gt;
It still fails to sync; the last entry related to the sync seems to be:&lt;/p&gt;

&lt;p&gt;Thu Mar 10 11:42:37 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync initial oplog application&lt;/p&gt;


&lt;p&gt;I see a few of these, even though we do not have authentication enabled:&lt;br/&gt;
Thu Mar 10 11:42:37 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replauthenticate: no user in local.system.users to use for authentication&lt;/p&gt;

&lt;p&gt;The REST interface shows that it is never able to catch up on replication. The optime on the secondary moves really slowly: sometimes it does not change for 20 to 30 seconds, and when it does change, it changes by 1. For example, 1 minute later it is at 4d790a52:37e.&lt;/p&gt;

&lt;p&gt;Member	id	Up	cctime	Last heartbeat	Votes	Priority	State	Messages	optime	skew&lt;br/&gt;
cloud-log11:27017 (me)	0	1	1.6e+02 hrs		1	1	PRIMARY		4d790fd9:111	&lt;br/&gt;
cloud-mongo-arbiter01.vm.dfw:27017	1	1	1.1e+02 hrs	1 sec ago	1	1	ARBITER		0:0	1&lt;br/&gt;
cloud-log12:27017	2	1	16 hrs	0 secs ago	1	1	RECOVERING		4d790a52:37c	&lt;/p&gt;</comment>
                            <comment id="25646" author="kristina" created="Wed, 9 Mar 2011 21:53:16 +0000"  >&lt;p&gt;Apparently it&apos;s cosmetic: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-2669&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/SERVER-2669&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From the logs, it looks like MongoDB lost the connection to the server it was syncing from... it isn&apos;t very tolerant of blippy networks at the moment.&lt;/p&gt;

&lt;p&gt;Thu Mar  3 18:12:36 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; MessagingPort say send() errno:32 Broken pipe 10.17.100.71:27017&lt;br/&gt;
Thu Mar  3 18:12:36 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync exception 9001 socket exception &lt;span class=&quot;error&quot;&gt;&amp;#91;2&amp;#93;&lt;/span&gt;&lt;br/&gt;
Thu Mar  3 18:13:06 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync pending&lt;/p&gt;

&lt;p&gt;If you have a backup of log11, you could try starting up using --fastsync with it, which should make it pull less over the network.&lt;/p&gt;</comment>
                            <comment id="25643" author="rgiudici" created="Wed, 9 Mar 2011 21:30:50 +0000"  >&lt;p&gt;Another possibly interesting find: the size of the oplog appears to be negative. Or is that just an overflow?&lt;/p&gt;


&lt;p&gt;log0x:PRIMARY&amp;gt; &amp;gt; db.oplog.rs.stats()&lt;br/&gt;
{&lt;br/&gt;
	&quot;ns&quot; : &quot;local.oplog.rs&quot;,&lt;br/&gt;
	&quot;count&quot; : 152530910,&lt;br/&gt;
	&quot;size&quot; : -339245068,&lt;br/&gt;
	&quot;avgObjSize&quot; : -171.17218302834488,&lt;br/&gt;
	&quot;storageSize&quot; : 83773345024,&lt;br/&gt;
	&quot;numExtents&quot; : 40,&lt;br/&gt;
	&quot;nindexes&quot; : 0,&lt;br/&gt;
	&quot;lastExtentSize&quot; : 62697328,&lt;br/&gt;
	&quot;paddingFactor&quot; : 1,&lt;br/&gt;
	&quot;flags&quot; : 0,&lt;br/&gt;
	&quot;totalIndexSize&quot; : 0,&lt;br/&gt;
	&quot;indexSizes&quot; : {&lt;br/&gt;
	},&lt;br/&gt;
	&quot;capped&quot; : 1,&lt;br/&gt;
	&quot;max&quot; : 2147483647,&lt;br/&gt;
	&quot;ok&quot; : 1&lt;br/&gt;
}&lt;/p&gt;</comment>
                            <comment id="25458" author="rgiudici" created="Mon, 7 Mar 2011 21:46:49 +0000"  >&lt;p&gt;I have not run repairDatabase on all of them; I ran it on only a few.&lt;/p&gt;

&lt;p&gt;This is a &quot;log server&quot;. We had set up a 50GB capped collection per log collection, and then realized that only production logs needed that much.&lt;br/&gt;
So we reduced the rest to 2GB capped collections: I removed each collection, recreated it, and ran repair database on the non-production logs. That recovered the space on log11, and the server seems pretty healthy. But log12 has never been able to catch up again.&lt;/p&gt;

&lt;p&gt;Will running repairDatabase() try to reindex the collections? If so, that will bring down the server, since indexing on the secondaries seems to take multiple hours, which would be a problem in our setup.&lt;/p&gt;

</comment>
                            <comment id="25410" author="kristina" created="Mon, 7 Mar 2011 16:43:50 +0000"  >&lt;p&gt;Running out of diskspace on the master could definitely cause this on the slave.  Have you run repair on every database that was handling writes when you ran out of disk space?&lt;/p&gt;</comment>
                            <comment id="25288" author="rgiudici" created="Fri, 4 Mar 2011 22:20:33 +0000"  >&lt;p&gt;The master, log11, was having disk space issues until yesterday, when we changed the size of some capped collections and ran repairDatabase on a few databases.&lt;br/&gt;
The master is doing fine now, and the sync after that started much faster and better than before, but it still got into that strange repeat cycle.&lt;/p&gt;</comment>
                            <comment id="25274" author="kristina" created="Fri, 4 Mar 2011 20:57:19 +0000"  >&lt;p&gt;Is it trying to sync from the master that you&apos;re having corruption issues with?&lt;/p&gt;</comment>
                            <comment id="25272" author="rgiudici" created="Fri, 4 Mar 2011 20:51:12 +0000"  >&lt;p&gt;This log is actually the result of that.&lt;/p&gt;

&lt;p&gt;We did try multiple things, but yesterday we removed everything and started the sync from scratch.&lt;/p&gt;

&lt;p&gt;I think this is the last restart info. We did remove the data.&lt;br/&gt;
You can see the indexing takes forever and I suspect this might be related to that.&lt;br/&gt;
We have tried:&lt;/p&gt;

&lt;p&gt;. start from scratch (no data on the dir)&lt;br/&gt;
. recover a backup and start&lt;br/&gt;
. recover a backup and start with fast sync&lt;br/&gt;
. recover with mongorestore and start&lt;/p&gt;

&lt;p&gt;Also we upped the memory to 96GB and still no joy&lt;/p&gt;

&lt;p&gt;None of them has been able to finish at all.&lt;/p&gt;

&lt;p&gt;Each test takes more than 12h&lt;/p&gt;


&lt;p&gt;Thu Mar  3 05:08:59 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; MongoDB starting : pid=5555 port=27017 dbpath=/var/lib/mongodb/node1 64-bit&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;
	&lt;ul&gt;
		&lt;li&gt;NOTE: This is a development version (1.7.5) of MongoDB.&lt;/li&gt;
		&lt;li&gt;Not recommended for production.&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thu Mar  3 05:08:59 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; db version v1.7.5, pdfile version 4.5&lt;br/&gt;
Thu Mar  3 05:08:59 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; git version: 1978898d7a013657e5400133defdc996fb4c2c15&lt;br/&gt;
Thu Mar  3 05:08:59 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; sys info: Linux domU-12-31-39-06-79-A1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41&lt;br/&gt;
Thu Mar  3 05:08:59 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; waiting for connections on port 27017&lt;br/&gt;
Thu Mar  3 05:08:59 &lt;span class=&quot;error&quot;&gt;&amp;#91;websvr&amp;#93;&lt;/span&gt; web admin interface listening on port 28017&lt;br/&gt;
Thu Mar  3 05:08:59 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; connection accepted from 127.0.0.1:19147 #1&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; connection accepted from 10.36.115.6:12188 #2&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; connection accepted from 10.36.115.6:12189 #3&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; connection accepted from 10.17.100.71:22092 #4&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;startReplSets&amp;#93;&lt;/span&gt; trying to contact cloud-log11:27017&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;initandlisten&amp;#93;&lt;/span&gt; connection accepted from 10.36.58.20:58175 #5&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;startReplSets&amp;#93;&lt;/span&gt; replSet STARTUP2&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;rs Manager&amp;#93;&lt;/span&gt; replSet can&apos;t see a majority, will not try to elect self&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync pending&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet couldn&apos;t find a member matching the sync criteria:&lt;br/&gt;
state? none&lt;br/&gt;
name? none&lt;br/&gt;
_id? -1&lt;br/&gt;
optime? Dec 31 18:00:00:0&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;replica set sync&amp;#93;&lt;/span&gt; replSet initial sync need a member to be primary or secondary to do our initial sync&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;ReplSetHealthPollTask&amp;#93;&lt;/span&gt; replSet info cloud-mongo-arbiter01.vm.dfw:27017 is up&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;ReplSetHealthPollTask&amp;#93;&lt;/span&gt; replSet member cloud-mongo-arbiter01.vm.dfw:27017 ARBITER&lt;br/&gt;
Thu Mar  3 05:09:00 &lt;span class=&quot;error&quot;&gt;&amp;#91;rs Manager&amp;#93;&lt;/span&gt; replSet info not trying to elect self, do not yet have a complete set of data from any point in time&lt;/p&gt;</comment>
                            <comment id="25262" author="kristina" created="Fri, 4 Mar 2011 19:01:03 +0000"  >&lt;p&gt;It looks like you guys killed mongod multiple times, sometimes in the middle of the initial sync.  I think this may have confused it.  Can you stop mongod, remove everything from the data directory, and start the initial sync over again?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11096" name="mongod.finallysynched.gz" size="542790" author="rgiudici" created="Mon, 14 Mar 2011 16:40:24 +0000"/>
                            <attachment id="10988" name="mongod.log.log12.failedtosync.gz" size="138445" author="rgiudici" created="Fri, 4 Mar 2011 17:54:37 +0000"/>
                            <attachment id="11054" name="mongod.node1.log.afterRepairDatabase.gz" size="269624" author="rgiudici" created="Thu, 10 Mar 2011 17:54:46 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>17.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 4 Mar 2011 19:01:03 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        12 years, 49 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            12 years, 49 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>auto</customfieldvalue>
                            <customfieldvalue>kristina</customfieldvalue>
                            <customfieldvalue>rgiudici</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrp4m7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hridsv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>20888</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|ht0fpr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>