[SERVER-9330] Error 10092 during initial sync Created: 11/Apr/13  Updated: 10/Dec/14  Resolved: 05/Mar/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Noah Davis Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: crash, replica, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

GNU/Linux


Attachments: mongod.log, rs_conf.txt, rs_status.txt
Operating System: ALL
Participants:

 Description   

During initial sync, the mongod instance we were adding as a secondary crashed. I've attached the output of rs.conf() and rs.status() from the primary, as well as the full mongod.log from the instance that crashed.



 Comments   
Comment by Tyler Brock [ 24/Apr/13 ]

Bryan, that's perfectly fine to do, as long as you are sure you haven't introduced any duplicate _ids since the compact was killed (dropDups: true will silently drop them). I would also add background: true so that the build doesn't block other database activity.

If you can, it would be advisable to run compact to completion at some point given this other note in the documentation:

"Much of the existing free space in the collection may become un-reusable. In this scenario, you should rerun the compaction to completion to restore the use of this free space."
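To make the dropDups risk concrete, here is a minimal sketch in plain JavaScript (not the mongo shell) of what a unique index build with dropDups: true does: when several documents share a key, one is kept and the rest are deleted. Which document survives is an assumption here (the first one seen); the helper previewDropDups and the sample documents, whose ObjectIds are stood in for by hex strings, are hypothetical. Running an equivalent check against the real data before calling ensureIndex would show exactly what would be lost.

```javascript
// Hypothetical helper: preview which documents a unique index build with
// dropDups: true would delete. Assumption: the first document seen for a
// given key is kept and every later document with the same key is dropped.
function previewDropDups(docs, keyOf) {
  const seen = new Set();
  const kept = [];
  const dropped = [];
  for (const doc of docs) {
    // Serialize the key so that compound keys compare by value.
    const key = JSON.stringify(keyOf(doc));
    if (seen.has(key)) {
      dropped.push(doc);
    } else {
      seen.add(key);
      kept.push(doc);
    }
  }
  return { kept, dropped };
}

// Hypothetical sample in which two documents share an _id
// (hex strings stand in for ObjectIds).
const sampleDocs = [
  { _id: "5166a1f0c0ffee0000000001", path: "app/a.rb" },
  { _id: "5166a1f0c0ffee0000000002", path: "app/b.rb" },
  { _id: "5166a1f0c0ffee0000000001", path: "app/c.rb" },
];
const preview = previewDropDups(sampleDocs, (d) => d._id);
console.log(preview.dropped.map((d) => d.path)); // [ 'app/c.rb' ]
```

If `dropped` is non-empty, those documents would disappear when the index is rebuilt, so they are worth inspecting (or exporting) first.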

Comment by Bryan Helmkamp [ 18/Apr/13 ]

I think I remember now what happened to this collection. The _id index is not supposed to be removable (according to http://docs.mongodb.org/manual/core/indexes/).

IIRC I ran a compact (http://docs.mongodb.org/manual/reference/command/compact/) operation against this collection at one point but aborted it using db.killOp(). The compact docs note:

"If you terminate the operation with the db.killOp() [...] You may have to manually rebuild the indexes."

Should we just run ensureIndex({_id: 1}, {unique: true, dropDups: true})? Or is that not a good idea because the _id index is special?

Comment by Bryan Helmkamp [ 16/Apr/13 ]

You'll notice the source_files collection is missing an index on _id. This is not intentional; I'm not sure how it got into that state. I vaguely remember now that at one point I may have accidentally deleted it. (Fortunately, we don't really look up source_files by _id.) The _ids are ObjectIds.

Do we know if the dupes issue is with _id or the other index (which is a unique index)? Is there a way we can fix the source_files collection on the primary in order to make it sync-able?
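One way to answer the "which index?" question is to count, for each key pattern, how many documents share a key value with an earlier document. The sketch below does that over an in-memory array; the helper duplicatesPerIndex and the sample documents are hypothetical, while the key pattern names come from the getIndexes() output in this thread plus MongoDB's default _id index name, `_id_`. Since the compound index is already unique on the primary it should not hold duplicates, which is consistent with the log excerpt in Stephen's comment showing the failing build was on `{ _id: 1 }`.

```javascript
// Hypothetical helper: for each index key pattern, count how many documents
// repeat a key value already seen (i.e. how many dropDups would remove).
function duplicatesPerIndex(docs, keyPatterns) {
  const result = {};
  for (const [name, fields] of Object.entries(keyPatterns)) {
    const counts = new Map();
    for (const doc of docs) {
      // Build the key from the pattern's fields in order; a missing field
      // serializes as null inside the JSON array.
      const key = JSON.stringify(fields.map((f) => doc[f]));
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    let extra = 0;
    for (const n of counts.values()) extra += n - 1;
    result[name] = extra;
  }
  return result;
}

const sourceFilePatterns = {
  _id_: ["_id"],
  repo_id_1_commit_sha_1_worker_version_1_path_1: ["repo_id", "commit_sha", "worker_version", "path"],
};

// Hypothetical documents: the same _id appears twice with different paths,
// so the compound unique key is still distinct.
const syncDocs = [
  { _id: "a", repo_id: 1, commit_sha: "abc", worker_version: 3, path: "x.rb" },
  { _id: "a", repo_id: 1, commit_sha: "abc", worker_version: 3, path: "y.rb" },
  { _id: "b", repo_id: 2, commit_sha: "def", worker_version: 3, path: "x.rb" },
];
console.log(duplicatesPerIndex(syncDocs, sourceFilePatterns));
```

On a 92-million-document collection this would of course be done server-side rather than in memory; the sketch only shows what is being counted.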

Comment by Bryan Helmkamp [ 16/Apr/13 ]

Thanks, Stephen.

  • We did upgrade the primary's version of MongoDB a couple times, most recently from 2.2.2 to 2.4.1.
  • Output you requested:

db.source_files.stats();
{
	"ns" : "code_climate_production.source_files",
	"count" : 92360664,
	"size" : 34716184984,
	"avgObjSize" : 375.87630361773927,
	"storageSize" : 40873971520,
	"numExtents" : 50,
	"nindexes" : 1,
	"lastExtentSize" : 1864196096,
	"paddingFactor" : 1.0009999999956798,
	"systemFlags" : 0,
	"userFlags" : 0,
	"totalIndexSize" : 15921304672,
	"indexSizes" : {
		"repo_id_1_commit_sha_1_worker_version_1_path_1" : 15921304672
	},
	"ok" : 1
}
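The stats output can be sanity-checked with a little arithmetic: avgObjSize should equal size / count, and the gap between storageSize and size is roughly the space allocated to the collection's extents that is not counted as document data, which is the kind of free space the compact documentation quoted earlier warns may become un-reusable. The figures below are copied from the output above; interpreting the difference as free space is a rough approximation, not an exact accounting.

```javascript
// Figures copied from the db.source_files.stats() output above (bytes).
const sourceFilesStats = {
  count: 92360664,
  size: 34716184984,
  avgObjSize: 375.87630361773927,
  storageSize: 40873971520,
};

// avgObjSize is size / count.
const avgObjSize = sourceFilesStats.size / sourceFilesStats.count;

// Allocated extent space not counted as document data (approximation).
const unusedBytes = sourceFilesStats.storageSize - sourceFilesStats.size;
const unusedFraction = unusedBytes / sourceFilesStats.storageSize;

console.log(avgObjSize.toFixed(2)); // 375.88
console.log((unusedBytes / 1e9).toFixed(2), "GB"); // 6.16 GB
console.log((unusedFraction * 100).toFixed(1), "%");
```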
 
db.source_files.getIndexes();
[
	{
		"v" : 1,
		"key" : {
			"repo_id" : 1,
			"commit_sha" : 1,
			"worker_version" : 1,
			"path" : 1
		},
		"unique" : true,
		"ns" : "code_climate_production.source_files",
		"name" : "repo_id_1_commit_sha_1_worker_version_1_path_1",
		"dropDups" : true
	}
]
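The missing _id index is visible directly in the getIndexes() result: no entry has a key pattern on _id alone. A small check like the one below (the function name hasIdIndex is hypothetical) makes that explicit; it looks at the key pattern rather than the index name, and treats any single-field key on _id as an _id index regardless of direction.

```javascript
// Returns true if any index in a getIndexes()-style array has a key
// pattern consisting of the single field _id.
function hasIdIndex(indexes) {
  return indexes.some((ix) => {
    const fields = Object.keys(ix.key);
    return fields.length === 1 && fields[0] === "_id";
  });
}

// The single index reported by db.source_files.getIndexes() above.
const sourceFilesIndexes = [
  {
    v: 1,
    key: { repo_id: 1, commit_sha: 1, worker_version: 1, path: 1 },
    unique: true,
    ns: "code_climate_production.source_files",
    name: "repo_id_1_commit_sha_1_worker_version_1_path_1",
    dropDups: true,
  },
];

console.log(hasIdIndex(sourceFilesIndexes)); // false
```

In the mongo shell, the same function could be applied to the getIndexes() output of each name returned by db.getCollectionNames() to spot other collections in the same state.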

  • The disk space issue was a red herring from when we had incorrectly configured MongoDB to write data to the root partition rather than a data partition. We've fixed the configuration since then. This is the partition we are using:

/dev/mapper/vg0-mongo
                      493G   82G  386G  18% /srv/data/mongo

Comment by Stennie Steneker (Inactive) [ 16/Apr/13 ]

Hi Noah,

Based on the mongod.log, it appears that the last action underway was an index build:

Thu Apr 11 03:38:03.677 [rsSync] build index code_climate_production.source_files { _id: 1 }

... which encountered an exception:

Thu Apr 11 03:43:12.000 [rsSync] 		Index: (2/3) BTree Bottom Up Progress: 756900/88532353	0%
Thu Apr 11 03:43:18.513 [rsSync] replSet initial sync exception: 10092 too may dups on index build with dropDups=true 0 attempts remaining

... followed by a fatal assertion when trying to restart the initial sync.
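The error text suggests that a dropDups index build tracks the duplicates it finds and gives up once there are too many. The simulation below illustrates that failure mode under an assumption: duplicates are collected in an in-memory list with a fixed cap, maxDups, which is a hypothetical parameter (we have not confirmed the actual limit in 2.4.1). On a collection whose _id values were never protected by a unique index, enough duplicates would abort the build and, with it, the initial sync.

```javascript
// Simulates a unique index build with dropDups: true that remembers the
// record ids of duplicates and aborts once their number reaches maxDups.
// maxDups is hypothetical; it does not claim to match MongoDB's limit.
function buildUniqueIndexDropDups(keys, maxDups) {
  const index = new Set();
  const dupsToDrop = [];
  keys.forEach((key, recordId) => {
    if (!index.has(key)) {
      index.add(key);
      return;
    }
    dupsToDrop.push(recordId);
    if (dupsToDrop.length >= maxDups) {
      // Reuses the message seen in the log, including its typo.
      throw new Error("10092 too may dups on index build with dropDups=true");
    }
  });
  return { indexed: index.size, dropped: dupsToDrop };
}

// Hypothetical keys: records 2 and 3 repeat keys that are already indexed.
console.log(buildUniqueIndexDropDups(["a", "b", "a", "b"], 10).dropped); // [ 2, 3 ]
try {
  buildUniqueIndexDropDups(["a", "b", "a", "b"], 2);
} catch (e) {
  console.log(e.message);
}
```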

I noticed there were several attempts to sync in the logs (including one where mongod ran out of available disk space). Did you remove the files in the data directory after the failed initial sync attempts (i.e. before attempting the sync again)?

Can you provide some more background on this replica set:

  • Did you upgrade the primary from an earlier version of MongoDB (if so, which version)?
  • Can you provide the output of db.source_files.stats() and db.source_files.getIndexes() from your current primary?
  • Can you confirm the free space (df -h) for the volume with your data directory on the secondary?

Thanks,
Stephen

Generated at Thu Feb 08 03:20:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.