[SERVER-7466] Replication always failed Created: 25/Oct/12  Updated: 11/Jul/16  Resolved: 29/Oct/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Thomas Rosenblatt Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 0
Labels: bug, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04 64 bits x86


Attachments: File tmp    
Issue Links:
Related
Operating System: Linux
Participants:

 Description   

Hello,

I'm facing a big issue in my production environment.
I've moved my existing standalone database into a replica set, and my secondary instance always fails during its initial sync phase.

The failure happens on a collection named record, whose indexes are listed below:
nws_rs_1:PRIMARY> db.record.getIndexSpecs()
[
    {
        "v" : 1,
        "key" : { "_id" : 1 },
        "ns" : "nadb.record",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : { "_id.L" : 1, "_id.K" : 1 },
        "unique" : true,
        "ns" : "nadb.record",
        "name" : "_id.L_1__id.K_1",
        "dropDups" : true
    }
]

nws_rs_1:PRIMARY> db.record.stats()
{
    "ns" : "nadb.record",
    "count" : 1605239,
    "size" : 4816750020,
    "avgObjSize" : 3000.643530340342,
    "storageSize" : 8696713216,
    "numExtents" : 33,
    "nindexes" : 2,
    "lastExtentSize" : 1940869120,
    "paddingFactor" : 1.0040000005010514,
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 145295696,
    "indexSizes" : { "_id_" : 88120928, "_id.L_1__id.K_1" : 57174768 },
    "ok" : 1
}

And the secondary always fails with message:
Thu Oct 25 11:46:21 [repl writer worker 4] Fatal Assertion 15915
(full log trace in the attached file)

This is a big issue, since I can't set up any replication at all!

Thank you for your help.

Thomas.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 26/Oct/12 ]

Great, glad it worked out.

Yes, you should have a unique _id index on every collection if you're using replication, because replication needs a way to uniquely identify each document. See the warning here: http://www.mongodb.org/display/DOCS/Capped+Collections#CappedCollections-UsageandRestrictions.
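
As a rough illustration of that requirement, the sketch below checks whether a list of index specs (in the shape printed by db.record.getIndexSpecs() in the description) contains a plain { _id : 1 } index. The helper name hasIdIndex and the sample arrays are hypothetical, and the shell loop in the trailing comment is an untested assumption about how it might be applied to every collection.

```javascript
// Hypothetical helper (not part of the mongo shell): returns true when the
// given index specs include an index whose key is exactly { _id : 1 }.
function hasIdIndex(indexSpecs) {
  for (var i = 0; i < indexSpecs.length; i++) {
    var key = indexSpecs[i].key;
    var fieldCount = 0;
    for (var field in key) {
      if (Object.prototype.hasOwnProperty.call(key, field)) {
        fieldCount++;
      }
    }
    // A compound key such as { "_id.L" : 1, "_id.K" : 1 } does not qualify.
    if (fieldCount === 1 && key._id === 1) {
      return true;
    }
  }
  return false;
}

// Sample data modeled on the layout described in this ticket: first the
// compound-only collection, then the same collection with an _id index added.
var compoundOnly = [
  { v: 1, key: { "_id.L": 1, "_id.K": 1 }, unique: true,
    ns: "nadb.record", name: "_id.L_1__id.K_1" }
];
var withIdIndex = compoundOnly.concat([
  { v: 1, key: { _id: 1 }, ns: "nadb.record", name: "_id_" }
]);

// Assumed mongo shell usage, listing collections that lack an _id index:
// db.getCollectionNames().forEach(function (name) {
//   if (!hasIdIndex(db.getCollection(name).getIndexes())) {
//     print(name + " has no _id index");
//   }
// });
```

The check compares the key document rather than the index name, since an index can be created under any name.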

Comment by Thomas Rosenblatt [ 26/Oct/12 ]

Hello Kristina,

Well, I finally found a workaround that seems to work.
I stopped every process that was writing to the database while the secondary was starting up.

Thirty minutes later, once the secondary had really started and the replica set was fully established, I restarted all the processes, and replication worked just fine!
So I guess, and hope, that this issue is related to SERVER-7186 and that the bug will be fixed in 2.2.1.

I will not try the 2.2.1 RC or downgrade now that my production environment seems stable; I will wait for the stable 2.2.1 release first.
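
For anyone repeating this workaround, one possible way to decide when it is safe to restart the writers is to look at the member states reported by rs.status(). The sketch below is only an illustration: initialSyncFinished is a hypothetical helper, the host names in the sample data are made up, and it assumes the standard replica set state codes (1 = PRIMARY, 2 = SECONDARY, 7 = ARBITER).

```javascript
// Hypothetical helper: takes an object shaped like the output of rs.status()
// and returns true when there is a primary and every other member is either
// a secondary or an arbiter (no member is still starting up or recovering).
function initialSyncFinished(status) {
  var sawPrimary = false;
  for (var i = 0; i < status.members.length; i++) {
    var state = status.members[i].state;
    if (state === 1) {
      sawPrimary = true;            // 1 = PRIMARY
    } else if (state !== 2 && state !== 7) {
      return false;                 // anything but SECONDARY (2) or ARBITER (7)
    }
  }
  return sawPrimary;
}

// Made-up sample data: a secondary that is still syncing, then fully synced.
var stillSyncing = { members: [
  { name: "host-a:27017", state: 1, stateStr: "PRIMARY" },
  { name: "host-b:27017", state: 3, stateStr: "RECOVERING" }
] };
var synced = { members: [
  { name: "host-a:27017", state: 1, stateStr: "PRIMARY" },
  { name: "host-b:27017", state: 2, stateStr: "SECONDARY" }
] };

// Assumed mongo shell usage: initialSyncFinished(rs.status())
```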

Just for info:
I had a huge collection that I created without an _id index, with only a unique compound index (_id.K : 1, _id.L : 1).
I didn't want another index, in order to reduce memory usage.
I first thought that replication was failing because of that, so I created an _id index for that collection, but the result was no better.

Are collections without an _id index replicable?
If not, it would be good to mention that in your documentation.

Thank you,

Thomas.

Comment by Kristina Chodorow (Inactive) [ 25/Oct/12 ]

There were some issues in 2.2.0 with replication (SERVER-7186) and we weren't handling initial sync quite right. I'm not positive that's what you're running into, but it's a strong possibility. Can you try the 2.2.1 RC at http://www.mongodb.org/downloads or downgrade?
