[SERVER-4294] ReplicaSet Initial Sync gives replSet syncTail: 11000 E11000 duplicate key error index Created: 16/Nov/11  Updated: 11/Jul/16  Resolved: 13/Feb/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.1
Fix Version/s: 2.1.1

Type: Bug Priority: Critical - P2
Reporter: Steffen Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 1
Labels: duplicates
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 2.6.32-35-server, Ubuntu 10.04, MongoDB 2.0.1, Replicaset with 3 Nodes, NUMA, 2x XEON E5620 , 24 GB RAM


Attachments: Zip Archive primary.log.zip     Zip Archive secondary.log.zip    
Operating System: Linux
Participants:

 Description   

What we want to do:
We have a 3-node replica set and want to add another node dedicated to backups. This backup machine is not as powerful as the production nodes.

What error do we get:
After we added the clean backup node to the set, it starts the initial sync and after some time we get a duplicate key error:
Wed Nov 16 11:02:49 [rsSync] replSet syncTail: 11000 E11000 duplicate key error index: keywords.keywordAccountMappings.$accountId_1 dup key: { : 1323880 }, syncing: { ts: Timestamp 1321392524000|36, h: -5191181626474352236, op: "i", ns: "keywords.keywordAccountMappings", o: { _id: ObjectId('4ec2d98c1820b45cca0fe5dc'), accountId: 1323880, keywords: [ { keyword: ObjectId('4e70b0e118a39bb918001631'), createdAt: new Date(1321392524000) } ] } }

Differences between old nodes and the new node:
The new node has a slower I/O system, only 12 GB of RAM, a slightly slower CPU, and a second mongod shard instance running on it, which backs up another replica set.
On the old nodes the indexes are format version 0 (created with 1.8); the new backup node uses the latest index version 1 (2.0).
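
For reference, the index format version on each node can be inspected from the driver. This is a minimal sketch using the legacy PHP Mongo driver (the same driver as the code below); the connection string is a placeholder:

<?php
// Minimal sketch, assuming a local mongod and the legacy PHP Mongo driver;
// the connection string is a placeholder.
$m = new Mongo('mongodb://localhost:27017');
$collection = $m->keywords->keywordAccountMappings;

// getIndexInfo() returns the system.indexes documents for the collection.
// Indexes built under 2.0 report "v" => 1; indexes carried over from 1.8
// are version 0 (on those the "v" field may be missing entirely).
foreach ($collection->getIndexInfo() as $index) {
    $version = isset($index['v']) ? $index['v'] : 0;
    echo $index['name'] . ' => v' . $version . "\n";
}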

What we suspect: the use of $addToSet in combination with upsert might be causing the replication errors.
This code adds a new keyword to the keywordAccountMappings collection:
$keywordAccountMappingBeanCollection->update(
    array(
        'accountId' => $accountId
    ),
    array(
        '$addToSet' => array(
            'keywords' => array(
                'keyword' => new \MongoId($keywordId),
                'createdAt' => new \MongoDate(time())
            )
        )
    ),
    array(
        'upsert' => true,
        'safe' => true
    )
);
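
For context, the E11000 error above points at a unique index on accountId (accountId_1). This is a minimal sketch of how that index and the resulting mapping document would look with the same driver; the connection details and the unique option are assumptions inferred from the error message:

<?php
// Minimal sketch, assuming a local mongod; the unique index on accountId is
// inferred from the "duplicate key error index: ...$accountId_1" message above.
$m = new Mongo('mongodb://localhost:27017');
$collection = $m->keywords->keywordAccountMappings;

// The index named "accountId_1" in the error message would have been created
// roughly like this:
$collection->ensureIndex(array('accountId' => 1), array('unique' => true, 'safe' => true));

// After the $addToSet upsert above, the mapping document has the shape quoted
// in the syncTail log entry:
// { _id: ObjectId(...), accountId: 1323880,
//   keywords: [ { keyword: ObjectId(...), createdAt: new Date(...) } ] }
var_dump($collection->findOne(array('accountId' => 1323880)));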

We also have an MMS account associated with my user account.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 07/May/12 ]

@Justin: that is a different error. It looks like corruption.

Comment by Justin Patrin [ 02/May/12 ]

I'm seeing this same error, but this is on existing secondaries, not a new server. When I try deleting the data directory for a full resync I get an assertion error:

Wed May 2 15:12:01 [rsSync] Assertion failure database db/pdfile.h 488
0x57a636 0x5854eb 0x755ff2 0x83914b 0x826d38 0x82ea40 0x830333 0x829641 0x8296fa 0x829b80 0xaab7c0 0x7f56900e99ca 0x7f568f69870d
/usr/bin/mongod(ZN5mongo12sayDbContextEPKc+0x96) [0x57a636]
/usr/bin/mongod(_ZN5mongo8assertedEPKcS1_j+0xfb) [0x5854eb]
/usr/bin/mongod(_ZN5mongo9nsdetailsEPKc+0x2b2) [0x755ff2]
/usr/bin/mongod(_ZN5mongo11shouldRetryERKNS_7BSONObjERKSs+0x11b) [0x83914b]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl27initialSyncOplogApplicationEPKNS_6MemberENS_6OpTimeES4+0x1548) [0x826d38]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl18_syncDoInitialSyncEv+0x1180) [0x82ea40]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x23) [0x830333]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x61) [0x829641]
/usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x4a) [0x8296fa]
/usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0xa0) [0x829b80]
/usr/bin/mongod(thread_proxy+0x80) [0xaab7c0]
/lib/libpthread.so.0(+0x69ca) [0x7f56900e99ca]
/lib/libc.so.6(clone+0x6d) [0x7f568f69870d]
Wed May 2 15:12:01 [rsSync] replSet initial sync failing, error applying oplog 0 assertion db/pdfile.h:488
Wed May 2 15:12:01 [rsSync] replSet initial sync failed during applyoplog

Comment by Kristina Chodorow (Inactive) [ 13/Feb/12 ]

I don't think this can be backported, unfortunately.

Comment by Dominik Gehl [ 23/Jan/12 ]

Just happened to me too ... so now starting a new initial sync (of approx. 1 TB of data)

Comment by Kristina Chodorow (Inactive) [ 21/Nov/11 ]

@Steffen: not sure. I'm doing some refactoring to make this more testable, so I'm not sure if we'll be able to backport. We'll try!

Comment by Steffen [ 17/Nov/11 ]

Is it possible to make a backport to 2.0.x?

Comment by auto [ 16/Nov/11 ]

Author: Kristina (kchodorow, kristina@10gen.com)

Message: Do not leave initial sync until minvalid is reached SERVER-4294
Branch: master
https://github.com/mongodb/mongo/commit/d54adbc4c279f3b5a5a30fd95bcf8b529359bf8d

Comment by Kristina Chodorow (Inactive) [ 16/Nov/11 ]

Thanks for the logs!

It looks like you guys hit an edge case we're not handling correctly with initial sync.

To get around this for now, you could:

  • Try again - the edge case depends on what operations happen to be occurring during initial sync, so it'll probably just work if you clear the data directory and try again.
  • Start this node from a backup of another member.
  • Wait for this bug to be fixed.

Comment by Eliot Horowitz (Inactive) [ 16/Nov/11 ]

Can you attach full logs from both primary and new secondary?
