[SERVER-23688] Duplicate key index crashing nodes. Created: 13/Apr/16  Updated: 12/Jan/18  Resolved: 02/Nov/17

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication, WiredTiger
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Lucas Assignee: Eric Milkie
Resolution: Incomplete Votes: 13
Labels: NR, uic
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File investigation_cards_5087532240863232_5661405000761344.stats    
Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

Change the priority of a replica set member to force an election of a new primary.

Participants:
Case:
Linked BF Score: 7

 Description   

I have a problem with duplicated keys on a replica set.
My replica set consists of 1 primary (A), 1 secondary (B), and 1 arbiter.

The write concern "WriteConcern.REPLICA_ACKNOWLEDGED" is used for all insert and update operations on these machines to keep both nodes synchronized, and most operations are done in bulk.

There are around 500 collections in the database. 250 of them have a unique index on "pid, channel" and the other 250 have a unique index on "uid, channel".

Today I needed to change the machines' priorities to make the "most idle" one the primary. I simply changed the priority of machine B so that it would become primary at the next election.
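
A priority change of that kind is typically done with rs.reconfig() along these lines (a sketch; the member index and priority value are assumptions):

var cfg = rs.conf();
cfg.members[1].priority = 2;   // raise B's priority so it wins the next election
rs.reconfig(cfg);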

A short while after the election, the machine B - which was now primary - detected the following problem:

2016-04-13T09:40:21.966-0500 F STORAGE [conn372] Unique index cursor seeing multiple records for key { : "478713872142575", : "Facebook" }
2016-04-13T09:40:21.999-0500 I - [conn372] Fatal Assertion 28608
2016-04-13T09:40:21.999-0500 I - [conn372]

***aborting after fassert() failure

2016-04-13T09:40:22.702-0500 F - [conn372] Got signal: 6 (Aborted).

As soon as this happened, machine A became the primary again. A short while later, some of machine B's subsequent restarts hit the same problem with a different key. 40 minutes later the duplicate-key error happened on machine A too.

This bug has already happened in a similar way before and was solved by reindexing all unique indexes in the collections, but now it has happened again and I'm concerned about using MongoDB in production. What can cause this type of error? Why can't MongoDB itself stay up and try to resolve it instead of simply aborting? Could the use of the REPLICA_ACKNOWLEDGED write concern be the cause of this problem? Could the priority change plus the election process be the cause?



 Comments   
Comment by Eric Milkie [ 02/Nov/17 ]

Please reopen this ticket if the problem recurs.

Comment by Lucas [ 01/Nov/17 ]

Unfortunately I don't.
Sorry; if this happens again I will gather all of the log files.

Comment by Eric Milkie [ 27/Oct/17 ]

Hi lucasoares, do you have at least the server log from the time period before and after the electrical failure? That might give me enough of a chance to diagnose it.
Priorities and network problems should not result in this problem, so I'd like to be able to get to the bottom of it.

Comment by Lucas [ 25/Oct/17 ]

Hello. I had this problem again with another replica set (version 3.4.7). This time I think the cause was an electrical failure at one of my server providers. Due to the electrical issue they lost network devices, and the MongoDB process couldn't communicate with the other replicas.

This replica set has 7 members, but the primary (the one that had the issue) was configured with priority 2 so that it would always be the primary when possible. Maybe this priority configuration, together with the network problem that cut it off from the secondaries, can explain the duplication problem.

I don't remember any network issues, and I'm pretty sure we didn't have any on the previous servers I described in this issue, but maybe something like that happened.

Basically, when the server's network was restored, this node became primary again very quickly because of its high priority. Maybe for some reason it didn't complete the rollback successfully, and MongoDB then replicated the same insert operation again; after that, every time this node becomes primary, it crashes on the first insert operation it receives.

I'm not saying this is the cause, but maybe it can help you reproduce the issue.
Unfortunately I can't upload any data or help debug further, because I no longer have access to the server's data files.

Note: This replica set and the one from this issue have nothing in common except the MongoDB Java driver, and even then they use different versions (the Java driver is now on the latest version).

Thanks.

Comment by Eric Milkie [ 28/Jun/17 ]

Unfortunately, during a series of debugging rounds with the original reporter for this ticket, the instances have stopped reproducing the problem. I'm still keen on determining the source of this issue, so if anyone has a system where they suspect this problem is still actively occurring, please post here (or send me a DM). I recognize that this problem has dragged on for a while with no resolution so far, and I ask for your patience.

Comment by Meni Livne [ 16/Jan/17 ]

This happened in our deployment too with version 3.2.11. We have 4 shards, each having a primary, secondary and arbiter. The collection "domains" has more than 25 million documents and a unique index: {"domain": 1}.

For months there haven't been any problems, then suddenly today a query for a certain key
db.domains.findOne({ "domain": "some_domain.com" }),
caused the same crash.

We worked around this by dropping the index, then we found that there were two documents with the key. We removed one of them, and recreated the unique index.
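
For reference, a minimal shell sketch of that workaround, using the names from this comment (the removal step by _id is an assumption):

db.domains.dropIndex({ domain: 1 });
var dups = db.domains.find({ domain: "some_domain.com" }).toArray();  // both copies are visible once the unique index is gone
db.domains.remove({ _id: dups[1]._id });                              // delete one of the two duplicates by _id
db.domains.createIndex({ domain: 1 }, { unique: true });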

We didn't upgrade mongod recently, and we've been using wiredTiger for over a year without problems. There was no primary<->secondary switch for almost a day beforehand. The two documents in question were apparently saved several months ago and haven't been updated since. However, the primary and secondary servers in these shards were both replaced about 2 months ago, with a full resync from the previous servers.

Comment by Lucas [ 20/Dec/16 ]

Hello anonymous.user.

We have dozens of applications using bulk updates on this replica set, some of them with upsert true.
They use either "pid" and "channel" for investigation_cards or "uid" and "channel" for investigation_users.
The explanation for "uid" and "channel" being null is an application bug. We fixed it when we saw it, but one application was doing bulk updates with the unique fields missing and with upsert true.

We are still having this issue in two clusters (both came from the same base dataset), and we have another 3 new clusters without this problem, so maybe it was a bug in our application that MongoDB couldn't handle.

About "came from the same base dataset": at first we had only one cluster. That cluster was getting bigger and bigger and we had to migrate to another one. We thought at the time that copying the whole dataset and dropping the unnecessary collections was the best way to do this. That is why both clusters with this issue came from the same dataset.

Cheers!

Comment by Kelsey Schubert [ 20/Dec/16 ]

Hi lucasoares

We've taken another look at the data files you've uploaded, and identified older versions of the offending documents in the free space of the collection file. The older versions of these documents do not violate the unique index, and were subsequently updated at different times to break the index's uniqueness constraint. Since the original documents did not violate the unique index, a race condition in the insert path would not explain this issue. We're currently focusing our attention on the update path and replication behavior during failover.

  • To help us tune our reproduction scripts, would you explain a bit more about your application logic? Does setting the channel and uid to null have special meaning? Are you aware of the update command that is being executed to effect this change?

sickevil and lucasoares,

I'm sorry that we haven't been able to determine the root cause yet.

When you next encounter this issue, would you please upload some additional files to help us determine whether it is related to failovers? Specifically, would you please provide the following?

  1. An archive of the rollback directory for each node?
  2. The complete logs for each node in the replica set?
  3. The same files as requested before,
  • the affected collection file
  • its associated indexes
  • All WiredTiger* files
  • _mdb_catalog.wt
  • sizeStorer.wt

As a reminder, to determine the file names of the collection and its associated indexes, please run the following command on the affected collection and examine the "uri" fields:

db.collection.stats({indexDetails:true})
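
For example, a minimal sketch for pulling out just the "uri" values from that output (field paths assumed from typical 3.2 collStats output with WiredTiger):

var s = db.collection.stats({ indexDetails: true });
print("collection: " + s.wiredTiger.uri);           // e.g. "statistics:table:collection-242-..."
for (var name in s.indexDetails) {
    print(name + ": " + s.indexDetails[name].uri);  // one line per index file
}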

Finally, please note that I've created a new upload portal for you to use.

Thank you for your help,
Thomas

Comment by Aleksandr Matuzok [ 20/Dec/16 ]

Hello there.

I have the same issue in production. My replica set is: primary, secondary, secondary, without any special settings. I replaced almost all unique indexes with non-unique ones. Now MongoDB crashes rarely, but it still crashes; I guess because of _id and the remaining unique indexes. I have had no luck catching the bug. If I find something useful, I'll let you know.

Mongo 3.2.9, Debian wheezy.

Comment by Michael Liu [ 23/Nov/16 ]

Hi Thomas Schubert.

We have found the root cause of the duplicate-index assert crash we saw earlier. We have a single replica set consisting of three nodes (primary, secondary, hidden). We migrated a data set from another replica set with the mongooplog tool. It syncs oplogs using the applyOps command, which may cause the destination replica set to miss oplog entries (SERVER-19768). After that we had a primary node with a missing oplog entry (the document was already in the collection without ever having been written to the oplog), so it could never be fetched and replayed by the secondaries. For example:

We have a unique index on collection c named uq_idx_name:

  • node_1(primary) : {_id:"a", name:"n1"}, {_id:"b", name:"n2"}, {_id:"c", name:"n3"}, {_id:"d", name:"n4"},
  • node_2(secondary) : {_id:"a", name:"n1"}, {_id:"b", name:"n2"}, {_id:"d", name:"n4"} (without c. caused by applyOps problem)
  • node_3(hidden) : {_id:"a", name:"n1"}, {_id:"b", name:"n2"}, {_id:"d", name:"n4"} (without c. caused by applyOps problem)

Then the primary changes: node_2 becomes primary and node_1 is demoted to secondary. In MongoDB, the secondary's replication workers apply operations from the primary on multiple threads and with unique-index constraint checking disabled. Imagine that we now upsert the document {_id:"e", name:"n3"} successfully on the primary; the secondary (now node_1) also syncs it from the primary (since the unique-index conflict check is disabled). If we then read from the secondary (node_1), we get the fassert discussed above.
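
A hedged way to check whether a given document's insert ever reached the oplog on a node (the namespace here is an assumption; the _id is taken from the example above):

var oplog = db.getSiblingDB("local").oplog.rs;
oplog.find({ op: "i", ns: "test.c", "o._id": "c" }).sort({ $natural: -1 }).limit(5)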

Comment by Lucas [ 14/Nov/16 ]

anonymous.user I found a really weird thing related to this issue.

Look:

SEndpoint2:PRIMARY> db.investigation_users_5862119501201408_5543584509984768.find({id : "57e946692cdc06edc3ba7220"}, {id : 1, uid : 1, channel : 1, _id : 1}).pretty()
{
        "_id" : ObjectId("581104502cdccc5c64955cb9"),
        "id" : "57e946692cdc06edc3ba7220",
        "uid" : "3342316876",
        "channel" : "Twitter"
}
{
        "_id" : ObjectId("581797272cdce10c68bb28ba"),
        "id" : "57e946692cdc06edc3ba7220",
        "uid" : "3342316876",
        "channel" : "Twitter"
}

But when I query these documents by _id, only one of them is found:

SEndpoint2:PRIMARY> db.investigation_users_5862119501201408_5543584509984768.find({_id : ObjectId("581797272cdce10c68bb28ba")}, {id : 1, uid : 1, channel : 1, _id : 1}).pretty()
{
        "_id" : ObjectId("581797272cdce10c68bb28ba"),
        "id" : "57e946692cdc06edc3ba7220",
        "uid" : "3342316876",
        "channel" : "Twitter"
}
SEndpoint2:PRIMARY> db.investigation_users_5862119501201408_5543584509984768.find({_id : ObjectId("581104502cdccc5c64955cb9")}, {id : 1, uid : 1, channel : 1, _id : 1}).pretty()
SEndpoint2:PRIMARY>

Another collection and what I did to "fix" the duplication:

To find duplications:

db.investigation_users_5511248724623360_5648902543900672.aggregate({$group : {_id : {uid : "$uid", channel : "$channel"}, ids : {$addToSet : "$_id"},count : {$sum : 1}}}, {$match : {count : {$gt : 1}}})

Response:

{ "_id" : { "uid" : "982013415240664", "channel" : "FacebookComments" }, "ids" : [ ObjectId("581b46ed2cdc1dbae74f89e8"), ObjectId("5811047a2cdc6d22b31487d5") ], "count" : 2 }
{ "_id" : { "uid" : "883304841801092", "channel" : "FacebookComments" }, "ids" : [ ObjectId("581b46ed2cdc1dbae74f89e7"), ObjectId("5811047c2cdc6d22b31487e4") ], "count" : 2 }

And then I removed the unique index:

db.investigation_users_5511248724623360_5648902543900672.dropIndex({uid : 1, channel : 1})

I tried to remove the duplicates, and only one document was removed per query:

db.investigation_users_5511248724623360_5648902543900672.remove({ "uid" : "982013415240664", "channel" : "FacebookComments" })
WriteResult({ "nRemoved" : 1 })
SEndpoint2:PRIMARY> db.investigation_users_5511248724623360_5648902543900672.remove({ "uid" : "883304841801092", "channel" : "FacebookComments" })
WriteResult({ "nRemoved" : 1 })

And created the index again:

db.investigation_users_5511248724623360_5648902543900672.ensureIndex({uid : 1, channel : 1}, {unique : true})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 8,
        "numIndexesAfter" : 9,
        "ok" : 1
}

I don't know whether I can just use reIndex or repair to rebuild these indexes. I tried that once and MongoDB crashed, reporting duplicates. Any suggestions?

Comment by Lucas [ 14/Nov/16 ]

The biggest problem in my case isn't the duplicates themselves, but the fact that MongoDB crashes when these indexes are accessed. This should not happen, and it is very critical in my environment. I really don't care about the duplicates right now; I just want MongoDB not to crash, and then, later, to look for a solution to the duplicates.

Comment by Lucas [ 11/Nov/16 ]

Hello anonymous.user.

We are currently using MongoDB version 3.2.4. We can't upgrade because 3.2.5+ causes a lot of performance issues in our environment (SERVER-24775). Everything runs on Ubuntu 14.04.5 LTS.

What can I do to help you with this issue (or with both issues, haha)?

Comment by Kelsey Schubert [ 11/Nov/16 ]

Hi All,

I wanted to provide you with an update to our investigation. We have examined the corrupted index and identified that there are indeed two records for the first key, (null, null). We're continuing to examine the data files for additional clues.

In addition, we have created a custom build on top of MongoDB 3.2.10 that adds additional logging. This will log messages when we allow duplicates into a unique index and will help us get a better understanding of what is happening. We have completed testing on this build and are confident that it can be run safely in production. If you would like a copy of the binary to help provide us with additional diagnostic information, please let us know which platform you require and we will provide the appropriate binary.

Here is the complete patch:

src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp
diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp
index e5c4789..63f3efa 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp
@@ -590,6 +590,9 @@ public:
                 return _idx->dupKeyError(newKey);
             }
 
+	    log() << "WiredTigerIndex::UniqueBulkBuilder::addKey -- adding a dup key ( "
+	          << _key.toString() << ").";
+
             // If we get here, we are in the weird mode where dups are allowed on a unique
             // index, so add ourselves to the list of duplicate ids. This also replaces the
             // _key which is correct since any dups seen later are likely to be newer.
@@ -1054,6 +1057,10 @@ Status WiredTigerIndexUnique::_insert(WT_CURSOR* c,
     if (!dupsAllowed)
         return dupKeyError(key);
 
+    log() << "WiredTigerIndexUnique::_insert -- adding a dup key ( "
+          << data.toString() << ").";
+    printStackTrace();
+
     if (!insertedId) {
         // This id is higher than all currently in the index for this key
         value.appendRecordId(id);

If you have a reproduction or a testing environment that has hit this issue, we could then provide a different build that logs additional information, but is not safe for production.

Thank you for your help,
Thomas

Comment by Michael Liu [ 03/Nov/16 ]

Hi Thomas Schubert,

We have a duplicated-index problem analogous to the one Lucas encountered. There is a single big collection with over 100 million documents. When we send a query filtering on the unique index key, the mongod process crashes and reports a message like "Unique index cursor seeing multiple records for key { : 62869212, : 301, : "d6c651ddcd97183b2e40bc464231c962" }". The full output is:

mongodb/logs/mongod.log-2016-11-02T00:06:35.335+0800 F STORAGE  [conn2506] Unique index cursor seeing multiple records for key { : 62869212, : 301, : "d6c651ddcd97183b2e40bc464231c962" }
mongodb/logs/mongod.log-2016-11-02T00:06:35.336+0800 I -        [conn2506] Fatal Assertion 28608 
mongodb/logs/mongod.log-2016-11-02T00:06:35.336+0800 I -        [conn2506] 
mongodb/logs/mongod.log-
mongodb/logs/mongod.log:***aborting after fassert() failure
mongodb/logs/mongod.log-
mongodb/logs/mongod.log-
mongodb/logs/mongod.log-2016-11-02T00:06:35.343+0800 F -        [conn2506] Got signal: 6 (Aborted).
mongodb/logs/mongod.log-
mongodb/logs/mongod.log- 0x132b6f2 0x132a629 0x132ae32 0x7f505a8587e0 0x7f505a4e7625 0x7f505a4e8e05 0x12bdfb2 0x1092951 0x1096ee9 0x10971c9 0xc0280c 0xbf6a91 0xc21876 0xc1d719 0xe3a19d 0xe3a7eb 0xb48a95 0xbcda73 0xbce8e4 0xb2678f 0xcdbb05 0xcde9a6 0x9b1e3c 0x12e55cf 0x7f505a850aa1 0x7f505a59d93d
mongodb/logs/mongod.log------ BEGIN BACKTRACE ----- 
mongodb/logs/mongod.log:{"backtrace":[{"b":"400000","o":"F2B6F2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F2A629"},{"b":"400000","o":"F2AE32"},{"b":"7F505A849000","o":"F7E0"},{"b":"7F505A4B5000","o":"32625","s":"gsignal"},{"b":"7F505A4B5000","o":"33E05","s":"abort"},{"b":"400000","o":"EBDFB2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"C92951"},{"b":"400000","o":"C96EE9"},{"b":"400000","o":"C971C9"},{"b":"400000","o":"80280C","s":"_ZN5mongo9IndexScan4workEPm"},{"b":"400000","o":"7F6A91","s":"_ZN5mongo10FetchStage4workEPm"},{"b":"400000","o":"821876","s":"_ZN5mongo21SortKeyGeneratorStage4workEPm"},{"b":"400000","o":"81D719","s":"_ZN5mongo9SortStage4workEPm"},{"b":"400000","o":"A3A19D","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"400000","o":"A3A7EB","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"400000","o":"748A95","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE"},{"b":"400000","o":"7CDA73","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"400000","o":"7CE8E4","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"400000","o":"72678F","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"400000","o":"8DBB05"},{"b":"400000","o":"8DE9A6","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"5B1E3C","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"EE55CF","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F505A849000","o":"7AA1"},{"b":"7F505A4B5000","o":"E893D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.9", "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.32-358.23.2.ali1195.el6.x86_64", "version" : "#1 SMP Wed Oct 29 20:23:37 CST 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFA20FF000", "elfType" : 3, "buildId" : "64BEF22A9E37EEE1F8FC59AB5AE35209E49D47EA" }, { "b" : "7F505AEEE000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "BCA23A35FA002CA6FC960E2794BCA3AE9F5BD371" }, { "b" : "7F505ACEA000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "F9230CF0BAC57ADEF2764E4F59E8556753907D31" }, { "b" : "7F505AA66000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "C919E2ECEAE285422EC98BF3EB8C089E8DE9F1AA" }, { "b" : "7F505A849000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "0E51B5AD014B2F88B369C39264F02DE3AFC38A63" }, { "b" : "7F505A4B5000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "6F269DCAFA81F126A3671240850D8EE9A5AF543A" }, { "b" : "7F505B0F6000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2946077BEE84E65DD56C7732F2D89F536DBE0966" } ] }}
mongodb/logs/mongod.log- mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132b6f2]
mongodb/logs/mongod.log- mongod(+0xF2A629) [0x132a629]
mongodb/logs/mongod.log- mongod(+0xF2AE32) [0x132ae32]
mongodb/logs/mongod.log- libpthread.so.0(+0xF7E0) [0x7f505a8587e0]
mongodb/logs/mongod.log- libc.so.6(gsignal+0x35) [0x7f505a4e7625]
mongodb/logs/mongod.log- libc.so.6(abort+0x175) [0x7f505a4e8e05]
mongodb/logs/mongod.log: mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12bdfb2]
mongodb/logs/mongod.log- mongod(+0xC92951) [0x1092951]
mongodb/logs/mongod.log- mongod(+0xC96EE9) [0x1096ee9]
mongodb/logs/mongod.log- mongod(+0xC971C9) [0x10971c9]
mongodb/logs/mongod.log- mongod(_ZN5mongo9IndexScan4workEPm+0xBC) [0xc0280c]
mongodb/logs/mongod.log- mongod(_ZN5mongo10FetchStage4workEPm+0x161) [0xbf6a91]
mongodb/logs/mongod.log- mongod(_ZN5mongo21SortKeyGeneratorStage4workEPm+0x46) [0xc21876]
mongodb/logs/mongod.log- mongod(_ZN5mongo9SortStage4workEPm+0x239) [0xc1d719]
mongodb/logs/mongod.log- mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x24D) [0xe3a19d]
mongodb/logs/mongod.log- mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x3B) [0xe3a7eb]
mongodb/logs/mongod.log- mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE+0xF95) [0xb48a95]
mongodb/logs/mongod.log- mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x463) [0xbcda73]
mongodb/logs/mongod.log- mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x404) [0xbce8e4]
mongodb/logs/mongod.log- mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x20F) [0xb2678f]
mongodb/logs/mongod.log- mongod(+0x8DBB05) [0xcdbb05]
mongodb/logs/mongod.log- mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x6C6) [0xcde9a6]
mongodb/logs/mongod.log- mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x9b1e3c]
mongodb/logs/mongod.log- mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x2FF) [0x12e55cf]
mongodb/logs/mongod.log- libpthread.so.0(+0x7AA1) [0x7f505a850aa1]
mongodb/logs/mongod.log- libc.so.6(clone+0x6D) [0x7f505a59d93d]
mongodb/logs/mongod.log------  END BACKTRACE  ----- 

We are running MongoDB version 3.2.9, and it definitely triggers the fassert() and crashes in the mongo::IndexScan function.

Our unique index, as shown by db.xxx.getIndexes(), is as follows:

PRIMARY> db.sync_item.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "sns.sync_item"
        },
        {
                "v" : 1,
                "unique" : true,
                "key" : {
                        "uid" : 1,
                        "type" : 1,
                        "data_id" : 1
                },
                "name" : "uq_uid_type_data_id",
                "ns" : "sns.sync_item"
        }
]

I didn't reIndex uq_uid_type_data_id (the index build takes too much time), and I haven't done a full initial sync of the replica set since the crash or after the reboot. After we simply removed the specific doc ("_id" : ObjectId("5813dec19af5d35934090ff2")), the node has worked well and the problem has never reappeared, whether we query by the doc's _id or by uid/type/data_id. The related collection's *.wt files are too big to upload; please tell me if there are configuration or metadata files I can supply instead. Any reply is greatly appreciated.

Thanks for your help

Comment by Kelsey Schubert [ 29/Oct/16 ]

Hi lucasoares,

Thank you for the additional uploads. We are investigating and will update you with our findings.

Kind regards,
Thomas

Comment by Lucas [ 29/Oct/16 ]

anonymous.user Sorry for that. Can you check again?

For the stats output: "The entered text is too long. It exceeds the allowed limit of 32767 characters".

I will upload a file with the content. There are a lot of indexes, haha.

Comment by Kelsey Schubert [ 29/Oct/16 ]

Hi lucasoares,

Thank you for taking the time to upload the files. The md5 checksum matches, 2822635011c9209999e7f066c7efe212, but I do not see collection-242--6781641882459149205.wt. All other listed files are present.

Would you please upload the collection file again?

Also, it would speed our investigation if you could provide the output of

db.investigation_cards_5087532240863232_5661405000761344.stats({indexDetails:true})

Thank you again for your help,
Thomas

Comment by Lucas [ 28/Oct/16 ]

anonymous.user I uploaded the files for this collection: investigation_cards_5087532240863232_5661405000761344. I found a small one with duplications.

Checksum: 2822635011c9209999e7f066c7efe212 mongo.tgz

Can you confirm it for me?

Files:

WiredTiger
WiredTiger.basecfg
WiredTigerLAS.wt
WiredTiger.lock
WiredTiger.turtle
WiredTiger.wt
_mdb_catalog.wt
sizeStorer.wt
collection-242--6781641882459149205.wt
index-243--6781641882459149205.wt
index-244--6781641882459149205.wt
index-728--6781641882459149205.wt
index-730--6781641882459149205.wt
index-731--6781641882459149205.wt
index-732--6781641882459149205.wt
index-733--6781641882459149205.wt
index-734--6781641882459149205.wt
index-735--6781641882459149205.wt
index-736--6781641882459149205.wt
index-737--6781641882459149205.wt
index-738--6781641882459149205.wt
index-740--6781641882459149205.wt
index-741--6781641882459149205.wt
index-742--6781641882459149205.wt
index-743--6781641882459149205.wt
index-744--6781641882459149205.wt
index-745--6781641882459149205.wt
index-746--6781641882459149205.wt
index-747--6781641882459149205.wt
index-748--6781641882459149205.wt
index-749--6781641882459149205.wt
index-750--6781641882459149205.wt
index-751--6781641882459149205.wt
index-752--6781641882459149205.wt
index-753--6781641882459149205.wt
index-754--6781641882459149205.wt
index-756--6781641882459149205.wt
index-757--6781641882459149205.wt
index-758--6781641882459149205.wt
index-762--6781641882459149205.wt
index-763--6781641882459149205.wt
index-764--6781641882459149205.wt
index-765--6781641882459149205.wt
index-766--6781641882459149205.wt
index-53--1891373956359868613.wt
index-3238--2230527203401138660.wt
index-18891-7177537344359697291.wt
index-18892-7177537344359697291.wt
index-18893-7177537344359697291.wt
index-19558-7177537344359697291.wt
index-19559-7177537344359697291.wt
index-19560-7177537344359697291.wt
index-23224-7177537344359697291.wt
index-23225-7177537344359697291.wt
index-23806-7177537344359697291.wt
index-23807-7177537344359697291.wt
index-26169-7177537344359697291.wt

Unique Index:

{
	"v" : 1,
	"unique" : true,
	"key" : {
		"pid" : 1,
		"channel" : 1
	},
	"name" : "pid_1_channel_1",
	"ns" : "shipyard.investigation_cards_5087532240863232_5661405000761344"
}

Duplications:
Both of these duplicates have "null" values in the unique fields; there are also cases where the field was not null but a string.

{ "_id" : ObjectId("56f66fb83d1fb0d83d7073c6"), "virality_score" : 5, "assertiviness_score" : 1.3691292843616494, "afinity_score" : 10, "followers_score" : 4487, "reach_score" : 66.9850729640567, "user_score" : 0.0153954024195935, "st_updated_at" : ISODate("2016-10-12T18:04:15.568Z"), "st_normalized_updated_at" : NumberLong("201610121804") }

{ "_id" : ObjectId("56f66fb83d1fb0d83d7073c5"), "virality_score" : 6, "assertiviness_score" : 1.5535703207922877, "afinity_score" : 15, "followers_score" : 7279, "reach_score" : 85.31705573916625, "user_score" : 0.12190642698421601, "st_updated_at" : ISODate("2016-10-12T18:04:15.568Z"), "st_normalized_updated_at" : NumberLong("201610121804") }

These documents were updated at the same time (probably by an unordered bulk update), with upsert true and with a bug (missing pid and channel fields).

Unfortunately my MongoDB deployment doesn't have document validators (new in version 3.2). And this bug isn't happening only with "null" fields.
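
For reference, a hedged sketch of the kind of 3.2 document validator that could have rejected writes missing the unique fields (the rule shape is an assumption; the collection name is taken from this comment):

db.runCommand({
    collMod: "investigation_cards_5087532240863232_5661405000761344",
    validator: { pid: { $exists: true }, channel: { $exists: true } },
    validationLevel: "strict",
    validationAction: "error"
})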

Comment by Lucas [ 28/Oct/16 ]

RangerX I don't understand what you said. Both the secondary AND the primary have this bug; the secondaries are crashing too. Just run slaveOk() and try to access a unique index that has this problem... Changing from "slave" to "master" or vice versa doesn't change anything. Update operations on secondaries (applied via replication) that use this index crash the process too.

Comment by Ivan Makeev [X] [ 28/Oct/16 ]

We will (probably) work around the problem by changing 'slave' to 'master' and vice versa. But we are still watching.

Comment by Lucas [ 28/Oct/16 ]

anonymous.user I'm going to stop a secondary node, look for an affected collection, and then upload the files for you, ok?
As soon as it finishes I'll let you know.

Comment by Kelsey Schubert [ 28/Oct/16 ]

Hi lucasoares,

Thank you for following up. I've created a portal for you to use to upload files. We would just need your meta data files (All WiredTiger* files, sizeStorer.wt, and _mdb_catalog.wt) as well as the collection file and its associated index files.

Please note that there is a 5GB limit on uploads to this portal; however, there's an easy workaround, which is to use the split command as follows:

split -d -b 5300000000 filename.tgz part.

This will produce a series of part.XX files where XX is a number; you can then upload these files via the S3 portal and we'll stitch them back together.

Once these files have been uploaded, please let us know and we will immediately begin investigating them. I understand that this bug is significantly impacting your production environment, and we appreciate the help you are providing to diagnose this issue.

Thanks again,
Thomas

Comment by Lucas [ 27/Oct/16 ]

anonymous.user Unfortunately my databases are too big to upload, and all of them are in production environments. I have one with the same problem that is about 600GB. Would it be possible to send it to you? Or at least the collection and index files of an affected collection?

This is happening in every MongoDB deployment I use. Could it be the 2.14.1 driver? We are unable to upgrade our driver right now. And we use a lot of unordered bulk insertions; could that be a factor?

Please, I really need to fix this ASAP (or at least find out what is causing it), because every time I try to use the unique index, my MongoDB crashes. I'm "fixing" it with an aggregation and dropping all the duplicates, but it is getting impossible to keep doing that...

Comment by Kelsey Schubert [ 03/Oct/16 ]

Hi lucasoares,

I'm sorry that you encountered this issue again. We have been pursuing reproductions along lines that you describe, but we haven't succeeded in hitting this condition yet. We are still actively investigating this bug and modifying our reproduction scripts in an effort to hit it.

If we had data files that showed this inconsistency, it might provide additional clues to help us determine what is happening here.

Kind regards,
Thomas

Comment by Lucas [ 30/Sep/16 ]

anonymous.user any update?

Unfortunately this happened in another cluster.

I think it may be many connections (a lot of machines and a lot of operations) updating documents in unordered bulk with upsert true.

Comment by Ryan Sherlock [ 22/Sep/16 ]

I uploaded it again (there was an issue with this morning's upload) - the data that was passed up may have already been fixed, so it may not show the issue - I'm searching for another example. Also, it may make sense to take upload issues off-thread - I'm available at the email address that I emailed you from earlier.

Comment by Kelsey Schubert [ 22/Sep/16 ]

Hi ryan.sherlock@intercom.io,

Thank you for providing a second upload. I unzipped the files and attempted to restart mongod on top of the new data files. Unfortunately, I still encounter the same error on startup, and I see that the MD5 checksum of the sizeStorer.wt file in the new upload matches the file in the previous upload.

Would you please confirm that you are able to start a mongod pointing at the directory that you've uploaded? Please be aware that providing mismatched files that originate from different dbpaths is likely to result in this type of error.

Kind regards,
Thomas

Comment by Ryan Sherlock [ 22/Sep/16 ]

I have zipped it all up again and uploaded.

Comment by Kelsey Schubert [ 21/Sep/16 ]

Hi ryan.sherlock@intercom.io,

Thank you for uploading the files. Unfortunately, when I attempted to start a mongod instance pointing at the data files you provided, I encountered unrelated corruption in sizeStorer.wt. On startup, I am seeing the following error:

2016-09-21T12:02:33.045-0400 E STORAGE  [initandlisten] WiredTiger (0) [1474473753:45620][3025:0x7fff78dca300], file:sizeStorer.wt, WT_SESSION.open_cursor: read checksum error for 4096B block at offset 155648: block header checksum of 732902963 doesn't match expected checksum of 787063967

If you have not subsequently modified this file, would you please confirm that the MD5 checksum matches what I downloaded from the S3 portal?

MD5 (sizeStorer.wt) = 66286a7f4feee89f14c1c97c8682e9ef

If the checksum does match, would you please set aside a copy of your $dbpath and attempt the upload again?

Thanks again,
Thomas

Comment by Ryan Sherlock [ 21/Sep/16 ]

I have uploaded the files. Please let me know if you need anything else.

Comment by Kelsey Schubert [ 21/Sep/16 ]

Hi ryan.sherlock@intercom.io,

Thank you for answering my questions. If you are aware of which collection is affected, then we would just need your meta data files (All WiredTiger* files, sizeStorer.wt, and _mdb_catalog.wt) as well as the collection file and its associated index files.

Depending on the size of your $dbpath, it may be simpler to provide it in its entirety and this would allow us to check whether other collections in your database have been affected.

I have provided a secure upload portal for you to use. Files uploaded to this portal are only visible to MongoDB employees and are routinely deleted after some time.

Please note that there is a 5GB limit on uploads to this portal; however, there's an easy workaround, which is to use the split command as follows:

split -d -b 5300000000 filename.tgz part.

This will produce a series of part.XX files where XX is a number; you can then upload these files via the S3 portal and we'll stitch them back together.

Thank you again,
Thomas

Comment by Ryan Sherlock [ 20/Sep/16 ]
  • When did you upgrade from 2.6 to 3.0.12?

As we have many replica sets, we have not finished the upgrade to 3.0.12 across all of them. The first machines hit 3.0.12 seven days ago.

  • When did you upgrade from 3.0.12 to 3.2.9?

First upgrade of a busy replica set was Thursday (5 days ago)

  • Was 3.0.12 running with WiredTiger?

Yes - see upgrade path below.

  • Did the node run any other versions of MongoDB between 2.6 and 3.2.9?

Only 3.0.12

  • What was your upgrade procedure? Did you use initial sync to upgrade?

The 2.6 setup was three members one of which is an arbiter.

Add a new 3.0.12 wiredTiger member and let it sync
Verify finished and stable
Update old secondary to 3.0.12
Once ^ finished update old secondary to wiredTiger
Verify finished and stable
Perform failover on primary
Update old primary to 3.0.12
Once ^ finished update old primary to wiredTiger
Verify finished and stable
Update secondary to 3.2.9
Update secondary to 3.2.9
Failover so primary goes to a 3.2.9 secondary
Update old primary to 3.2.9

  • Would you please upload the same log message that Ivan Makeev provided above? In particular, I would like to see the ObjectId of the key that fails so we can create a timeline for when this record was inserted.

2016-09-19T13:15:47.508+0000 F STORAGE  [conn263090] Unique index cursor seeing multiple records for key { : 21322, : "25063" }
2016-09-19T13:15:47.508+0000 I -        [conn263090] Fatal Assertion 28608
2016-09-19T13:15:47.509+0000 I -        [conn263090]
 
***aborting after fassert() failure
 
 
2016-09-19T13:15:47.515+0000 I NETWORK  [initandlisten] connection accepted from 10.0.47.134:39754 #329418 (1772 connections now open)
2016-09-19T13:15:47.522+0000 F -        [conn263090] Got signal: 6 (Aborted).
 
 0x130c562 0x130b489 0x130bc92 0x7fc9b05c0130 0x7fc9b02245d7 0x7fc9b0225cc8 0x128e9c2 0x1053251 0x10577e9 0x1057ad9 0xb95c0f 0xb94870 0xdf0015 0xdf06d9 0xdf07d5 0xad961b 0xb6ffb3 0xb70e44 0xac6d00 0xc875f5 0xc89e86 0x94fd1c 0x12b6b45 0x7fc9b05b8df5 0x7fc9b02e5bfd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F0C562","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F0B489"},{"b":"400000","o":"F0BC92"},{"b":"7FC9B05B1000","o":"F130"},{"b":"7FC9B01EF000","o":"355D7","s":"gsignal"},{"b":"7FC9B01EF000","o":"36CC8","s":"abort"},{"b":"400000","o":"E8E9C2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"C53251"},{"b":"400000","o":"C577E9"},{"b":"400000","o":"C57AD9"},{"b":"400000","o":"795C0F","s":"_ZN5mongo9CountScan4workEPm"},{"b":"400000","o":"794870","s":"_ZN5mongo10CountStage4workEPm"},{"b":"400000","o":"9F0015","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"400000","o":"9F06D9","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"400000","o":"9F07D5","s":"_ZN5mongo12PlanExecutor11executePlanEv"},{"b":"400000","o":"6D961B"},{"b":"400000","o":"76FFB3","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"400000","o":"770E44","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"400000","o":"6C6D00","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"400000","o":"8875F5"},{"b":"400000","o":"889E86","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"54FD1C","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"EB6B45","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7FC9B05B1000","o":"7DF5"},{"b":"7FC9B01EF000","o":"F6BFD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.9", "gitVersion" : "22ec9e93b40c85fc7cae7d56e7d6a02fd811088c", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.14.35-28.38.amzn1.x86_64", "version" : "#1 SMP Wed Mar 11 22:50:37 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "D017C64445179EB0C6918399F82A189ABDE70903" }, { "b" : "7FFF1D87A000", "elfType" : 3, "buildId" : "8DD7A7F645E37D599573A937DBAA3E931F55DE40" }, { "b" : "7FC9B14D7000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "C9DB6FDB87F3CAAF209AEACD2EC345B152605135" }, { "b" : "7FC9B10F1000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "09CACF79947C91081F28FCCAB5AF524D51751AF0" }, { "b" : "7FC9B0EE9000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "8A6EB7521ABE0518C2803504B52F18412D69C713" }, { "b" : "7FC9B0CE5000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "7030B651A4F5D6F32A8D83F3541990E342C50E6C" }, { "b" : "7FC9B09E3000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "EB09196C8A859427FD964381CDAC80EC0C4C4296" }, { "b" : "7FC9B07CD000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "00FA2883FB47B1327397BBF167C52F51A723D013" }, { "b" : "7FC9B05B1000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "21054340E69FD5F7DDC9DECEF7951CF3F3D2E78E" }, { "b" : "7FC9B01EF000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "0EB7F7FA93C9C3E017F7257AE97415730345767B" }, { "b" : "7FC9B1744000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "F451C8F279B6860B00F835764FFBA0D54879E1C6" }, { "b" : "7FC9AFFA3000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "FF843C37C38E5BFFD57F7BCCAE05FDADC6390C8F" }, { "b" : "7FC9AFCC0000", "path" : "/lib64/libkrb5.so.3", "elfType" 
: 3, "buildId" : "0BB150CC29DB5B0E039879EFC00152A75E3B00B9" }, { "b" : "7FC9AFABD000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5C01209C5AE1B1714F19B07EB58F2A1274B69DC8" }, { "b" : "7FC9AF88B000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "1485992B0E5CDBA0A34817FC8C6A4C45E82CD1A9" }, { "b" : "7FC9AF675000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7FC9AF466000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "A75A81EC50E9E0164A12B59D9987AD61AC7576C8" }, { "b" : "7FC9AF263000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7FC9AF049000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F586C9A0950DFD60125B2F56B0EC6D690128B1FC" }, { "b" : "7FC9AEE28000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x130c562]
 mongod(+0xF0B489) [0x130b489]
 mongod(+0xF0BC92) [0x130bc92]
 libpthread.so.0(+0xF130) [0x7fc9b05c0130]
 libc.so.6(gsignal+0x37) [0x7fc9b02245d7]
 libc.so.6(abort+0x148) [0x7fc9b0225cc8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x128e9c2]
 mongod(+0xC53251) [0x1053251]
 mongod(+0xC577E9) [0x10577e9]
 mongod(+0xC57AD9) [0x1057ad9]
 mongod(_ZN5mongo9CountScan4workEPm+0x17F) [0xb95c0f]
 mongod(_ZN5mongo10CountStage4workEPm+0xD0) [0xb94870]
 mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x275) [0xdf0015]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x39) [0xdf06d9]
 mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x55) [0xdf07d5]
 mongod(+0x6D961B) [0xad961b]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x473) [0xb6ffb3]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x404) [0xb70e44]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x1F0) [0xac6d00]
 mongod(+0x8875F5) [0xc875f5]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x696) [0xc89e86]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x94fd1c]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x325) [0x12b6b45]
 libpthread.so.0(+0x7DF5) [0x7fc9b05b8df5]
 libc.so.6(clone+0x6D) [0x7fc9b02e5bfd]
-----  END BACKTRACE  -----

  • Would you be able to provide your data files to help us investigate this issue?

What data files specifically do you need?

Comment by Kelsey Schubert [ 19/Sep/16 ]

Hi ryan.sherlock@intercom.io,

Thanks for the report; we are still working on reproducing this issue locally. Would you please clarify a few details about your upgrade path?

  1. When did you upgrade from 2.6 to 3.0.12?
  2. When did you upgrade from 3.0.12 to 3.2.9?
  3. Was 3.0.12 running with WiredTiger?
  4. Did the node run any other versions of MongoDB between 2.6 and 3.2.9?
  5. What was your upgrade procedure? Did you use initial sync to upgrade?
  6. Would you please upload the same log message that RangerX provided above? In particular, I would like to see the ObjectId of the key that fails so we can create a timeline for when this record was inserted.

For the time being, my recommendation would be to stay on 3.0.12 until the data is cleaned. This data corruption is caught by a check introduced in SERVER-17062; consequently, the fatal assertions only occur on MongoDB 3.2.

  • Would you be able to provide your data files to help us investigate this issue?

If we had access to data files affected by this issue, it would help us debug the root cause, and we could provide a script to resolve this issue.

Thanks again,
Thomas

Comment by Ryan Sherlock [ 19/Sep/16 ]

We are also hitting the same initial issue since we moved to 3.2.9 with wiredTiger from 2.6 (via 3.0.12). It has caused us considerable disruption (and a rollback to 3.0.12) and we will need a solution soon. Is there a data-cleaning tool or similar that you can make available to run over our data?

We can include more background tomorrow AM if needed.

Comment by Ivan Makeev [X] [ 12/Aug/16 ]

We have the same issue after upgrading to 3.2.8 (from 3.0.8). Our configuration: 2 mongod nodes in a replica set + an arbiter. The bug happens very frequently (3-4 times an hour) while bulk updates are being executed.

2016-08-12T10:31:30.739+0300 I NETWORK  [conn577] end connection 127.0.0.1:44739 (422 connections now open)
2016-08-12T10:31:30.990+0300 I COMMAND  [conn453] command analytics.$cmd command: update { update: "partnerCatalogNBO", ordered: false, updates: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: { acquireCount: { r: 2000, w: 2000 } }, Database: { acquireCount: { w: 2000 } }, Collection: { acquireCount: { w: 1000 } }, Metadata: { acquireCount: { w: 1000 } }, oplog: { acquireCount: { w: 1000 } } } protocol:op_query 106ms
2016-08-12T10:31:31.054+0300 I COMMAND  [conn455] command analytics.$cmd command: update { update: "partnerCatalogNBO", ordered: false, updates: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: { acquireCount: { r: 2000, w: 2000 } }, Database: { acquireCount: { w: 2000 } }, Collection: { acquireCount: { w: 1000 } }, Metadata: { acquireCount: { w: 1000 } }, oplog: { acquireCount: { w: 1000 } } } protocol:op_query 114ms
2016-08-12T10:31:31.176+0300 I COMMAND  [conn455] command analytics.$cmd command: update { update: "partnerCatalogNBO", ordered: false, updates: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: { acquireCount: { r: 2000, w: 2000 } }, Database: { acquireCount: { w: 2000 } }, Collection: { acquireCount: { w: 1000 } }, Metadata: { acquireCount: { w: 1000 } }, oplog: { acquireCount: { w: 1000 } } } protocol:op_query 107ms
2016-08-12T10:31:31.329+0300 I COMMAND  [conn454] command analytics.$cmd command: update { update: "partnerCatalogNBO", ordered: false, updates: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: { acquireCount: { r: 2000, w: 2000 } }, Database: { acquireCount: { w: 2000 } }, Collection: { acquireCount: { w: 1000 } }, Metadata: { acquireCount: { w: 1000 } }, oplog: { acquireCount: { w: 1000 } } } protocol:op_query 106ms
2016-08-12T10:31:31.449+0300 I COMMAND  [conn454] command analytics.$cmd command: update { update: "partnerCatalogNBO", ordered: false, updates: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: { acquireCount: { r: 2000, w: 2000 } }, Database: { acquireCount: { w: 2000 } }, Collection: { acquireCount: { w: 1000 } }, Metadata: { acquireCount: { w: 1000 } }, oplog: { acquireCount: { w: 1000 } } } protocol:op_query 105ms
2016-08-12T10:31:31.727+0300 I COMMAND  [conn455] command analytics.$cmd command: update { update: "partnerCatalogNBO", ordered: false, updates: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: { acquireCount: { r: 2000, w: 2000 } }, Database: { acquireCount: { w: 2000 } }, Collection: { acquireCount: { w: 1000 } }, Metadata: { acquireCount: { w: 1000 } }, oplog: { acquireCount: { w: 1000 } } } protocol:op_query 106ms
2016-08-12T10:31:31.816+0300 F STORAGE  [conn456] Unique index cursor seeing multiple records for key { : ObjectId('5252657f0d422d2814bba302'), : 20034, : 20128 }
2016-08-12T10:31:31.816+0300 I -        [conn456] Fatal Assertion 28608
2016-08-12T10:31:31.816+0300 I -        [conn456] 
 
***aborting after fassert() failure
 
 
2016-08-12T10:31:31.822+0300 I NETWORK  [conn568] end connection 192.168.105.62:39608 (421 connections now open)
2016-08-12T10:31:31.825+0300 I NETWORK  [initandlisten] connection accepted from 192.168.105.62:39770 #578 (422 connections now open)
2016-08-12T10:31:31.827+0300 F -        [conn456] Got signal: 6 (Aborted).
 
 0x1307282 0x13061a9 0x13069b2 0x7f304c1910a0 0x7f304be27125 0x7f304be2a3a0 0x128d272 0x1051ce1 0x1056279 0x1056891 0xba52c4 0xba59b3 0xb99651 0xb8fe42 0xdee402 0xdef78d 0xdefe4f 0xdb959c 0xb4fa10 0xb51ba5 0xb51f6d 0xb551d8 0xb6ef33 0xb6fdc4 0xac63a0 0xc866c5 0xc88f56 0x94f0bc 0x12b1865 0x7f304c188b50 0x7f304bed330d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F07282","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F061A9"},{"b":"400000","o":"F069B2"},{"b":"7F304C182000","o":"F0A0"},{"b":"7F304BDF5000","o":"32125","s":"gsignal"},{"b":"7F304BDF5000","o":"353A0","s":"abort"},{"b":"400000","o":"E8D272","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"C51CE1"},{"b":"400000","o":"C56279"},{"b":"400000","o":"C56891"},{"b":"400000","o":"7A52C4","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"400000","o":"7A59B3","s":"_ZN5mongo9IndexScan4workEPm"},{"b":"400000","o":"799651","s":"_ZN5mongo10FetchStage4workEPm"},{"b":"400000","o":"78FE42","s":"_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE"},{"b":"400000","o":"9EE402","s":"_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE"},{"b":"400000","o":"9EF78D","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKSsNS0_11YieldPolicyE"},{"b":"400000","o":"9EFE4F","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE"},{"b":"400000","o":"9B959C","s":"_ZN5mongo17getExecutorUpdateEPNS_16OperationContextEPNS_10CollectionEPNS_12ParsedUpdateEPNS_7OpDebugE"},{"b":"400000","o":"74FA10","s":"_ZN5mongo18WriteBatchExecutor10execUpdateERKNS_12BatchItemRefEPNS_7BSONObjEPPNS_16WriteErrorDetailE"},{"b":"400000","o":"751BA5","s":"_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE"},{"b":"400000","o":"751F6D","s":"_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE"},{"b":"400000","o":"7551D8","s":"_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE"},{"b":"400000","o":"76EF33","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"400000","o":"76FDC4","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"400000","o":"6C63A0","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"400000","o":"8866C5"},{"b":"400000","o":"888F56","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"54F0BC","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"EB1865","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F304C182000","o":"6B50"},{"b":"7F304BDF5000","o":"DE30D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.8", "gitVersion" : "ed70e33130c977bda0024c125b56d159573dbaf0", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.2.0-4-amd64", "version" : "#1 SMP Debian 3.2.65-1+deb7u2", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "D6EAB75C3D236FDE2266260525362A73AC8CE1E5" }, { "b" : "7FFCB8E8F000", "elfType" : 3, "buildId" : "47AB69602E2C4DF603648158A887BFA90132E1D9" }, { "b" : "7F304D03A000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "C4D2661226A9BA39A674C938467943F112AB03D5" }, { "b" : "7F304CC42000", "path" 
: "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "FC83F87458157CEACE0CB274298CA90D02B767EA" }, { "b" : "7F304CA3A000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "77CB0070BB75470C626B899800FFCF8C8184E32A" }, { "b" : "7F304C836000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "32CCD52B7228B03E969C21587F14C2322D4A9544" }, { "b" : "7F304C5B4000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "98FAFAB6E776EA40FBFFFBCB22859A8D54028EC0" }, { "b" : "7F304C39E000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7F304C182000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "F260CD217A3C0D89FD80BCC80D486C1C2AD32DCA" }, { "b" : "7F304BDF5000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "BD7458ADC60E906DBAABB72A7DA5015D5BC8269F" }, { "b" : "7F304D29A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "B66F918D37C1E4122FC91C74210F2112A6A5148E" }, { "b" : "7F304BBDE000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "1EFEB71FD4999C2307570D673A724EA4E1D85267" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1307282]
 mongod(+0xF061A9) [0x13061a9]
 mongod(+0xF069B2) [0x13069b2]
 libpthread.so.0(+0xF0A0) [0x7f304c1910a0]
 libc.so.6(gsignal+0x35) [0x7f304be27125]
 libc.so.6(abort+0x180) [0x7f304be2a3a0]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x128d272]
 mongod(+0xC51CE1) [0x1051ce1]
 mongod(+0xC56279) [0x1056279]
 mongod(+0xC56891) [0x1056891]
 mongod(_ZN5mongo9IndexScan13initIndexScanEv+0x294) [0xba52c4]
 mongod(_ZN5mongo9IndexScan4workEPm+0x1F3) [0xba59b3]
 mongod(_ZN5mongo10FetchStage4workEPm+0x161) [0xb99651]
 mongod(_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x192) [0xb8fe42]
 mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE+0xB2) [0xdee402]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKSsNS0_11YieldPolicyE+0x14D) [0xdef78d]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE+0xBF) [0xdefe4f]
 mongod(_ZN5mongo17getExecutorUpdateEPNS_16OperationContextEPNS_10CollectionEPNS_12ParsedUpdateEPNS_7OpDebugE+0xA2C) [0xdb959c]
 mongod(_ZN5mongo18WriteBatchExecutor10execUpdateERKNS_12BatchItemRefEPNS_7BSONObjEPPNS_16WriteErrorDetailE+0x6A0) [0xb4fa10]
 mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x2B5) [0xb51ba5]
 mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x1DD) [0xb51f6d]
 mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE+0x248) [0xb551d8]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x473) [0xb6ef33]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x404) [0xb6fdc4]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x1F0) [0xac63a0]
 mongod(+0x8866C5) [0xc866c5]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x696) [0xc88f56]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x94f0bc]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x325) [0x12b1865]
 libpthread.so.0(+0x6B50) [0x7f304c188b50]
 libc.so.6(clone+0x6D) [0x7f304bed330d]
-----  END BACKTRACE  -----

Comment by Kelsey Schubert [ 07/Jul/16 ]

Hi lucasoares,

Thank you very much for taking the time to compile and upload this information.

We're examining the files you have uploaded now and will update this ticket when we know more about this issue.

Kind regards,
Thomas

Comment by Lucas [ 07/Jul/16 ]

Hello anonymous.user,

After several months without any problems related to unique indexes, I came across the same problem again.

This time I was able to get all kinds of information from the affected collection and also collected all the information that MongoDB offers (logs and diagnostics).

I packed it all and uploaded a file called "MongoDebug.tar.bz2" to the same portal as before. The content is divided into:

primaryFiles - Files from the primary node:

collections_indexes - Collection and index files from the affected collection.

diagnostics - Diagnostic files. This folder has diagnostics from before (preCommandExecutions) and after (posCommandExecutions) the execution of several commands (all listed in the commandExecutions folder).

logs - File with several days of mongo log.

secondaryFiles - Files from the secondary node:

diagnostics - Diagnostic files.

logs - File with several days of mongo log.

commandExecutions - Several commands executed on the affected collection of the primary node running standalone, and their respective outputs (as well as the corresponding server log):

getIndexes.txt - execution of getIndexes()

validateCommand.txt - execution of validate(true).

explainWithoutHint.txt - explain execution of the query that returns duplicated documents.

explainWithHint.txt - explain execution of the same query that returns duplicated documents, with a hint on a non-unique index.

documents.txt - documents with duplicated uid and channel (my unique index).

object_idFind.txt - Finding all duplicated documents with their _ids.

statsAndIndexDetails.txt - execution of stats({indexDetails:true})

countChannelNull.txt - count using only part of the unique index (this one uses a non-unique index that starts with channel).

countWithHint.txt - count on the first field of the unique index, with a hint on a non-unique index.

I hope this information can help solve this problem ASAP. I can't update my MongoDB to the newest version because of a performance problem (which I already reported in another issue), so MongoDB 3.2.4 still has this problem of allowing duplicate documents to be saved/updated in unique indexes (and probably all later versions as well).

Other info:

1. All my update operations in these collections (investigation_users_x_x) are made with unordered bulk updates. This is probably one of the possible causes, but even so this behavior should not happen.

2. I often use this MongoDB upsert property (from here) and apply an update operation like:

query: {uid, channel}
update: {$set : {fields (without uid and channel fields)}}
upsert: true
multi: false

Update behavior:

If the <update> parameter contains update operator expressions, the update creates a base document from the equality clauses in the <query> parameter, and then applies the update expressions from the <update> parameter.

This upsert behavior is closely related to this problem, and to what would happen if the creation of the base document from the query fields failed. What do you think? (A sketch of this bulk upsert pattern is shown right below.)
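
For illustration only, a minimal mongo shell sketch of this unordered bulk upsert pattern; the collection name and the values are hypothetical, the real workload goes through the Java driver, and the $set payload stands in for "fields without uid and channel":

var bulk = db.getCollection("investigation_users_x_x").initializeUnorderedBulkOp();
// query on the unique key { uid, channel }; everything else goes into $set
bulk.find({ uid: "478713872142575", channel: "Facebook" })
    .upsert()
    .updateOne({ $set: { last_update: new Date(), followers: 100 } });
// roughly the REPLICA_ACKNOWLEDGED write concern mentioned in the description
bulk.execute({ w: 2 });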

I dropped all duplicates and reindexed my collections. These collections had duplicate documents:

investigation_users_5064556749520896_5133639419428864 - Dups found (5).
investigation_users_5098904710283264_6242625241743360 - Dups found (1).
investigation_users_5117847663542272_4711757935280128 - Dups found (25).
investigation_users_5117847663542272_5748975256731648 - Dups found (78).
investigation_users_5158194988974080_4921474645229568 - Dups found (1).
investigation_users_5681034041491456_5094328489738240 - Dups found (1).
investigation_users_6499969481572352_5741460607467520 - Dups found (1).
investigation_users_6549984979714048_5045551083028480 - Dups found (2).

Comment by Kelsey Schubert [ 10/Jun/16 ]

Hi lucasoares,

Sorry for the silence. Unfortunately, we have not yet been able to identify the cause of this behavior. We are continuing to investigate and will let you know if we are able to successfully reproduce this issue.

Thank you again for your help,
Thomas

Comment by Lucas [ 04/May/16 ]

This has happened twice, but it was in the same database and I think one occurrence derived from the other.

When it happened for the first time, I tried to reindex all unique indexes in my data collections. This worked, but after some time those errors happened again. Recently I investigated and found some collections that I hadn't re-indexed before. Since I reindexed all unique indexes across my entire database, this problem hasn't happened again.

I don't know exactly which collections were affected, but when I was running my re-indexing algorithm, there were many collections with duplicated keys.
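
A rough sketch of such a re-indexing pass in the mongo shell, assuming it simply walks every collection and rebuilds the ones that carry a unique index (the duplicate-removal step described above is omitted here):

db.getCollectionNames().forEach(function(name) {
    var hasUniqueIndex = db.getCollection(name).getIndexes().some(function(ix) {
        return ix.unique === true;
    });
    if (hasUniqueIndex) {
        // rebuilds every index on the collection, including the unique ones
        printjson(db.getCollection(name).reIndex());
    }
});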

For the questions:
1. Described above.
1.1 Yes.
1.2 I didn't understand your question. MongoDB crashes when some query's cursor finds 2 or more entries with the same value in a unique index.
2. When I first created this replica there was an initial sync, but the database was small then. That was many months ago.

Sorry, but for 3 and 4 I don't know how to provide this information. I don't currently have collections returning duplicated documents, because I manually fixed them all (re-indexing). I had to fix it as fast as I could because this is a production database.

Comment by Kelsey Schubert [ 25/Apr/16 ]

Hi lucasoares,

My understanding is that this issue has occurred twice. Since it affects multiple collections, without manual corrective action this can result in numerous fatal assertions on different collections. Is this your understanding as well, or have there been more occurrences?

I'd like to work with you to attempt to reproduce this issue. To help, please answer the following questions:

  1. Can you provide additional information about the first occurrence of this issue (the similar bug you mention in the ticket description)?
    1. Were the symptoms the same, Fatal Assertion 28608?
    2. When did it occur?
  2. Did an initial sync precede any of these issues?
  3. Can you provide the explain output of the query that returns the duplicated documents?
  4. If you use hint to select a different index that is neither the affected unique key index nor _id, are the duplicated documents returned? Can you please provide the explain output of this query as well?

Thank you,
Thomas

Comment by Kelsey Schubert [ 20/Apr/16 ]

Hi lucasoares,

So we can conduct a post-mortem on an affected collection, can you please upload the following files to the same portal as before?

  • the affected collection file
  • its associated indexes
  • All WiredTiger* files
  • _mdb_catalog.wt
  • sizeStorer.wt

To determine the file names of the collection and its associated indexes, please run the following command on the affected collection and examine the "uri" fields:

db.collection.stats({indexDetails:true})
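
For example, a minimal sketch of pulling out those "uri" fields in the shell (assuming the affected collection is social_links; the table names in the output correspond to the .wt files on disk):

var stats = db.getCollection("social_links").stats({ indexDetails: true });
// the collection's WiredTiger table
print("collection: " + stats.wiredTiger.uri);
// one WiredTiger table per index
for (var idx in stats.indexDetails) {
    print(idx + ": " + stats.indexDetails[idx].uri);
}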

Thank you for your help investigating this issue,
Thomas

Comment by Lucas [ 20/Apr/16 ]

I found another collection with duplicates:

PRIMARY> db.social_links.getIndexes()
[
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "unique" : true,
                "key" : {
                        "twitter_id" : 1
                },
                "name" : "twitter_id_1",
                "sparse" : true
        },
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "unique" : true,
                "key" : {
                        "facebook_id" : 1
                },
                "name" : "facebook_id_1",
                "sparse" : true
        },
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "unique" : true,
                "key" : {
                        "instagram_id" : 1
                },
                "name" : "instagram_id_1",
                "sparse" : true
        },
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "unique" : true,
                "key" : {
                        "youtube_id" : 1
                },
                "name" : "youtube_id_1",
                "sparse" : true
        },
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "unique" : true,
                "key" : {
                        "linkedin_id" : 1
                },
                "name" : "linkedin_id_1",
                "sparse" : true
        },
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "unique" : true,
                "key" : {
                        "blog_id" : 1
                },
                "name" : "blog_id_1",
                "sparse" : true
        },
        {
                "ns" : "shipyard.social_links",
                "v" : 1,
                "unique" : true,
                "key" : {
                        "news_id" : 1
                },
                "name" : "news_id_1",
                "sparse" : true
        }
]

Validation output:

PRIMARY> db.social_links.validate(true)
{
        "ns" : "shipyard.social_links",
        "nrecords" : 258,
        "nIndexes" : 8,
        "keysPerIndex" : {
                "shipyard.social_links.$_id_" : 258,
                "shipyard.social_links.$twitter_id_1" : 195,
                "shipyard.social_links.$facebook_id_1" : 114,
                "shipyard.social_links.$instagram_id_1" : 71,
                "shipyard.social_links.$youtube_id_1" : 0,
                "shipyard.social_links.$linkedin_id_1" : 0,
                "shipyard.social_links.$blog_id_1" : 0,
                "shipyard.social_links.$news_id_1" : 0
        },
        "indexDetails" : {
                "shipyard.social_links.$_id_" : {
                        "valid" : true
                },
                "shipyard.social_links.$twitter_id_1" : {
                        "valid" : true
                },
                "shipyard.social_links.$facebook_id_1" : {
                        "valid" : true
                },
                "shipyard.social_links.$instagram_id_1" : {
                        "valid" : true
                },
                "shipyard.social_links.$youtube_id_1" : {
                        "valid" : true
                },
                "shipyard.social_links.$linkedin_id_1" : {
                        "valid" : true
                },
                "shipyard.social_links.$blog_id_1" : {
                        "valid" : true
                },
                "shipyard.social_links.$news_id_1" : {
                        "valid" : true
                }
        },
        "valid" : true,
        "errors" : [ ],
        "ok" : 1
}

Comment by Lucas [ 20/Apr/16 ]

anonymous.user I found another collection with duplicate data, but it isn't the same collection type. This one has only 3 fields in each document: uid, channel and "last_update". Look:

PRIMARY> db.influencer_updates.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "shipyard.influencer_updates"
        },
        {
                "v" : 1,
                "unique" : true,
                "key" : {
                        "uid" : 1,
                        "channel" : 1
                },
                "name" : "uid_1_channel_1",
                "ns" : "shipyard.influencer_updates"
        }
]

This collection is updated by only one machine, with no bulk operations (only a single type of update). This update has upsert: true. The update query contains only "uid" and does not have the "channel" field.
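
A minimal sketch of that single update in the mongo shell (the uid value and the $set field are hypothetical); note that the query has no channel even though the unique index is { uid: 1, channel: 1 }:

db.influencer_updates.update(
    { uid: "478713872142575" },             // only uid, no channel
    { $set: { last_update: new Date() } },
    { upsert: true, multi: false }
);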

Unfortunately I dropped all duplicates before seeing your comment about running collection.validate(true). I will show the output here from after removing those duplicates, but I will look through my entire database for another collection with any duplication to run the validation again:

{
        "ns" : "shipyard.influencer_updates",
        "nrecords" : 193804,
        "nIndexes" : 2,
        "keysPerIndex" : {
                "shipyard.influencer_updates.$_id_" : 193804,
                "shipyard.influencer_updates.$uid_1_channel_1" : 193804
        },
        "indexDetails" : {
                "shipyard.influencer_updates.$_id_" : {
                        "valid" : true
                },
                "shipyard.influencer_updates.$uid_1_channel_1" : {
                        "valid" : true
                }
        },
        "valid" : true,
        "errors" : [ ],
        "ok" : 1
}

Thanks, and as soon as I find a collection with duplicates I will run the validation command for you.

Comment by Kelsey Schubert [ 19/Apr/16 ]

A1exander, thank you for reporting that you are experiencing a similar issue. So we can continue to investigate the behavior you are observing, can you please open a new ticket? When you create a new ticket, please describe the history of your data files as Lucas did above and attach the complete logs for the affected files to the ticket. If you need a portal for your logs, please let me know and I will create one for you to use.

lucasoares, thank you for answering the questions above. We are still investigating this issue: can you please provide the output of db.collection.validate(true) on an affected collection? Please note that this command may impact the performance of your MongoDB instance.

Thanks again,
Thomas

Comment by Lucas [ 19/Apr/16 ]

Could this be related to MongoDB updates + multikey unique indexes? My servers are also constantly updated to recent/"stable" versions of MongoDB.

Comment by Alexander Markov [X] [ 19/Apr/16 ]

Hi! I faced the same problem after updating mongo to 3.2.4 (from 3.0.6).

2016-04-12T08:23:38.158+0000 F STORAGE  [conn1322] Unique index cursor seeing multiple records for key { : 12519, : 48419, : 4, : 1146794 }
2016-04-12T08:23:38.160+0000 I -        [conn1322] Fatal Assertion 28608
2016-04-12T08:23:38.161+0000 I -        [conn1322]
 
***aborting after fassert() failure
 
 
2016-04-12T08:23:38.224+0000 F -        [conn1322] Got signal: 6 (Aborted).
 
 0x12f3502 0x12f2659 0x12f2e62 0x7f16b53bd340 0x7f16b501ecc9 0x7f16b50220d8 0x127d9d2 0x10578e1 0x105b919 0x105bc09 0xbdc509 0xbd0461 0xbc6d52 0xe0a872 0xe0bbf8 0xe0c29f 0xdd3ef6 0xdd49bb 0xb2713e 0xba6df3 0xba7c84 0xb04790 0xcb3cb5 0x99974c 0x12a0ebd 0x7f16b53b5182 0x7f16b50e247d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"EF3502","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EF2659"},{"b":"400000","o":"EF2E62"},{"b":"7F16B53AD000","o":"10340"},{"b":"7F16B4FE8000","o":"36CC9","s":"gsignal"},{"b":"7F16B4FE8000","o":"3A0D8","s":"abort"},{"b":"400000","o":"E7D9D2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"C578E1"},{"b":"400000","o":"C5B919"},{"b":"400000","o":"C5BC09"},{"b":"400000","o":"7DC509","s":"_ZN5mongo9IndexScan4workEPm"},{"b":"400000","o":"7D0461","s":"_ZN5mongo10FetchStage4workEPm"},{"b":"400000","o":"7C6D52","s":"_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE"},{"b":"400000","o":"A0A872","s":"_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE"},{"b":"400000","o":"A0BBF8","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKSsNS0_11YieldPolicyE"},{"b":"400000","o":"A0C29F","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE"},{"b":"400000","o":"9D3EF6","s":"_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm"},{"b":"400000","o":"9D49BB","s":"_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE"},{"b":"400000","o":"72713E","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE"},{"b":"400000","o":"7A6DF3","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"400000","o":"7A7C84","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"400000","o":"704790","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"400000","o":"8B3CB5","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"59974C","s":"_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE"},{"b":"400000","o":"EA0EBD","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F16B53AD000","o":"8182"},{"b":"7F16B4FE8000","o":"FA47D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.4", "gitVersion" : "e2ee9ffcf9f5a94fad76802e28cc978718bb7a30", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-48-generic", "version" : "#80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "EF46210F8976780D45B811C3540FECB9E734EABE" }, { "b" : "7FFF9C0D2000", "elfType" : 3, "buildId" : "341443CC258F9F27709CAA36C8CB321E4BBFFF95" }, { "b" : "7F16B65D2000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7F16B61F7000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B927879B878D90DD9FF4B15B00E7799AA8E0272F" }, { "b" : "7F16B5FEF000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : 
"7F16B5DEB000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F16B5AE7000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7F16B57E1000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F16B55CB000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7F16B53AD000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F16B4FE8000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F16B6830000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12f3502]
 mongod(+0xEF2659) [0x12f2659]
 mongod(+0xEF2E62) [0x12f2e62]
 libpthread.so.0(+0x10340) [0x7f16b53bd340]
 libc.so.6(gsignal+0x39) [0x7f16b501ecc9]
 libc.so.6(abort+0x148) [0x7f16b50220d8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x127d9d2]
 mongod(+0xC578E1) [0x10578e1]
 mongod(+0xC5B919) [0x105b919]
 mongod(+0xC5BC09) [0x105bc09]
 mongod(_ZN5mongo9IndexScan4workEPm+0xB9) [0xbdc509]
 mongod(_ZN5mongo10FetchStage4workEPm+0x161) [0xbd0461]
 mongod(_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x192) [0xbc6d52]
 mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE+0xB2) [0xe0a872]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKSsNS0_11YieldPolicyE+0x148) [0xe0bbf8]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE+0xBF) [0xe0c29f]
 mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm+0x136) [0xdd3ef6]
 mongod(_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE+0x7B) [0xdd49bb]
 mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderE+0x5FE) [0xb2713e]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x473) [0xba6df3]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x404) [0xba7c84]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x1F0) [0xb04790]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB45) [0xcb3cb5]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0xEC) [0x99974c]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x26D) [0x12a0ebd]
 libpthread.so.0(+0x8182) [0x7f16b53b5182]
 libc.so.6(clone+0x6D) [0x7f16b50e247d]
-----  END BACKTRACE  -----

mongos> db.UserSectionMetrics.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "metrics2.UserSectionMetrics"
        },
        {
                "v" : 1,
                "unique" : true,
                "key" : {
                        "cid" : 1,
                        "uid" : 1,
                        "sid" : 1,
                        "lpsid" : 1
                },
                "name" : "ix_cid_1_uid_1_sid_1_lpsid_1",
                "ns" : "metrics2.UserSectionMetrics"
        },
        {
                "v" : 1,
                "key" : {
                        "cid" : "hashed"
                },
                "name" : "cid_hashed",
                "ns" : "metrics2.UserSectionMetrics"
        }
]

Comment by Lucas [ 18/Apr/16 ]

1.
Stop all applications that write to and read from the database.
Drop the unique index.
Aggregate all duplicates by my unique index keys and remove them (creating a backup to restore one of each duplicate later); see the aggregation sketch after these answers.
Recreate (ensure) the index.

2.

No, this doesn't affect only particular collections.

3.

[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1
                },
                "name" : "id_1",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "normalized_posted_at" : 1
                },
                "name" : "normalized_posted_at_1",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "channel" : 1,
                        "username" : 1
                },
                "name" : "channel_1_username_1",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "is_only_universe" : 1,
                        "annotation_spam" : 1,
                        "normalized_posted_at" : -1,
                        "stilingue_array.name" : 1,
                        "stilingue_array.theme" : 1,
                        "has_theme" : 1,
                        "stilingue_array.term" : 1
                },
                "name" : "mainIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "is_only_universe" : 1,
                        "annotation_spam" : 1,
                        "stilingue_array.term" : 1,
                        "normalized_posted_at" : -1
                },
                "name" : "termIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "is_only_universe" : 1,
                        "annotation_favorite" : 1,
                        "normalized_posted_at" : -1
                },
                "name" : "favoriteIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "is_only_universe" : 1,
                        "locationState" : 1,
                        "normalized_posted_at" : -1,
                        "locationName" : 1,
                        "polarity" : 1
                },
                "name" : "locationIndexes",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "locationState" : 1
                },
                "name" : "locationIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528",
                "sparse" : true
        },
        {
                "v" : 1,
                "key" : {
                        "locationName" : 1
                },
                "name" : "locationName",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528",
                "sparse" : true
        },
        {
                "v" : 1,
                "key" : {
                        "uid" : 1,
                        "channel" : 1,
                        "normalized_posted_at" : -1,
                        "hot" : -1
                },
                "name" : "uidIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "is_only_universe" : 1,
                        "stilingue_array.theme" : 1
                },
                "name" : "statisticsThemeIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "is_only_universe" : 1,
                        "stilingue_array.name" : 1,
                        "normalized_posted_at" : 1
                },
                "name" : "monitoringIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "is_only_universe" : 1,
                        "channel" : 1,
                        "stilingue_array.name" : 1,
                        "stilingue_array.theme" : 1,
                        "normalized_posted_at" : 1
                },
                "name" : "detalhamentoChannelIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "normalized_posted_at" : 1,
                        "is_only_universe" : 1,
                        "type" : 1
                },
                "name" : "basicIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "is_only_universe" : 1,
                        "has_theme" : 1,
                        "normalized_posted_at" : 1
                },
                "name" : "analysisTheme",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_occupation" : 1,
                        "stilingue_array.occupation" : 1
                },
                "name" : "has_occupation_1_stilingue_array.occupation_1",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "is_only_universe" : 1,
                        "annotation_spam" : 1,
                        "normalized_posted_at" : -1,
                        "polarity" : 1
                },
                "name" : "hashtags_index",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "channel" : 1,
                        "uid" : 1,
                        "followers" : 1
                },
                "name" : "aaaUserIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "key" : {
                        "has_name" : 1,
                        "annotation_spam" : 1,
                        "channel" : 1,
                        "st_updated_at" : 1
                },
                "name" : "aaaPostIndex",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        },
        {
                "v" : 1,
                "unique" : true,
                "key" : {
                        "pid" : 1,
                        "channel" : 1
                },
                "name" : "pid_1_channel_1",
                "ns" : "shipyard.investigation_cards_4933954545123328_4998063642902528"
        }
]

Those indexes are created at collection creation time (with 0 docs) by the Java driver.

4.
Yes and no. I think at some point MongoDB became unable to handle these duplicates and everything started from there.
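
As referenced in answer 1, a minimal mongo shell sketch of the kind of aggregation used to find the duplicated keys, assuming the unique key { pid, channel } of the collection shown above:

db.getCollection("investigation_cards_4933954545123328_4998063642902528").aggregate([
    // group by the unique key and keep only keys that occur more than once
    { $group: { _id: { pid: "$pid", channel: "$channel" },
                ids: { $push: "$_id" },
                count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true });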

Now please read this:

I no longer think it was the replica primary change process that caused the system crash or the duplication of docs with the same unique fields, but rather reading data that was already corrupted. Data corruption may have occurred with 3 machines writing to the same replica set with unordered bulk updates using upsert: true. What do you think about this? Is this possible?

One weird thing:
When I got the duplicated docs by aggregation, I had the "_id" field of those documents but was unable to search using the "_id" field, only by "pid" and "channel" (my unique index). Searching by "_id" always returned 0 documents.

When I searched by "pid" and "channel" I saw the same "_id" I had been trying to search for previously (but searching by that _id still returned 0 docs).

What do I need to do to fix this? Create a clean MongoDB and copy all data with find/insert?
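
A hypothetical way to show this symptom in the shell (the ObjectId is a placeholder for one of the duplicated _ids): the first query goes through the _id index, while the $natural hint forces a collection scan that bypasses the index.

// lookup via the _id index (returned 0 documents in the case above)
db.getCollection("investigation_cards_4933954545123328_4998063642902528")
    .find({ _id: ObjectId("572b2b2b2b2b2b2b2b2b2b2b") }).itcount()
// the same lookup forced to scan the collection instead of using the _id index
db.getCollection("investigation_cards_4933954545123328_4998063642902528")
    .find({ _id: ObjectId("572b2b2b2b2b2b2b2b2b2b2b") }).hint({ $natural: 1 }).itcount()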

Comment by Kelsey Schubert [ 18/Apr/16 ]

Hi lucasoares,

Thank you for uploading the logs; we are still investigating them. To speed the process, can you please answer the following questions?

  1. Can you please outline the steps you took to delete the duplicate documents and reindex?
  2. Have you determined if this issue only affects particular collections?
  3. Can you post the output of db.collection.getIndexes()? In particular, I would like to know whether these indexes were constructed in the background.
  4. Are you aware of whether these records with duplicated keys have been recently written or updated?

With regard to your question, an initial sync may not resolve the issue since both nodes are affected.

Thank you,
Thomas

Comment by Lucas [ 18/Apr/16 ]

Can an initial sync fix these duplicates?

Comment by Lucas [ 17/Apr/16 ]

I reindexed my unique indexes (deleting all duplicated documents), but this error is still occurring after some time.

Comment by Lucas [ 17/Apr/16 ]

Uploaded, anonymous.user.

Comment by Kelsey Schubert [ 14/Apr/16 ]

Hi lucasoares,

I've created an upload portal for you to use.

Thank you,
Thomas

Comment by Lucas [ 14/Apr/16 ]

Can you open an upload link for me? The files are big and my home network doesn't help :/

Comment by Lucas [ 14/Apr/16 ]

anonymous.user This isn't the same replica set that I was debugging in SERVER-23532, but another one. This one has been running for months, and some of its collections were created back on MongoDB version 2.6.

The strange thing is: I reindexed all unique indexes in these collections on 3.2 and this still happens.

I will upload complete log files from server A (normally the primary) and server B (normally the secondary), ok? Just wait a moment =)

Thanks..

Comment by Kelsey Schubert [ 13/Apr/16 ]

Hi lucasoares,

Thank you for reporting this issue. To get a better idea of what is happening here, please answer the following questions.

  1. Can you describe the history of these data files?
    1. Have these data files been repaired or copied from any of the nodes which had failures?
    2. Have you completed a fresh resync to build these files? If so, which version of MongoDB were you running?
  2. Can you please upload the complete logs for the affected nodes?

Thank you,
Thomas
