[SERVER-27443] Fatal assertion 18506 DuplicateKey E11000 duplicate key error dup key Created: 16/Dec/16  Updated: 09/Oct/17  Resolved: 15/Sep/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: davy Assignee: Kelsey Schubert
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 1617401803734vsi.png, PNG File 2.png
Operating System: ALL

 Description   

I have a replica set with 3 servers holding about 300GB of data. Last week we had a sudden power outage. When the power was restored, I restarted MongoDB successfully, but when I ran "show dbs;", I remember it reported something like this:
WiredTiger (0) [1481884901:569933][7325:0x7f77131f8c20], file:collection/3-4162965380314250895.wt, session.open_cursor: read checksum error [4096B @ 32768, 239944241 != 3341820703]
I hoped "mongod --repair" would fix it, so I ran the repair command on all 3 servers.
Surprisingly, it failed with the following message:
Fatal assertion 18506 DuplicateKey E11000 duplicate key error dup key: { ObjectId('xxxxxxxx') }

How can I fix it? Please help me.



 Comments   
Comment by Kelsey Schubert [ 15/Sep/17 ]

Hi davy,

I've completed our investigation. Unfortunately, from the data provided, we have not been able to conclusively determine the root cause of this issue. That said, from the evidence I've examined, I suspect that this issue originated in the storage layer beneath MongoDB/WiredTiger.

Kind regards,
Kelsey

Comment by Kelsey Schubert [ 24/May/17 ]

Hi davy,

Sorry for letting this reply slip through the cracks. Unfortunately, a preliminary analysis did not reveal the root cause of this issue. We intend to do a thorough manual inspection of the bits in these files; however, this takes a significant amount of time and has to be scheduled against other work.

If you encounter this issue again, please let us know so we can reprioritize this investigation.

Kind regards,
Thomas

Comment by davy [ 22/Dec/16 ]

Hi Thomas, is this proving too difficult? Please let me know.
I have a full dump from 2016-11-25. If recovery is difficult, could I restore this dump to a new dbpath in the meantime? Some work is waiting on MongoDB.

Comment by Kelsey Schubert [ 19/Dec/16 ]

Thanks for the upload. We have the files included in mongolog.tar, and are investigating.

Comment by davy [ 19/Dec/16 ]

Sorry, Thomas, I did not reply to you immediately.
I created a tarball that includes some files. The file named "dbdir" is the output of ls -al of the database directory, but there is no WiredTigerLog.<long number> file. Is that because of the command I use to start mongod? That command is also in the tarball (file "startmongocmd").

I used your upload portal, but I am not sure the upload succeeded.

To answer your questions:
All of the nodes in my replica set were affected (3 servers).
We have only ever used MongoDB 3.0.0; there was no earlier version, and we have never upgraded. Upgrading requires approval, which is troublesome.

The database directory is nearly 300GB, so uploading it is not realistic.

Comment by Kelsey Schubert [ 16/Dec/16 ]

Would you please also clarify which versions, if any, of MongoDB you were running prior to MongoDB 3.0.0 in this replica set?

Thanks,
Thomas

Comment by Kelsey Schubert [ 16/Dec/16 ]

Hi davy,

Thanks for reporting this issue. Would you please clarify whether this issue is affecting a single node or all of the nodes in your replica set?

If this issue is only affecting a single node, I would recommend creating a backup of your current $dbpath, and then performing an initial sync.

If this issue affects all of the nodes, please be aware that complete data recovery may not be possible. Please upgrade to 3.0.14, which includes many bug fixes and improvements. After upgrading, please rerun the --repair operation and provide the output. Additional diagnostic information should be logged (SERVER-17532), which will help our investigation.

In either case, we would like to examine some of your data files to get a better understanding of what type of corruption occurred during the power failure. So that we can begin our investigation, would you please upload the following?

  1. the complete logs of the affected mongod
  2. tarball of the WiredTiger files (_mdb_catalog.wt, sizeStorer.wt, WiredTiger* files)
  3. collection/3-4162965380314250895.wt (the file named in the checksum error)
  4. output of ls -l of the database directory
  5. WiredTigerLog.<long number>

I've created a secure upload portal for you to use here. Files uploaded to this portal are only visible to MongoDB employees and are routinely deleted after some time.

Please note that as our investigation continues, we may need to take a look at additional files. If you are able, please back up the current $dbpath to a safe location before proceeding with additional recovery steps.
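
As one way to take that backup, here is a minimal Python sketch that copies the whole dbpath to a timestamped directory and checks that every file arrived with the same size. It is roughly equivalent to a plain recursive copy such as `cp -a`. The function names and the `dbpath-backup-<timestamp>` naming are hypothetical. It assumes mongod is not running against the dbpath, so files do not change mid-copy, and that `backup_root` is outside the dbpath and has enough free space (about 300GB in this case).

```python
import os
import shutil
import time


def backup_dbpath(dbpath: str, backup_root: str) -> str:
    """Copy the entire dbpath to <backup_root>/dbpath-backup-<timestamp>.

    Hypothetical helper. Assumes mongod is stopped and backup_root is
    outside dbpath. copy2 preserves file metadata such as mtimes.
    Returns the destination directory.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_root, "dbpath-backup-" + stamp)
    # copytree refuses to overwrite an existing destination directory.
    shutil.copytree(dbpath, dest, copy_function=shutil.copy2)
    _verify_copy(dbpath, dest)
    return dest


def _verify_copy(src: str, dst: str) -> None:
    """Cheap sanity check: each source file exists in dst with the same size.

    A missing file raises FileNotFoundError; a size difference raises
    RuntimeError. This does not compare contents.
    """
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(dst, rel, name)
            if os.path.getsize(s) != os.path.getsize(d):
                raise RuntimeError("size mismatch for " + s)
```

Keeping this untouched copy means a later `--repair` or initial sync that goes wrong does not destroy the only remaining evidence of the corruption.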

Thank you for your help,
Thomas
