[SERVER-21979] MongoD instance came down Created: 21/Dec/15  Updated: 15/Jan/16  Resolved: 11/Jan/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: James Mangold Assignee: Siyuan Zhou
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive dev-004-mongo_db.log.zip     File dev-008-mongo_db.log.2015-12-21T19-16-42     File dev-012-mongo_db.log.2015-12-22T19-49-47     File mongo_db.log.2015-12-21T19-16-42    
Issue Links:
Duplicate
duplicates SERVER-21988 Rollback does not wait for applier to... Closed
Related
is related to SERVER-22136 Attach term metadata to UpdatePositio... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl F (01/29/16)
Participants:

 Description   

Hi. I have a 12 node sharded and replicated cluster on the GA version of 3.2. This morning a node came down and I don't know why. I am attaching the log.



 Comments   
Comment by James Mangold [ 14/Jan/16 ]

We are currently running 3.2.1. Thank you. Will you please update this when 3.2.2 is available?

Comment by Ramon Fernandez Marina [ 13/Jan/16 ]

james.mangold@interactivedata.com, as Siyuan mentions above, the fix for this issue will be part of the upcoming 3.2.2 release, which is currently scheduled for publication in about 4-6 weeks.

I'd strongly encourage you to upgrade to 3.2.1, which not only contains other important fixes but should also work around the fassert() issue.

Regards,
Ramón.

Comment by Siyuan Zhou [ 11/Jan/16 ]

Resolved as a duplicate of SERVER-21988, which was fixed last week.

Comment by Scott Hernandez (Inactive) [ 09/Jan/16 ]

Yes, 3.2.1 will have the fix for this problem in SERVER-21988.

Comment by Siyuan Zhou [ 08/Jan/16 ]

Talked with scotthernandez; we believe this issue is fixed by SERVER-21988.

2015-12-21T11:00:52.773-0500 I ASIO     [NetworkInterfaceASIO-BGSync-0] Successfully connected to sec-dev-mongo012:27020
2015-12-21T11:00:52.790-0500 I REPL     [rsBackgroundSync] starting rollback: OplogStartMissing our last op time fetched: (term: 1, timestamp: Dec 21 11:00:27:233). source's GTE: (term: 2, timestamp: Dec 21 11:00:50:1) hashes: (7170794878583616227/7434769270327729983)
2015-12-21T11:00:54.643-0500 F REPL     [rsBackgroundSync] need to rollback, but in inconsistent state
2015-12-21T11:00:54.643-0500 I -        [rsBackgroundSync] Fatal assertion 28723 UnrecoverableRollbackError need to rollback, but in inconsistent state. minvalid: (term: 1, timestamp: Dec 21 11:00:02:1298) > our last optime: (term: 1, timestamp: Dec 21 11:00:01:44d3) @ 18750

We'll discuss the implication wrt RAFT offline. scotthernandez, do you think we can close this ticket as a dup?
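
For readers triaging similar logs, here is a minimal sketch of the comparison that the fatal assertion line above reports (minvalid ahead of the node's last optime). It is not the server's implementation. The names `OpTime`, `OPTIME_RE`, and `parse_optimes` are hypothetical, and two things are assumed rather than confirmed by this ticket: that the final timestamp field (for example `44d3`) is a hexadecimal increment, and that optimes order by term first and timestamp second. The year is taken from the log line's leading timestamp.

```python
import re
from datetime import datetime
from typing import NamedTuple


class OpTime(NamedTuple):
    """Hypothetical helper type; field order gives (term, time, increment) ordering."""
    term: int
    when: datetime
    inc: int


# Matches "(term: 1, timestamp: Dec 21 11:00:02:1298)" as printed in the log.
OPTIME_RE = re.compile(
    r"\(term: (\d+), timestamp: (\w{3} \d{1,2} \d{2}:\d{2}:\d{2}):([0-9a-f]+)\)"
)


def parse_optimes(line: str, year: int) -> list:
    """Return every optime found in a log line, in order of appearance."""
    found = []
    for term, stamp, inc in OPTIME_RE.findall(line):
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        # Assumption: the trailing field is a hexadecimal increment.
        found.append(OpTime(int(term), when, int(inc, 16)))
    return found


LINE = (
    "2015-12-21T11:00:54.643-0500 I -        [rsBackgroundSync] Fatal assertion 28723 "
    "UnrecoverableRollbackError need to rollback, but in inconsistent state. "
    "minvalid: (term: 1, timestamp: Dec 21 11:00:02:1298) > "
    "our last optime: (term: 1, timestamp: Dec 21 11:00:01:44d3) @ 18750"
)

minvalid, last_optime = parse_optimes(LINE, year=2015)
# True when minvalid is ahead of the last optime, the condition named in the log.
inconsistent = minvalid > last_optime
print(inconsistent)
```

Both optimes here share term 1, so the result is decided by the timestamps (11:00:02 versus 11:00:01) and the increment is never consulted.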

Comment by James Mangold [ 05/Jan/16 ]

Sorry - I meant 008...

Comment by James Mangold [ 05/Jan/16 ]

004 went down in this case...

Comment by James Mangold [ 05/Jan/16 ]

Hi - just getting back to this, because it happened again. I am attaching the logs from the downed node and its sync sources. Sorry - I was out for Christmas.

Comment by Kelsey Schubert [ 21/Dec/15 ]

Hi james.mangold@interactivedata.com,

To continue investigating, can you please answer the following questions:

  1. Can you please provide the logs from the node's sync sources, sec-dev-mongo012:27020 and sec-dev-mongo004:27020?
  2. In SERVER-21980, you mention restarting a node and removing its data. Was that node part of this cluster? Have you made any recent changes to this replica set?

Thank you,
Thomas

Generated at Thu Feb 08 03:59:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.