[SERVER-39153] Server not accessible after the error _flushRoutingTableCacheUpdates { forceRoutingTableRefresh: "config.system.sessions"
Created: 23/Jan/19  Updated: 30/Aug/19  Resolved: 30/Aug/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Valerie FAUTRA
Assignee: Eric Sedor
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux (el7.x86_64)

Attachments: File install_diagnostic_data.tar    
Issue Links:
Duplicate
duplicates SERVER-43108 Lost wakeup in ShardServerCatalogCach... Closed
is duplicated by SERVER-39155 Server not accessible after the error... Closed
Participants:

 Description   

Hi,
My server hosting the most active shard has been inaccessible for at least 2 hours, with the following error:

2019-01-23T05:19:58.749-0400 I COMMAND [conn5355] command admin.$cmd command: _flushRoutingTableCacheUpdates { forceRoutingTableRefresh: "config.system.sessions", maxTimeMS: 30000, $clusterTime: { clusterTime: Timestamp(1548235167, 543), signature: { hash: BinData(0, 14131F6D277868A4BEE8EC438CAA6DEF67C1CB5D), keyId: 6607960961905065998 } }, $configServerState: { opTime: { ts: Timestamp(1548235148, 81), t: 32 } }, $db: "admin" } numYields:0 ok:0 errMsg:"operation exceeded time limit" errName:MaxTimeMSExpired errCode:50 reslen:396 locks:{ Global: { acquireCount: { r: 1 } } } protocol:op_msg 30499ms
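
For readers triaging similar log lines, the fields that matter here (command name, maxTimeMS, error name and code, total duration) can be pulled out with a few regular expressions. This is only a rough sketch: `summarize_slow_command` is a hypothetical helper, not part of any MongoDB tool, the patterns assume the 4.0-style plain-text log format shown above, and the sample line is an abbreviated copy of the one in this ticket.

```python
import re

# Abbreviated copy of the log line from this ticket (command body shortened).
LOG_LINE = (
    '2019-01-23T05:19:58.749-0400 I COMMAND [conn5355] command admin.$cmd '
    'command: _flushRoutingTableCacheUpdates { forceRoutingTableRefresh: '
    '"config.system.sessions", maxTimeMS: 30000, $db: "admin" } numYields:0 ok:0 '
    'errMsg:"operation exceeded time limit" errName:MaxTimeMSExpired errCode:50 '
    'reslen:396 protocol:op_msg 30499ms'
)


def summarize_slow_command(line):
    """Hypothetical helper: extract a few fields from a COMMAND log line."""
    patterns = {
        "command": r"command: (\w+)",
        "max_time_ms": r"maxTimeMS: (\d+)",
        "err_name": r"errName:(\w+)",
        "err_code": r"errCode:(\d+)",
        "duration_ms": r" (\d+)ms$",  # total duration is the last token on the line
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, line)
        summary[key] = match.group(1) if match else None
    # Convert the numeric fields when they were found.
    for key in ("max_time_ms", "err_code", "duration_ms"):
        if summary[key] is not None:
            summary[key] = int(summary[key])
    return summary


summary = summarize_slow_command(LOG_LINE)
print(summary)
```

In this line the duration (30499 ms) is just over the 30000 ms maxTimeMS, which matches the MaxTimeMSExpired error: the command ran until its time limit was reached.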



 Comments   
Comment by Kelsey Schubert [ 30/Aug/19 ]

Hi valerie.fautra@orange.com,

Through internal testing, we were able to identify SERVER-43108, which we believe is the same issue reported here. For updates, please watch SERVER-43108.

Kind regards,
Kelsey

Comment by Eric Sedor [ 20/Mar/19 ]

Hi,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Eric

Comment by Eric Sedor [ 08/Mar/19 ]

Hi,

We still need additional information to diagnose the problem. If this is still an issue for you, can you check whether any of the information requested in my previous comment is available?

Thanks,
Eric

Comment by Eric Sedor [ 29/Jan/19 ]

Hi valerie.fautra@orange.com,

Sorry for the added step here.

We've been reviewing this ticket again and did notice there may have been an issue related to capped collections. If you're still interested in pursuing this issue, are you able to provide the logs for the affected node during the 2-hour outage?

Can you also tell us about how many capped collections you are using, what size they are, and what rate of writes they receive?

The diagnostic.data directory and the logs from secondary nodes may also be helpful.

I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. 

Comment by Eric Sedor [ 24/Jan/19 ]

Thank you. This appears to be a performance issue that would require investigation. The log line above is likely a symptom.

Because the SERVER project is for reporting bugs or feature suggestions for the MongoDB server, we'd like to ask that you post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group.

Comment by Valerie FAUTRA [ 24/Jan/19 ]

Hi Eric,
I sent you the required file.

Comment by Eric Sedor [ 23/Jan/19 ]

It is possible for many operations to exceed a maxTimeMS setting when a node is experiencing trouble, but it is not clear from this log message what trouble actually occurred, or if that trouble is the result of a bug.

Would you please archive (tar or zip) the $dbpath/diagnostic.data directory from the affected node and attach it to this ticket?
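
As an illustration of the archiving step requested above, here is a minimal sketch using Python's standard `tarfile` module. The function name `archive_diagnostic_data` and the example paths are assumptions for illustration only; substitute the actual dbpath of the affected node.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath, output):
    """Pack <dbpath>/diagnostic.data into a gzip-compressed tar file."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    output = Path(output)
    with tarfile.open(output, "w:gz") as archive:
        # Store entries under a relative "diagnostic.data/" prefix
        # instead of the full filesystem path.
        archive.add(source, arcname="diagnostic.data")
    return output


# Example call (the dbpath below is an assumed location, not taken from this ticket):
# archive_diagnostic_data("/var/lib/mongo", "diagnostic_data.tar.gz")
```

Writing the archive outside the dbpath avoids adding files to the data directory of a node that is already in trouble.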

Comment by Valerie FAUTRA [ 23/Jan/19 ]

Thanks.
This is my first time using this platform.

-------- Original message --------
From: "Sam Rossi (Jira)" <jira@mongodb.org>
Date: 23/01/2019 14:49 (GMT-04:00)
To: FAUTRA PRADEL Valérie OF/OCA <valerie.fautra@orange.com>
Subject: [MongoDB-JIRA] (SERVER-39153) Server not accessible after the error _flushRoutingTableCacheUpdates { forceRoutingTableRefresh: "config.system.sessions"

[ https://jira.mongodb.org/browse/SERVER-39153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Rossi moved MONGOID-4693 to SERVER-39153:
---------------------------------------------

Project: Core Server (was: Mongoid)
Key: SERVER-39153 (was: MONGOID-4693)
Workflow: Primary SERVER Workflow (was: Drivers Only Workflow)
Component/s: (was: Mongoid)
Affects Version/s: (was: 4.0.0 final)


Generated at Thu Feb 08 04:51:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.