[SERVER-10610] Some collections are dropped spontaneously Created: 23/Aug/13  Updated: 10/Dec/14  Resolved: 28/Oct/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Question Priority: Critical - P2
Reporter: Rediff.com India Ltd Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Server:CentOS 6.3 Client: CentOS 6.3, PHP Driver, Java Driver


Participants:

 Description   
  • We are running a MongoDB replica set with 1 Primary and 1 Secondary and 1 Arbiter.
  • We have 2 applications writing to 2 separate databases in Mongo - let's call these DB1 and DB2 for convenience.
  • DB1 has 6 collections while DB2 has 107 collections.
  • DB1 has been in production for 10 months, DB2 has been in production for 6 months.
  • Today, at around 11:49 a.m. we discovered that 5 collections in DB1 and 90 collections in DB2 were absent from the Primary. Everything was intact in the Secondary.
  • Examining the mongod log, we saw that the number of connections ramped up from ~450 at 11:49 a.m. to about 20,000 at 12:11 p.m., after which the mongod instance started refusing new connections (connection limit reached):

    Fri Aug 23 12:11:54 [initandlisten] connection refused because too many open connections: 20000


  • We restarted mongod, performed a mongodump from the secondary, and ran a mongorestore to the primary. After this, both DB1 and DB2 started accepting connections again and are now working fine.
  • We cannot see any "drop" commands in the mongod log.
  • We can see 2 instances of PageFaultException during this time:

    Fri Aug 23 11:49:58 [conn33572019] PageFaultException thrown
    Fri Aug 23 11:56:00 [conn33572019] PageFaultException thrown
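
The log checks described above (no "drop" commands, two PageFaultException lines, refused connections) could be scripted roughly as follows. This is only a sketch: the function name `scan_log` and the pattern names are hypothetical, and the patterns are based solely on the log lines quoted in this report, so the wording of other mongod 2.2 messages (including how a drop would be logged) is an assumption.

```python
import re
from collections import Counter

# Hypothetical patterns derived only from the lines quoted in this report.
# How mongod 2.2 actually logs a collection drop is an assumption here;
# the "drop" pattern simply looks for the standalone word.
PATTERNS = {
    "refused": re.compile(r"connection refused because too many open connections"),
    "page_fault": re.compile(r"PageFaultException thrown"),
    "drop": re.compile(r"\bdrop\b", re.IGNORECASE),
}


def scan_log(lines):
    """Count lines matching each pattern and collect the matching lines."""
    counts = Counter()
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits[name].append(line.rstrip("\n"))
    return counts, hits


# The three log lines quoted in this report, used as sample input.
SAMPLE = [
    "Fri Aug 23 11:49:58 [conn33572019] PageFaultException thrown",
    "Fri Aug 23 11:56:00 [conn33572019] PageFaultException thrown",
    "Fri Aug 23 12:11:54 [initandlisten] connection refused because too many open connections: 20000",
]

counts, hits = scan_log(SAMPLE)
for name in PATTERNS:
    print(name, counts[name])
```

To run the same check against a real log, the sample list could be replaced by an open file handle, since `scan_log` only needs an iterable of lines.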


The questions are:

  • How could collections be dropped on the primary while remaining present on the secondary?
  • Is there any condition under which something like this can happen?


 Comments   
Comment by Stennie Steneker (Inactive) [ 28/Oct/13 ]

Hi,

I'm closing this issue due to inactivity. We have not had any follow-up to our initial inquiry and would need the requested logs from the primary/secondary in order to investigate further:

Do you have logs from the primary and secondary data nodes from a point in time where you know the nodes were in sync through the time of the incident you describe?

If you still have the logs available and are able to provide them, please feel free to comment and re-open the issue. If there is potentially sensitive information in the logs please advise and we can create a private issue to review those.

Thanks,
Stephen

Comment by Daniel Pasette (Inactive) [ 26/Aug/13 ]

Do you have logs from the primary and secondary data nodes from a point in time where you know the nodes were in sync through the time of the incident you describe?
