[SERVER-1614] [replica sets] issues after running repair Created: 11/Aug/10  Updated: 12/Jul/16  Resolved: 11/Aug/10

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.6.0
Fix Version/s: 1.6.1, 1.7.0

Type: Bug Priority: Major - P3
Reporter: Kyle Banker Assignee: Dwight Merriman
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

To reproduce:

1. Set up a replica set with three members. Initialize it.
2. Add some data so that initial sync happens.
3. Shut down each member and run a repair with mongod --repair
4. Restart the set as usual.

Each member then logs this:
Wed Aug 11 13:25:41 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Wed Aug 11 13:25:51 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Wed Aug 11 13:26:01 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)

In addition, if you look at the original master, collections appear to be missing from the local database:
> show collections
oplog.rs
slaves
system.indexes
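
The check above can also be scripted. This is only a sketch: the helper name hasReplSetConfig is made up for illustration, and it looks only for system.replset, the collection named in the EMPTYCONFIG log line.

```javascript
// Sketch only: hasReplSetConfig is a hypothetical helper, not part of the shell.
// It reports whether a list of collection names from the local database
// includes system.replset, the collection the EMPTYCONFIG message refers to.
function hasReplSetConfig(collectionNames) {
  return collectionNames.indexOf("system.replset") !== -1;
}

// Collection listing copied from the output above (after the repair).
var afterRepair = ["oplog.rs", "slaves", "system.indexes"];
var configPresent = hasReplSetConfig(afterRepair);

// In the mongo shell, the names could instead come from:
//   db.getSiblingDB("local").getCollectionNames()
```

With the listing above, configPresent is false, which matches the EMPTYCONFIG messages in the log.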

This case is also confirmed by the two reports mentioned in this forum post:
http://groups.google.com/group/mongodb-user/browse_thread/thread/56e6f187800a8aef/92a6502ea6be3455#92a6502ea6be3455



 Comments   
Comment by auto [ 16/Aug/10 ]

Author: dwight (dwight@10gen.com)

Message: SERVER-1614
http://github.com/mongodb/mongo/commit/415c2641235ea4f8770c5377687e650749d66560

Comment by auto [ 11/Aug/10 ]

Author: dwight (dwight@10gen.com)

Message: SERVER-1614
http://github.com/mongodb/mongo/commit/69e92f803d6132ef1f00c320880f743d27dc1f2d

Comment by Dwight Merriman [ 11/Aug/10 ]

WORKAROUND

Cause: local.system.replset is not recreated on a repair.

Workaround: manually repopulate it with the single config document.

I recommend you back up first.

Basically, what we want to do (from the shell) is:

> use local
> // verify it is empty. it SHOULD be if things are broken:
> db.system.replset.find()
> // then manually recreate it:
> db.system.replset.insert( YOUR_CONFIG_OBJECT );

However, the above won't work, since you can't normally insert directly into a system collection. There is a hacky workaround, though, using rename:

> use local
> // verify it is empty. it SHOULD be if things are broken:
> db.system.replset.find()
> // then manually recreate it:
> db.temp1.insert( YOUR_CONFIG_OBJECT );
> assert( db.temp1.count() == 1 );
> db.temp1.renameCollection('system.replset');
> db.system.replset.find(); // verify

I'm pretty sure it will work after that. You probably only have to do this on a single member, and the rest will then come back (I think). Let me know.
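
For reference, YOUR_CONFIG_OBJECT in the steps above is the replica set config document. Below is a minimal sketch of its general shape. The helper buildReplSetConfig, the set name "myset", and the host strings are all hypothetical. The real values, including the version and member _ids, should presumably match the config the set was originally initiated with.

```javascript
// Sketch only: buildReplSetConfig is a hypothetical helper, and the set name
// and hosts below are placeholders. Substitute the values from your own set.
function buildReplSetConfig(setName, hosts, version) {
  if (!setName || !Array.isArray(hosts) || hosts.length === 0) {
    throw new Error("need a set name and at least one host");
  }
  return {
    _id: setName,              // replica set name
    version: version || 1,     // config version, defaults to 1
    members: hosts.map(function (host, i) {
      return { _id: i, host: host };  // one entry per member
    })
  };
}

var YOUR_CONFIG_OBJECT = buildReplSetConfig(
  "myset",
  ["localhost:27017", "localhost:27018", "localhost:27019"],
  1
);

// In the mongo shell, this object is what gets passed to
// db.temp1.insert( YOUR_CONFIG_OBJECT ) in the steps above.
```

The function is plain JavaScript, so it can be pasted into the shell or run elsewhere to inspect the document before inserting it.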

Comment by Kyle Banker [ 11/Aug/10 ]

Here's some more incredibly unusual behavior.

After repairing, I connect to the original master node. Since there's no primary, I set slave ok:

> db.getMongo().setSlaveOk();
> use admin
> show collections

After running those commands, the db segfaults:

Wed Aug 11 13:48:57 [initandlisten] connection accepted from 127.0.0.1:53058 #3
Wed Aug 11 13:49:02 [conn3] assertion 10107 not master ns:admin.system.namespaces query:{}
Wed Aug 11 13:49:06 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Wed Aug 11 13:49:15 Got signal: 11 (Segmentation fault).

Wed Aug 11 13:49:15 Backtrace:
0x10039804c 0x7fff8145035a 0x100b04800 0x10025f360 0x100266145 0x10039b422 0x1003acf94 0x7fff81429456 0x7fff81429309
0 mongod 0x000000010039804c _ZN5mongo10abruptQuitEi + 332
1 libSystem.B.dylib 0x00007fff8145035a _sigtramp + 26
2 ??? 0x0000000100b04800 0x0 + 4306520064
3 mongod 0x000000010025f360 _ZN5mongo13receivedQueryERNS_6ClientERNS_10DbResponseERNS_7MessageE + 608
4 mongod 0x0000000100266145 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5029
5 mongod 0x000000010039b422 _ZN5mongo10connThreadEPNS_13MessagingPortE + 562
6 mongod 0x00000001003acf94 thread_proxy + 132
7 libSystem.B.dylib 0x00007fff81429456 _pthread_start + 331
8 libSystem.B.dylib 0x00007fff81429309 thread_start + 13

Wed Aug 11 13:49:15 dbexit:
