[SERVER-28366] mongod --repair fails on a secondary when --replSet is passed. Created: 17/Mar/17  Updated: 31/May/17  Resolved: 23/Mar/17

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.2.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Filip Havlíček Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

GNU/Linux Debian Wheezy x86_64
Kernel 4.4.2
xfs file system


Attachments: logfile.txt, steps_to_reproduce.txt
Operating System: ALL
Steps To Reproduce:
  1. create replica set with auth enabled

    mkdir -p /data/repairtest/primary /data/repairtest/secondary_mmapv1 /data/repairtest/secondary_wiredTiger
    openssl rand -base64 741 >/data/repairtest/keyfile
    mongod --port 27066 --storageEngine mmapv1 --dbpath /data/repairtest/primary --replSet repairtest --smallfiles --noprealloc --oplogSize=100 --auth --keyFile /data/repairtest/keyfile
    mongod --port 27067 --storageEngine mmapv1 --dbpath /data/repairtest/secondary_mmapv1 --replSet repairtest --smallfiles --noprealloc --oplogSize=100 --auth --keyFile /data/repairtest/keyfile
    mongod --port 27068 --storageEngine wiredTiger --dbpath /data/repairtest/secondary_wiredTiger --replSet repairtest --oplogSize=100 --auth --keyFile /data/repairtest/keyfile
    echo 'rs.initiate({_id:"repairtest",version:1,members:[{ _id:0,host:"localhost:27066"}]})' | mongo --port=27066 --host localhost
    echo 'db.createUser( { user: "root", pwd: "genericpassword", roles: [ { role: "root", db: "admin" } ] } )' | mongo admin --port 27066 --host localhost
    echo 'rs.add("localhost:27067")' | mongo admin --port 27066 --host localhost -u root -p genericpassword
    echo 'rs.add("localhost:27068")' | mongo admin --port 27066 --host localhost -u root -p genericpassword
    

  2. shutdown mmapv1 secondary and restart it with --repair

    mongod --port 27067 --storageEngine mmapv1 --dbpath /data/repairtest/secondary_mmapv1 --replSet repairtest --smallfiles --noprealloc --oplogSize=100 --auth --keyFile /data/repairtest/keyfile --repair
    


Description

We use repairDatabase (https://docs.mongodb.com/v3.2/reference/command/repairDatabase/) to reclaim disk space. After upgrading to 3.2.12 (from 3.0.8) it no longer works with mmapv1: the repair process ends with a fatal assertion. With the wiredTiger storage engine, repair works.
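
For reference, the command form is invoked from the mongo shell as db.runCommand({ repairDatabase: 1 }). A minimal sketch in the style of the reproduction steps above (the database name "somedb" is a placeholder, and per the documentation the command must run against a master or primary):

    echo 'db.runCommand({ repairDatabase: 1 })' | mongo somedb --port 27066 --host localhost -u root -p genericpassword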

2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] MongoDB starting : pid=16501 port=27067 dbpath=/data/repairtest/secondary_mmapv1 64-bit host=somegenerichost
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] db version v3.2.12
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] git version: ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] modules: none
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] build environment:
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten]     distmod: debian71
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten]     distarch: x86_64
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten]     target_arch: x86_64
2017-03-17T12:36:57.822+0100 I CONTROL  [initandlisten] options: { net: { port: 27067 }, repair: true, replication: { oplogSizeMB: 100, replSet: "repairtest" }, security: { authorization: "enabled", keyFile: "/data/repairtest/keyfile" }, storage: { dbPath: "/data/repairtest/secondary_mmapv1", engine: "mmapv1", mmapv1: { preallocDataFiles: false, smallFiles: true } } }
2017-03-17T12:36:57.892+0100 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2017-03-17T12:36:57.892+0100 I CONTROL  [initandlisten]
2017-03-17T12:36:57.893+0100 I STORAGE  [initandlisten] repairDatabase local
2017-03-17T12:36:57.896+0100 I INDEX    [initandlisten] allocating new ns file /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/local.ns, filling with zeroes...
2017-03-17T12:36:57.926+0100 I STORAGE  [FileAllocator] allocating new datafile /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/local.0, filling with zeroes...
2017-03-17T12:36:57.926+0100 I STORAGE  [FileAllocator] creating directory /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/_tmp
2017-03-17T12:36:57.927+0100 I STORAGE  [FileAllocator] done allocating datafile /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/local.0, size: 16MB,  took 0 secs
2017-03-17T12:36:57.929+0100 I INDEX    [initandlisten] build index on: local.me properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.me" }
2017-03-17T12:36:57.929+0100 I INDEX    [initandlisten] 	 building index using bulk method
2017-03-17T12:36:57.930+0100 I STORAGE  [FileAllocator] allocating new datafile /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/local.1, filling with zeroes...
2017-03-17T12:36:57.930+0100 I STORAGE  [FileAllocator] done allocating datafile /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/local.1, size: 128MB,  took 0 secs
2017-03-17T12:36:57.931+0100 I INDEX    [initandlisten] build index on: local.replset.minvalid properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.replset.minvalid" }
2017-03-17T12:36:57.931+0100 I INDEX    [initandlisten] 	 building index using bulk method
2017-03-17T12:36:57.931+0100 I INDEX    [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }
2017-03-17T12:36:57.931+0100 I INDEX    [initandlisten] 	 building index using bulk method
2017-03-17T12:36:57.932+0100 I INDEX    [initandlisten] build index on: local.system.replset properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.system.replset" }
2017-03-17T12:36:57.932+0100 I INDEX    [initandlisten] 	 building index using bulk method
2017-03-17T12:36:57.943+0100 I STORAGE  [initandlisten] repairDatabase admin
2017-03-17T12:36:57.945+0100 I INDEX    [initandlisten] allocating new ns file /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/admin.ns, filling with zeroes...
2017-03-17T12:36:57.974+0100 I STORAGE  [FileAllocator] allocating new datafile /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/admin.0, filling with zeroes...
2017-03-17T12:36:57.974+0100 I STORAGE  [FileAllocator] creating directory /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/_tmp
2017-03-17T12:36:57.975+0100 I STORAGE  [FileAllocator] done allocating datafile /data/repairtest/secondary_mmapv1/_tmp_repairDatabase_0/admin.0, size: 16MB,  took 0 secs
2017-03-17T12:36:57.977+0100 F REPL     [initandlisten] logOp() but can't accept write to collection admin.$cmd
2017-03-17T12:36:57.977+0100 I -        [initandlisten] Fatal Assertion 17405
2017-03-17T12:36:57.977+0100 I -        [initandlisten]
 
***aborting after fassert() failure
 
 
2017-03-17T12:36:57.996+0100 F -        [initandlisten] Got signal: 6 (Aborted).
 
 0x1315b12 0x1314a39 0x1315242 0x7fb3cf3070a0 0x7fb3cef9d125 0x7fb3cefa03a0 0x12981d2 0xe81dcc 0xe837b3 0xe83972 0xcd341b 0xa9cfd7 0x10491ac 0xe23cd7 0x9462e3 0x9481b0 0x949a3d 0x7fb3cef89ead 0x942549
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F15B12","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F14A39"},{"b":"400000","o":"F15242"},{"b":"7FB3CF2F8000","o":"F0A0"},{"b":"7FB3CEF6B000","o":"32125","s":"gsignal"},{"b":"7FB3CEF6B000","o":"353A0","s":"abort"},{"b":"400000","o":"E981D2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"A81DCC"},{"b":"400000","o":"A837B3","s":"_ZN5mongo4repl6_logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_bRKSsNS0_22ReplicationCoordinator4ModeEb"},{"b":"400000","o":"A83972","s":"_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_b"},{"b":"400000","o":"8D341B","s":"_ZN5mongo10OpObserver18onCreateCollectionEPNS_16OperationContextERKNS_15NamespaceStringERKNS_17CollectionOptionsE"},{"b":"400000","o":"69CFD7","s":"_ZN5mongo8Database16createCollectionEPNS_16OperationContextENS_10StringDataERKNS_17CollectionOptionsEb"},{"b":"400000","o":"C491AC","s":"_ZN5mongo12MMAPV1Engine14repairDatabaseEPNS_16OperationContextERKSsbb"},{"b":"400000","o":"A23CD7","s":"_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKSsbb"},{"b":"400000","o":"5462E3"},{"b":"400000","o":"5481B0","s":"_ZN5mongo13initAndListenEi"},{"b":"400000","o":"549A3D","s":"main"},{"b":"7FB3CEF6B000","o":"1EEAD","s":"__libc_start_main"},{"b":"400000","o":"542549"}],"processInfo":{ "mongodbVersion" : "3.2.12", "gitVersion" : "ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.2-lscs-2", "version" : "#1 SMP Fri Feb 19 11:08:38 CET 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "A9B18E620F02487C1F2B5F355EB119AC75B94CD7" }, { "b" : "7FFF62A70000", "elfType" : 3, "buildId" : "08F635781C80161ACACC1542E77EF77D0A04D08D" }, { "b" : "7FB3D01B1000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "16DA62C831D8080E3BCB9A1E7738D399EA3A89C0" }, { "b" : "7FB3CFDB8000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "079D784F9F4A88B6D87FC3E6B0B07BFBE209540E" }, { "b" : "7FB3CFBB0000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "77CB0070BB75470C626B899800FFCF8C8184E32A" }, { "b" : "7FB3CF9AC000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "32CCD52B7228B03E969C21587F14C2322D4A9544" }, { "b" : "7FB3CF72A000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "98FAFAB6E776EA40FBFFFBCB22859A8D54028EC0" }, { "b" : "7FB3CF514000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7FB3CF2F8000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "F260CD217A3C0D89FD80BCC80D486C1C2AD32DCA" }, { "b" : "7FB3CEF6B000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "BD7458ADC60E906DBAABB72A7DA5015D5BC8269F" }, { "b" : "7FB3D0410000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "B66F918D37C1E4122FC91C74210F2112A6A5148E" }, { "b" : "7FB3CED54000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "1EFEB71FD4999C2307570D673A724EA4E1D85267" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1315b12]
 mongod(+0xF14A39) [0x1314a39]
 mongod(+0xF15242) [0x1315242]
 libpthread.so.0(+0xF0A0) [0x7fb3cf3070a0]
 libc.so.6(gsignal+0x35) [0x7fb3cef9d125]
 libc.so.6(abort+0x180) [0x7fb3cefa03a0]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12981d2]
 mongod(+0xA81DCC) [0xe81dcc]
 mongod(_ZN5mongo4repl6_logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_bRKSsNS0_22ReplicationCoordinator4ModeEb+0x213) [0xe837b3]
 mongod(_ZN5mongo4repl5logOpEPNS_16OperationContextEPKcS4_RKNS_7BSONObjEPS5_b+0x62) [0xe83972]
 mongod(_ZN5mongo10OpObserver18onCreateCollectionEPNS_16OperationContextERKNS_15NamespaceStringERKNS_17CollectionOptionsE+0x31B) [0xcd341b]
 mongod(_ZN5mongo8Database16createCollectionEPNS_16OperationContextENS_10StringDataERKNS_17CollectionOptionsEb+0x2F7) [0xa9cfd7]
 mongod(_ZN5mongo12MMAPV1Engine14repairDatabaseEPNS_16OperationContextERKSsbb+0xF6C) [0x10491ac]
 mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKSsbb+0xF77) [0xe23cd7]
 mongod(+0x5462E3) [0x9462e3]
 mongod(_ZN5mongo13initAndListenEi+0x10) [0x9481b0]
 mongod(main+0x15D) [0x949a3d]
 libc.so.6(__libc_start_main+0xFD) [0x7fb3cef89ead]
 mongod(+0x542549) [0x942549]
-----  END BACKTRACE  -----
Aborted



Comments
Comment by Filip Havlíček [ 23/Mar/17 ]

Hi,

Thank you, you're right; I missed this. Without --replSet it works.
The strange thing is that in 3.0.8 this worked with the --replSet option, and with wiredTiger it still works in 3.2.12, so the documentation does not quite match the actual behavior.

Thank you, you solved my problem. I will adjust our repair (disk space reclaim) process accordingly.
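
The adjusted process will look roughly like this (a sketch reusing the ports and paths from the reproduction steps above; repair runs in standalone mode and mongod exits when it finishes):

    # stop the secondary cleanly, then repair it in standalone mode (no --replSet)
    mongod --port 27067 --storageEngine mmapv1 --dbpath /data/repairtest/secondary_mmapv1 --smallfiles --noprealloc --repair
    # after the repair completes and mongod exits, restart it as a replica set member
    mongod --port 27067 --storageEngine mmapv1 --dbpath /data/repairtest/secondary_mmapv1 --replSet repairtest --smallfiles --noprealloc --oplogSize=100 --auth --keyFile /data/repairtest/keyfile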

Comment by Mark Agarunov [ 22/Mar/17 ]

Hello filip.havlicek@livesport.eu,

Thank you for the report. Looking over the output you've provided, I am able to reproduce the behavior you've described. However, this may be expected behavior. According to the documentation:

Reference > Database Commands > Administration Commands > repairDatabase

Behavior:
...
NOTE
mongod --repair will fail if your database is not a master or primary. In most cases, you should recover a corrupt secondary using the data from an existing intact node. To run repair on a secondary/slave restart the instance in standalone mode without the --replSet or --slave options.

Please try running this without the --replSet parameter and let us know if this works as expected.
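
For example, adapting the command from step 2 of your reproduction steps (a sketch with the replication-related options removed):

    mongod --port 27067 --storageEngine mmapv1 --dbpath /data/repairtest/secondary_mmapv1 --smallfiles --noprealloc --repair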

Thanks,
Mark

Comment by Filip Havlíček [ 17/Mar/17 ]

I'm sorry for the mistakes; the MongoDB version is 3.2.12. Since the formatting here is hard to read, I will attach separate files with the logs and the steps to reproduce.
