[SERVER-2276]  Lost Data while Master in Replica set went down Created: 22/Dec/10  Updated: 29/May/12  Resolved: 02/Sep/11

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 1.6.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Krishna Maddireddy Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 5


Operating System: Linux
Participants:

 Description   

Lost Data while Master in Replica set went down

Data was lost when I killed the master while inserting data through the mongos shell.

I don't see any rollback files in any of the data directories.

Configuration:

Replica Set dbs

mongod --fork --rest --port 30001 --replSet set1/localhost:30001,localhost:30002,localhost:30003 --shardsvr --oplogSize 500 --dbpath /home/mongo/data/db/s1r1 --logpath /home/mongo/logs/s1r1.log
mongod --fork --rest --port 30002 --replSet set1/localhost:30001,localhost:30002,localhost:30003 --shardsvr --oplogSize 500 --dbpath /home/mongo/data/db/s1r2 --logpath /home/mongo/logs/s1r2.log
mongod --fork --rest --port 30003 --replSet set1/localhost:30001,localhost:30002,localhost:30003 --shardsvr --oplogSize 500 --dbpath /home/mongo/data/db/s1r3 --logpath /home/mongo/logs/s1r3.log
mongod --fork --rest --port 30004 --replSet set1 --logpath /home/mongo/logs/arbiter1_set1.log --dbpath /home/mongo/data/db/s1/arbiter1 --oplogSize 500
mongod --fork --rest --port 30005 --replSet set1 --logpath /home/mongo/logs/arbiter2_set1.log --dbpath /home/mongo/data/db/s1/arbiter2 --oplogSize 500

Replica configuration:

new_config = {
    _id: 'set1',
    members: [
        {_id: 0, host: 'localhost:30001'},
        {_id: 1, host: 'localhost:30002'},
        {_id: 2, host: 'localhost:30003'},
        {_id: 3, host: 'localhost:30004', arbiterOnly: true},
        {_id: 4, host: 'localhost:30005', arbiterOnly: true}
    ]
}
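
The report does not show how this configuration was applied; presumably it was passed to rs.initiate() from a shell connected to one of the members, roughly as follows (the choice of port 30001 is an assumption):

mongo --port 30001
> rs.initiate(new_config)    // apply the configuration above; can only be run once per set
> rs.status()                // wait until one member reports state 1 (PRIMARY) and the others 2 (SECONDARY) or 7 (ARBITER)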

Configuration Server

mongod --fork --configsvr --port 20001 --dbpath /home/mongo/data/db/config1 --logpath /home/mongo/logs/config1.log

Mongos

mongos --fork --configdb localhost:20001 --chunkSize 1 --logpath /home/mongo/logs/mongos.log
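
The report does not show how the shells were attached; since mongos was started without --port it listens on the default 27017, so connecting to the router and checking it would presumably be just:

mongo                            # connects to localhost:27017, i.e. the mongos router
> db.printShardingStatus()       // the same helper used later in this report; lists config servers, shards, and chunks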

> rs.status()
{
    "set" : "set1",
    "date" : "Wed Dec 22 2010 14:47:07 GMT-0500 (EST)",
    "myState" : 1,
    "members" : [
        { "_id" : 0, "name" : "db6.moon-ray.com:30001", "health" : 1, "state" : 1, "self" : true },
        { "_id" : 1, "name" : "localhost:30002", "health" : 1, "state" : 2, "uptime" : 30, "lastHeartbeat" : "Wed Dec 22 2010 14:47:07 GMT-0500 (EST)" },
        { "_id" : 2, "name" : "localhost:30003", "health" : 1, "state" : 2, "uptime" : 24, "lastHeartbeat" : "Wed Dec 22 2010 14:47:07 GMT-0500 (EST)" },
        { "_id" : 3, "name" : "localhost:30004", "health" : 1, "state" : 7, "uptime" : 26, "lastHeartbeat" : "Wed Dec 22 2010 14:47:07 GMT-0500 (EST)" },
        { "_id" : 4, "name" : "localhost:30005", "health" : 1, "state" : 7, "uptime" : 26, "lastHeartbeat" : "Wed Dec 22 2010 14:47:07 GMT-0500 (EST)" }
    ],
    "ok" : 1
}

Added the replica set as a shard through mongos:

> db.runCommand({ addshard : "set1/localhost:30001,localhost:30002,localhost:30003", name : "shard1" });
> db.runCommand({ listshards : 1 })
{
    "shards" : [
        { "_id" : "shard1", "host" : "set1/localhost:30001,localhost:30002,localhost:30003" }
    ],
    "ok" : 1
}
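
The report shows addshard but no enablesharding / shardcollection commands. With a single shard all inserts through mongos land on shard1 either way, but sharding the collection on this version would normally have looked roughly like the following (the shard key is an assumption, picked from the fields used in the insert loop):

> use admin
> db.runCommand({ enablesharding : "demo_contacts" });
> db.runCommand({ shardcollection : "demo_contacts.contacts", key : { contact_id : 1 } });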

Inserted some data:

> use demo_contacts
> db.contacts.count()
300000

Killed the primary while inserting data:
> db.contacts.count()
300000
> for (var i = 1; i <= 100000; i++) db.contacts.save({aid:4, contact_id:i,test_string: "This is test string to test mongodb s>
> db.contacts.count()
399981
>
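
The arithmetic above implies 19 lost inserts: 300,000 existing documents plus 100,000 attempted gives 400,000, but only 399,981 are counted. A hypothetical way to list exactly which ones vanished, reusing the aid and contact_id fields from the insert loop (and assuming the earlier 300,000 documents used a different aid):

> var missing = [];
> for (var i = 1; i <= 100000; i++) if (db.contacts.count({ aid : 4, contact_id : i }) == 0) missing.push(i);
> missing.length    // expected to be 19, i.e. 400000 - 399981
> missing           // the contact_ids that never reached the server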




 Comments   
Comment by Eliot Horowitz (Inactive) [ 02/Sep/11 ]

Sorry for the delay.
If you're not using getLastError, then there is no way to tell which data got to the server.
For important writes, you should call getLastError with w=2 so you know the write is on 2 servers.
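
For illustration, from the mongo shell of this era that would look roughly like the following (the example document and the wtimeout value are assumptions):

> db.contacts.save({ aid : 4, contact_id : 400001, test_string : "example safe write" })
> db.runCommand({ getlasterror : 1, w : 2, wtimeout : 5000 })   // blocks until 2 members have the write, or times out after 5s
// if "err" is null and "ok" is 1, the write is on at least 2 servers; otherwise treat the insert as unconfirmed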

Comment by Krishna Maddireddy [ 22/Dec/10 ]

Is the lost data stored somewhere when writing in unsafe mode?

Comment by Krishna Maddireddy [ 22/Dec/10 ]

With 1.7.5, when the existing master goes down, mongos cannot access the cluster even though there is an active primary.

> db.printShardingStatus()
Wed Dec 22 15:53:48 uncaught exception: error

{ "$err" : "mongos connectionpool: connect failed set1/localhost:30001,localhost:30002,localhost:30003 : connect failed to set set1/localhost:30001,localhost:30002,localhost:30003", "code" : 11002 } > mongo@db6 [~]# mongo --port 30002 MongoDB shell version: 1.7.4 connecting to: 127.0.0.1:30002/test set1:PRIMARY> db.isMaster() { "setName" : "set1", "ismaster" : true, "secondary" : false, "hosts" : [ "localhost:30002", "localhost:30003", "localhost:30001" ], "arbiters" : [ "localhost:30005", "localhost:30004" ], "maxBsonObjectSize" : 16777216, "ok" : 1 } set1:PRIMARY> > use demo_contacts switched to db demo_contacts <ng: "This is test string to test mongodb sharding "+i,email:"test@yahoo.com",testcase:"Test Case1"}

);
>
<ng: "This is test string to test mongodb sharding "+i,email:"test@yahoo.com",testcase:"Test Case1"});
>
<ng: "This is test string to test mongodb sharding "+i,email:"test@yahoo.com",testcase:"Test Case1"});
>
<tacts.save(

{aid:4, contact_id:i,test_string: "This is test string to test mongodb sharding "+i,email:"test@yahoo.com"}

);
mongos connectionpool: connect failed set1/localhost:30001,localhost:30002,localhost:30003 : connect failed to set set1/localhost:30001,localhost:30002,localhost:30003
> db.contacts.count()
Wed Dec 22 15:51:40 uncaught exception: count failed: {
"assertion" : "mongos connectionpool: connect failed set1/localhost:30001,localhost:30002,localhost:30003 : connect failed to set set1/localhost:30001,localhost:30002,localhost:30003",
"assertionCode" : 11002,
"errmsg" : "db assertion failure",
"ok" : 0
}
>

Comment by Eliot Horowitz (Inactive) [ 22/Dec/10 ]

Can you try with 1.7.5?

Note that unless you call getLastError after every insert (or use safe mode in the drivers), on hardware failure a write may get lost.
