Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.0.1
Component/s: Sharding
Labels:
None
Environment:
Windows Server 2008 64-bit; sharded mongod (3 shards, each a replica set with 2 servers); 1 mongos on a separate server; C# driver

Operating System:
Windows
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When the primary of one shard in a sharded collection goes down, we lose some documents in safe mode (even with fsync=true) while inserting record by record (no batches).

We have built in some failover code that keeps retrying the insert until safe mode no longer throws an exception. Even with this setup, we still see some document loss.

These losses occur at two moments (we ran some tests to try to determine the cause):
1) the moment the primary goes down and a secondary needs to take over
2) the moment the original primary comes back online and is elected primary again in its replica set (in the replica set status there is a moment when both servers are marked as secondary)

On a set of 50,000 records, we lose somewhere between 5 and 10 documents.
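A loss count like this can be turned into a list of the exact missing records, which helps correlate each loss with one of the two failover moments. The following is only a sketch: `LossCheck` and `FindMissing` are hypothetical names, and `stored` stands in for the `Number` values read back from the collection after the test run (the read itself is not shown).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LossCheck
{
    // Returns the values in [0, expectedCount) that are absent from 'stored'.
    public static List<int> FindMissing(IEnumerable<int> stored, int expectedCount)
    {
        return Enumerable.Range(0, expectedCount).Except(stored).ToList();
    }

    public static void Main()
    {
        // Stand-in data: pretend records 3 and 7 were lost out of 10.
        var stored = new[] { 0, 1, 2, 4, 5, 6, 8, 9 };
        var missing = FindMissing(stored, 10);
        Console.WriteLine(string.Join(", ", missing)); // prints "3, 7"
    }
}
```

Because the test inserts `Number` values in increasing order, the missing values can be matched against the console output of the insert loop to see which retries were in flight when each document was lost.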

Enabling the safe mode option that waits for a replicated write is hard to use in our case: the retry loop would then never terminate while the replica set cannot replicate, unless we extend our failover code to handle that case as well. We think this should be handled by the database itself rather than in application code.
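One way to keep a retry loop from spinning forever is to cap the number of attempts and pause between them. This is a minimal, driver-independent sketch: `BoundedRetry` and `Run` are hypothetical names, and the `attempt` delegate is assumed to wrap a single insert call and return whether it was acknowledged.

```csharp
using System;
using System.Threading;

public static class BoundedRetry
{
    // Calls 'attempt' until it returns true or maxAttempts is reached.
    // Exceptions count as failed attempts; 'delay' is the pause between tries.
    public static bool Run(Func<bool> attempt, int maxAttempts, TimeSpan delay)
    {
        for (int n = 1; n <= maxAttempts; n++)
        {
            try
            {
                if (attempt()) return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine("attempt {0} failed: {1}", n, ex.Message);
            }
            if (n < maxAttempts) Thread.Sleep(delay);
        }
        return false; // caller decides what to do with a document that never went through
    }

    public static void Main()
    {
        // Simulated insert: fails twice (as if no primary were available), then succeeds.
        int calls = 0;
        bool ok = Run(() =>
        {
            calls++;
            if (calls < 3) throw new Exception("no primary");
            return true;
        }, 5, TimeSpan.FromMilliseconds(10));
        Console.WriteLine("{0} after {1} calls", ok, calls); // prints "True after 3 calls"
    }
}
```

When `Run` returns false, the document can be logged or queued for later instead of blocking the rest of the batch.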

Here's how we're inserting right now (code without fsync option):

// Safe mode: wait for the server to acknowledge each insert.
var safe = new SafeMode(true);
var opts = new MongoInsertOptions(tdCollection);
opts.SafeMode = safe;

for (int i = 0; i < 50000; i++)
{
    try
    {
        var td = new TestClass();
        td.Number = i;
        td.NumberAsString = i.ToString();
        td.Number2 = i * 2;

        // Failover: retry the same document until the insert is acknowledged.
        bool ok = false;
        while (!ok)
        {
            try
            {
                var result = tdCollection.Insert(td, opts);
                ok = result.Ok;
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
                ok = false;
            }
        }
        Console.WriteLine(i);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
Console.WriteLine("Done writing 50000 records");

Is there something we overlooked? Or is this a bug?

Thanks in advance...

Assignee: Spencer Brody (Inactive)
Reporter: wouter alleweireldt
Participants: Eliot Horowitz, Spencer Brody, wouter alleweireldt
Votes: 0
Watchers: 4

Created: Oct 27 2011 08:48:35 AM UTC
Updated: Jul 11 2016 06:35:08 PM UTC
Resolved: Dec 03 2011 12:33:48 AM UTC