Core Server / SERVER-4159

Data loss in a sharded environment when one server in a replica set goes down (ungraceful shutdown)

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 2.0.1
    • Component/s: Sharding
    • Labels: None
    • Environment: Windows Server 2008 64-bit; sharded mongod (3 shards, each a replica set with 2 servers); 1 mongos on a separate server; C# driver
    • Operating System: Windows

    Description

      When the primary of one shard in a sharded collection goes down, we lose some documents even in safe mode (even with fsync=true) while inserting record by record (no batches).

      We have built in some failover code that keeps retrying the insert until safe mode no longer throws an exception. However, even with this setup, we still see some document loss.

      These losses occur at 2 moments (we ran some tests to try to determine the cause):
      1) the moment the primary goes down and a secondary needs to take over
      2) the moment the old primary comes back online and is elected primary again in its replica set (the replica set status shows a moment when both servers are marked as secondary)

      On a set of 50,000 records, we lose somewhere between 5 and 10 documents.
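To pin down which documents are lost (and so which of the two moments they fall in), the inserted `Number` values can be compared against the expected range. The following is only a sketch: `LossCheck` and `FindMissing` are hypothetical names, and it assumes the `Number` values have already been read back from the collection by some means not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LossCheck
{
    // Returns every value in [0, expectedCount) that is absent from 'found',
    // in ascending order. 'found' is assumed to hold the Number values read
    // back from the collection after the test run.
    public static List<int> FindMissing(IEnumerable<int> found, int expectedCount)
    {
        var seen = new HashSet<int>(found);
        return Enumerable.Range(0, expectedCount)
                         .Where(n => !seen.Contains(n))
                         .ToList();
    }
}
```

For the test described here, `expectedCount` would be 50000; comparing the missing numbers with the console output around each failover should show whether they cluster at the step-down or at the re-election.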

      Enabling the safe mode option that waits for the write to replicate is hard to use in our case: our retry loop would keep retrying the insert endlessly unless we expand our failover code to handle this case as well. However, we think this should be handled by the database itself rather than in application code...

      Here's how we're inserting right now (code without fsync option):

      // tdCollection is the sharded collection, obtained elsewhere (not shown).
      var safe = new SafeMode(true);
      var opts = new MongoInsertOptions(tdCollection);
      opts.SafeMode = safe;

      for (int i = 0; i < 50000; i++)
      {
          try
          {
              var td = new TestClass();
              td.Number = i;
              td.NumberAsString = i.ToString();
              td.Number2 = i * 2;

              // Failover: retry until the safe-mode insert reports success.
              bool ok = false;
              while (!ok)
              {
                  try
                  {
                      var result = tdCollection.Insert(td, opts);
                      ok = result.Ok;
                  }
                  catch (Exception ex)
                  {
                      Console.WriteLine(ex);
                      ok = false;
                  }
              }
              Console.WriteLine(i); // progress output
          }
          catch (Exception ex)
          {
              Console.WriteLine(ex.Message);
          }
      }
      Console.WriteLine("Done writing 50000 records");
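The retry loop above has no upper bound and no pause between attempts, which is why waiting for a replicated write could make it spin forever while a replica set member is down. A bounded variant might look like the sketch below. `BoundedRetry` and `Run` are hypothetical names, not part of the C# driver, and the doubling delay is just one possible backoff policy.

```csharp
using System;
using System.Threading;

public static class BoundedRetry
{
    // Calls 'attempt' until it returns true or maxAttempts is reached.
    // An exception counts as a failed attempt. Returns the number of tries
    // used; throws InvalidOperationException once the attempts run out.
    public static int Run(Func<bool> attempt, int maxAttempts, TimeSpan initialDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        TimeSpan delay = initialDelay;
        for (int tries = 1; ; tries++)
        {
            Exception last = null;
            try
            {
                if (attempt()) return tries;
            }
            catch (Exception ex)
            {
                last = ex; // remember the failure for the final error
            }

            if (tries == maxAttempts)
            {
                throw new InvalidOperationException(
                    "Gave up after " + tries + " attempts", last);
            }

            Thread.Sleep(delay);
            delay = TimeSpan.FromTicks(delay.Ticks * 2); // double the wait each time
        }
    }
}
```

With the names from the snippet above, the inner `while` loop could then become `BoundedRetry.Run(() => tdCollection.Insert(td, opts).Ok, 10, TimeSpan.FromMilliseconds(100));`, where the limit of 10 attempts and the 100 ms starting delay are arbitrary example values.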

      Is there something we overlooked? Or is this a bug?

      Thanks in advance...

          People

            Assignee: spencer@mongodb.com Spencer Brody (Inactive)
            Reporter: wouteralleweireldt wouter alleweireldt
            Votes: 0
            Watchers: 4
