[SERVER-17487] cloner dropDups removes _id entries belonging to other records Created: 06/Mar/15  Updated: 19/Sep/15  Resolved: 09/Mar/15

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.0.0
Fix Version/s: 3.0.1, 3.1.0

Type: Bug Priority: Critical - P2
Reporter: Rui Zhang (Inactive) Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: ET
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File SERVER-17487.patch     File cloner_dups.js    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

-replica set
-insert 2M doc
-run mix traffic (insert/deletion, update) and re-sync one member

  • all wiredTiger as storageEngine
  • multiple member replica set
Participants:

 Description   

after re-sync is done, the _id index count, secondary index count and data table count could be different.

        "index" : 2000000,
        "secondary_index" : 2047102,
        "table" : 2047102

and log file show (with Dan's patch for more printout)

2015-03-06T01:05:28.851+0000 I REPL     [rsSync] ******
2015-03-06T01:05:28.851+0000 I REPL     [rsSync] creating replication oplog of size: 4094MB...
2015-03-06T01:05:28.853+0000 I STORAGE  [rsSync] Starting WiredTigerRecordStoreThread local.oplog.rs
2015-03-06T01:05:28.870+0000 I REPL     [rsSync] ******
2015-03-06T01:05:28.870+0000 I REPL     [rsSync] initial sync pending
2015-03-06T01:05:28.870+0000 I REPL     [ReplicationExecutor] syncing from: 10.1.1.107:27017
2015-03-06T01:05:28.874+0000 I REPL     [rsSync] initial sync drop all databases
2015-03-06T01:05:28.874+0000 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2015-03-06T01:05:28.874+0000 I REPL     [rsSync] initial sync clone all databases
2015-03-06T01:05:28.875+0000 I REPL     [rsSync] initial sync cloning db: sbtest
2015-03-06T01:05:42.605+0000 I INDEX    [rsSync] build index on: sbtest.sbtest1 properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "sbtest.sbtest1" }
2015-03-06T01:05:42.605+0000 I INDEX    [rsSync]         building index using bulk method
2015-03-06T01:05:47.969+0000 I INDEX    [rsSync] build index done.  scanned 2047128 total records. 5 secs
2015-03-06T01:05:47.969+0000 I STORAGE  [rsSync] Deleting: RecordId(1955618)
2015-03-06T01:05:47.969+0000 I STORAGE  [rsSync] Deleting: RecordId(1955661)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1955710)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1955778)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1955802)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1955996)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1956089)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1956102)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1956127)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1956187)
2015-03-06T01:05:47.970+0000 I STORAGE  [rsSync] Deleting: RecordId(1956305)
....
 
2015-03-06T01:05:49.275+0000 I STORAGE  [rsSync] Deleting: RecordId(2047128)
2015-03-06T01:05:49.275+0000 I STORAGE  [rsSync] index build dropped: 47128 dups
2015-03-06T01:05:49.449+0000 I REPL     [rsSync] initial sync data copy, starting syncup
2015-03-06T01:05:49.449+0000 I REPL     [rsSync] oplog sync 1 of 3

  • not sure why there is dup for _id
  • dropped dups are not removed from data table and secondary index tree
  • saw one case when mixed traffic stopped, re-sync still have issue with dups and out of sync, but cannot reproduce it now.


 Comments   
Comment by Githook User [ 10/Apr/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 Raise awaitReplication timeout in test

(cherry picked from commit 5f64a1c4c3eb6698dfa2e721972a158202229bb0)
Branch: v3.0
https://github.com/mongodb/mongo/commit/23d882d8f6423e27e4f3a01f7829861e77ba2e0e

Comment by Githook User [ 10/Apr/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 Throttle bg insert thread in test

(cherry picked from commit 06f4ea80a51ccea8da275e095cc5061707263957)
Branch: v3.0
https://github.com/mongodb/mongo/commit/03d7d9d46179425212ce1840c58c4325d450b950

Comment by Githook User [ 10/Apr/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 raise oplog size in test

(cherry picked from commit 4c7eb08931a607e9879aaf088998d3bff3720e92)
Branch: v3.0
https://github.com/mongodb/mongo/commit/d00d2bc873ebeca1536f578d8a5255e4f776aad3

Comment by Githook User [ 09/Apr/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 raise oplog size in test
Branch: master
https://github.com/mongodb/mongo/commit/4c7eb08931a607e9879aaf088998d3bff3720e92

Comment by Githook User [ 30/Mar/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 Throttle bg insert thread in test
Branch: master
https://github.com/mongodb/mongo/commit/06f4ea80a51ccea8da275e095cc5061707263957

Comment by Githook User [ 16/Mar/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 Raise other timeouts in initial_sync_cloner_dups.js
Branch: master
https://github.com/mongodb/mongo/commit/e89ad4970c81ce122867e6af70d09cd164052022

Comment by Githook User [ 13/Mar/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 Raise awaitReplication timeout in test
Branch: master
https://github.com/mongodb/mongo/commit/5f64a1c4c3eb6698dfa2e721972a158202229bb0

Comment by Githook User [ 10/Mar/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17487 warn in lieu of assert in initial_sync_cloner_dups.js in case of false positive

(cherry picked from commit a52d13a43e40f6ff7e486cee943fe7feda87d59e)
Branch: v3.0
https://github.com/mongodb/mongo/commit/a1da51e735e050a2a7cf4806d5e5e133a5d16d9d

Comment by Githook User [ 10/Mar/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17487 warn in lieu of assert in initial_sync_cloner_dups.js in case of false positive
Branch: master
https://github.com/mongodb/mongo/commit/a52d13a43e40f6ff7e486cee943fe7feda87d59e

Comment by Githook User [ 10/Mar/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17487 move jstest to noPassthrough

(cherry picked from commit aed0e430605ea07f39ed1648c056f31351fb09f2)
Branch: v3.0
https://github.com/mongodb/mongo/commit/37fd2853903bf26384a31a0d81e8d5a89546648e

Comment by Githook User [ 10/Mar/15 ]

Author:

{u'username': u'monkey101', u'name': u'Dan Pasette', u'email': u'dan@10gen.com'}

Message: SERVER-17487 move jstest to noPassthrough
Branch: master
https://github.com/mongodb/mongo/commit/aed0e430605ea07f39ed1648c056f31351fb09f2

Comment by Githook User [ 09/Mar/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 Use strict unindex operations for in-progress indexes

(cherry picked from commit 378f8dd985cb3c8d03b384d9671abff92693c7f5)
Branch: v3.0
https://github.com/mongodb/mongo/commit/a56f29e60d872bb6334a24dd732d500b3ae52b48

Comment by Githook User [ 09/Mar/15 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-17487 Use strict unindex operations for in-progress indexes
Branch: master
https://github.com/mongodb/mongo/commit/378f8dd985cb3c8d03b384d9671abff92693c7f5

Comment by Rui Zhang (Inactive) [ 06/Mar/15 ]

this is done with sysbench, after further investigation, I found that this is due to following sequence of operations. In order to properly run remove operation without creation holes in the collection _id space, it runs

// pseudo code
coll.remove({_id: SOME_INT_IN_RANGE});
coll.insert({_id: SOME_INT_IN_RANGE, ... other fields});

tested with mmapv1, we also show remove dups, but it makes sure index table and data table in sync, WT doesn't which is the issue here.

simple repro steps

  • setup replca with WT
  • run following script from primary

    var test = function () {
      var d = db.getSiblingDB("test");
      d.getCollection("foo").drop();
      d.getCollection("foo").createIndex({k: 1});
     
      // build up oplog
      var res = benchRun( {
        ops : [{
        ns : "test.foo",
        op : "insert" ,
        doc : { k : { "#RAND_INT" : [ 0 , 100 ] } } ,
        writeCmd : true }],
        seconds : 20,
        totals : true,
        writeCmd : true,
        parallel : 20
      });
      // insert some doc
      for ( i = 0; i < 5000; i++ ) {
        d.foo.insert({_id: i});
      }
      // not do remove and insert
      print("Start remove/insert, do replica re-sync now")
      for ( i = 0; i < 500000; i++ ) {
        d.foo.remove({_id: i % 5000})
        d.foo.insert({_id: i % 5000});
      }
      return res;
    }

  • start re-sync (start a mongod with empty db files) when the above script print "Start remove/insert, do replica re-sync now", and check log, watching for

    STORAGE  [rsSync] index build dropped: 30 dups

  • paste following script into secondary mongo shell to check whether index and table are in sync

        rs.slaveOk();
        var dbname = "test"
     
        print("DB: " + dbname);
        db.getSiblingDB(dbname).getCollectionNames().forEach(function (name) {
            if (name == "oplog.rs") return;
     
            print(dbname + "." + name);
            var c = db.getSiblingDB(dbname)[name];
            var index = c.find({},{_id:1}).hint({_id:1}).itcount();
            var secondary_index = c.find({},{_id:1}).hint({k:1}).itcount();
            var table = c.find({},{_id:1}).hint({$natural:1}).itcount();
            if (index != table || index != secondary_index) {
                printjson({name:name, index:index, secondary_index: secondary_index, table:table});
            } else {
                print("   index: " + index)
            }
        });

Comment by Daniel Pasette (Inactive) [ 06/Mar/15 ]

rui.zhang, what's the workload generator you're using and how are you invoking it?

Generated at Thu Feb 08 03:44:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.