Core Server: SERVER-10344

Race condition when starting up new master/slave cluster. Was: repl4.js failing on Linux 64-bit Weekly Slow Tests

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels:
    • ALL
    • Steps to reproduce:

      time scons --dd --sharedclient all
      python buildscripts/cleanbb.py
      ./buildscripts/smoke.py --mode=files --auth jstests/repl/repl4.js


      jstests/repl/repl4.js was failing when run in auth mode (smoke.py --auth).

      I believe this is actually a timing bug in master/slave replication, not directly an auth issue. It appears to be related to setting up a slave that syncs only a single DB from the master.

      Either of two changes makes the test pass: adding a "sleep(5000)" after line 13 of repl4.js (the line that starts the master), or swapping lines 19 and 20 (the two writes, one into the DB that the slave syncs and one into a DB it does not sync).

      When the test fails, the following appears in the slave's log:

      m31001| Fri Jul 26 15:30:01.972 [replslave] repl:   nextOpTime Jul 26 15:30:01 51f2ce39:1 > syncedTo Dec 31 19:00:00 0:0
      m31001| repl:   time diff: 1374867001sec
      m31001| repl:   tailing: 0
      m31001| repl:   data too stale, halting replication
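      The huge time diff is consistent with the slave's syncedTo being the zero OpTime. The log prints an OpTime as its seconds since the Unix epoch in hex, followed by an increment, so 0:0 is the epoch itself (shown as Dec 31 19:00:00 in the log's local time zone), and the diff equals the full timestamp of the master's next op. The minimal JavaScript sketch below only reproduces that arithmetic from the logged values. parseOpTime is a hypothetical helper, not a shell or server function, and the sketch does not model the server's actual staleness check.

```javascript
// Hypothetical helper (not part of the mongo shell or server): parse the
// "<hex seconds>:<increment>" form in which the slave log prints an OpTime.
function parseOpTime(text) {
  const [secsHex, inc] = text.split(":");
  return { secs: parseInt(secsHex, 16), inc: Number(inc) };
}

// Values copied from the slave log above.
const nextOpTime = parseOpTime("51f2ce39:1");
const syncedTo = parseOpTime("0:0"); // zero OpTime, i.e. the Unix epoch

// Seconds between the master's next op and the slave's recorded sync point.
const timeDiffSecs = nextOpTime.secs - syncedTo.secs;

console.log(`nextOpTime ${new Date(nextOpTime.secs * 1000).toISOString()}`);
console.log(`syncedTo   ${new Date(syncedTo.secs * 1000).toISOString()}`);
console.log(`time diff: ${timeDiffSecs}sec`);
```

      Running it with Node.js prints a time diff of 1374867001sec, the same value as in the log line above.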
      



      ORIGINAL DESCRIPTION:

      Linux 64-bit Weekly Slow Tests Build #256 July 14 rev f204f7f

      Linux 64-bit Weekly Slow Tests Build #261 July 21 rev 9bf7075

      Linux 64-bit Weekly Slow Tests Build #262 July 23 rev 37f7f30

      (#263 was interrupted)

      Linux 64-bit Weekly Slow Tests Build #264 July 25 rev 25395ab

      All of these failed with a final error similar to SERVER-10090 (which was seen only on 32-bit boxes):

      Thu Jul 25 14:21:13.822 assert.soon failed: function () { 
                      return s.getDB( db )[ coll ].find().count() == count; 
                      }, msg:undefined at src/mongo/shell/assert.js:7
      

      Before that final failure, however, many failures like the following occur:

      assert.soon failed: function () {
                  // Set authenticated to stop an infinite recursion from getDB calling
                  // back into authenticate.
                  conn.authenticated = true;
                  print ("Authenticating to admin database as " +
                         jsTestOptions().adminUser + " with mechanism " +
                         DB.prototype._defaultAuthenticationMechanism +
                         " on connection: " + conn);
                  conn.authenticated = conn.getDB('admin').auth({
                      user: jsTestOptions().adminUser,
                      pwd: jsTestOptions().adminPassword
                  });
                  return conn.authenticated;
              }, msg:Authenticating connection: connection to 127.0.0.1:31001
      Error: Printing Stack Trace
          at printStackTrace (src/mongo/shell/utils.js:37:15)
          at doassert (src/mongo/shell/assert.js:6:5)
          at Function.assert.soon (src/mongo/shell/assert.js:174:60)
          at Object.jsTest.authenticate (src/mongo/shell/utils.js:437:16)
          at Mongo.getDB (src/mongo/shell/mongo.js:38:16)
          at /data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:5:26
          at Function.assert.soon (src/mongo/shell/assert.js:168:17)
          at soonCount (/data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:4:12)
          at doTest (/data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:22:5)
          at /data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:39:5
      

      As far as I can tell, the only thing happening (and failing) here is repeated authentication attempts.
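      Both failures above come from assert.soon, which repeatedly evaluates a predicate until it returns true or a timeout expires. The stack trace also shows the two waits nested: the predicate that soonCount (repl4.js:4-5) passes to assert.soon calls Mongo.getDB, which calls jsTest.authenticate, which runs its own assert.soon. The following is a minimal Node.js sketch of that polling pattern, not the shell's implementation; assertSoon, sleepMs, and the default timeout and interval values are assumptions made for illustration.

```javascript
// Synchronous sleep for Node.js: blocks the current thread for `ms` milliseconds.
function sleepMs(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical stand-in for the shell's assert.soon: re-evaluate `predicate`
// until it returns a truthy value, or throw once `timeoutMs` has elapsed.
// Default timeout and interval are assumptions, not the shell's values.
function assertSoon(predicate, msg, timeoutMs = 30000, intervalMs = 200) {
  const start = Date.now();
  while (!predicate()) {
    if (Date.now() - start >= timeoutMs) {
      throw new Error("assert.soon failed: " + predicate + ", msg:" + msg);
    }
    sleepMs(intervalMs);
  }
}

// Usage: a predicate that becomes true on its third evaluation.
let calls = 0;
assertSoon(() => ++calls >= 3, "counter never reached 3", 5000, 10);
console.log(`predicate satisfied after ${calls} calls`);
```

      Because the wait is synchronous, the sketch blocks the thread between attempts, so the caller does not continue until the predicate succeeds or the timeout fires.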

            Assignee: Unassigned
            Reporter: Matt Kangas (matt.kangas)
            Votes: 0
            Watchers: 6
