  1. Core Server
  2. SERVER-10344

Race condition when starting up new master/slave cluster. Was: repl4.js failing on Linux 64-bit Weekly Slow Tests


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • None
    • Replication
    • ALL
    • Steps to reproduce:

      time scons --dd --sharedclient all
      python buildscripts/cleanbb.py
      ./buildscripts/smoke.py --mode=files --auth jstests/repl/repl4.js


    Description

      jstests/repl/repl4.js was failing when started under auth mode.

      I believe this is actually a timing bug in master/slave replication, not directly an auth issue. It seems to have something to do with setting up a slave that only syncs a single DB from the master.

      If you put a "sleep(5000)" after line 13 in repl4.js (the line that starts the primary), then the test passes. Also, if you switch the order of lines 19 and 20 (the lines that do writes into 2 dbs, one that's synced and one that isn't) then the test passes.

      When the test fails, this shows up in the logs of the slave:

      m31001| Fri Jul 26 15:30:01.972 [replslave] repl:   nextOpTime Jul 26 15:30:01 51f2ce39:1 > syncedTo Dec 31 19:00:00 0:0
      m31001| repl:   time diff: 1374867001sec
      m31001| repl:   tailing: 0
      m31001| repl:   data too stale, halting replication
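
A quick check of the numbers in that log, as a minimal sketch. It assumes the OpTime is printed as "<hex seconds since the epoch>:<increment>", which is inferred from the log line rather than from the server source; opTimeSeconds is a hypothetical helper, not part of the shell. Under that assumption the reported time diff is simply nextOpTime minus a syncedTo of zero, which suggests the slave had no recorded sync point when it made the staleness check. The report does not establish that this is the root cause.

```javascript
// Hypothetical helper: decode the seconds part of an OpTime printed as
// "<hex seconds>:<increment>" (format assumed from the log line above).
function opTimeSeconds(opTime) {
    const hexSeconds = opTime.split(":")[0];
    return parseInt(hexSeconds, 16);
}

const nextOpTime = opTimeSeconds("51f2ce39:1"); // from "nextOpTime ... 51f2ce39:1"
const syncedTo = opTimeSeconds("0:0");          // from "syncedTo ... 0:0"

// Matches the "time diff: 1374867001sec" line in the log.
console.log(nextOpTime - syncedTo); // 1374867001

// The same instant in UTC; the log prints it in the build host's local time.
console.log(new Date(nextOpTime * 1000).toISOString()); // 2013-07-26T19:30:01.000Z
```

The syncedTo value of "Dec 31 19:00:00" is consistent with this reading: it is the Unix epoch (second 0) rendered in a US Eastern local time zone.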



      ORIGINAL DESCRIPTION:

      Linux 64-bit Weekly Slow Tests Build #256 July 14 rev f204f7f

      Linux 64-bit Weekly Slow Tests Build #261 July 21 rev 9bf7075

      Linux 64-bit Weekly Slow Tests Build #262 July 23 rev 37f7f30

      (#263 was interrupted)

      Linux 64-bit Weekly Slow Tests Build #264 July 25 rev 25395ab

      All of these failed with a final error similar to the one in SERVER-10090 (which was seen only on 32-bit boxes):

      Thu Jul 25 14:21:13.822 assert.soon failed: function () { 
                      return s.getDB( db )[ coll ].find().count() == count; 
                      }, msg:undefined at src/mongo/shell/assert.js:7

      Before this final failure, however, several of the following errors appear:

      assert.soon failed: function () {
                  // Set authenticated to stop an infinite recursion from getDB calling
                  // back into authenticate.
                  conn.authenticated = true;
                  print ("Authenticating to admin database as " +
                         jsTestOptions().adminUser + " with mechanism " +
                         DB.prototype._defaultAuthenticationMechanism +
                         " on connection: " + conn);
                  conn.authenticated = conn.getDB('admin').auth({
                      user: jsTestOptions().adminUser,
                      pwd: jsTestOptions().adminPassword
                  });
                  return conn.authenticated;
              }, msg:Authenticating connection: connection to 127.0.0.1:31001
      Error: Printing Stack Trace
          at printStackTrace (src/mongo/shell/utils.js:37:15)
          at doassert (src/mongo/shell/assert.js:6:5)
          at Function.assert.soon (src/mongo/shell/assert.js:174:60)
          at Object.jsTest.authenticate (src/mongo/shell/utils.js:437:16)
          at Mongo.getDB (src/mongo/shell/mongo.js:38:16)
          at /data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:5:26
          at Function.assert.soon (src/mongo/shell/assert.js:168:17)
          at soonCount (/data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:4:12)
          at doTest (/data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:22:5)
          at /data/buildslaves/Linux_64bit_Weekly_Slow_Tests/mongo/jstests/repl/repl4.js:39:5

      As far as I can tell, all that's happening and failing here is authentication attempts.

          People

            Assignee: Unassigned
            Reporter: Matt Kangas (matt.kangas)
            Votes: 0
            Watchers: 6
