[SERVER-10127] authCommands.js failing on Linux RHEL 64-bit Created: 08/Jul/13  Updated: 10/Dec/14  Resolved: 13/Aug/13

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: 2.5.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Kangas Assignee: Andy Schwerin
Resolution: Cannot Reproduce Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

buildbot: Linux RHEL 64-bit


Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Linux RHEL 64-bit Build #1356 July 07 and #1357 July 08

http://buildbot.mongodb.org/builders/Linux%20RHEL%2064-bit/builds/1356/steps/test_11/logs/stdio
http://buildlogs.mongodb.org/Linux%20RHEL%2064-bit/builds/1356/test/sharding/authCommands.js

----
Checking commands with read-only auth credentials
----
 m30999| Sun Jul  7 16:26:38.732 [conn1]  authenticate db: test { authenticate: 1, nonce: "9e09b1f8d54a43eb", user: "roUser", key: "aa67879692cff0814d6f24063b4c40e3" }
 m31100| Sun Jul  7 16:26:38.732 [conn79] runQuery called test.system.users { user: "roUser", userSource: null }
 m31100| Sun Jul  7 16:26:38.733 [conn79] query test.system.users query: { user: "roUser", userSource: null } ntoreturn:1 ntoskip:0 nscanned:1 keyUpdates:0 locks(micros) r:240 nreturned:1 reslen:112 0ms
 m31100| Sun Jul  7 16:26:38.733 [conn79] runQuery called test.system.users { user: "roUser", userSource: null }
 m31100| Sun Jul  7 16:26:38.734 [conn79] query test.system.users query: { user: "roUser", userSource: null } ntoreturn:1 ntoskip:0 nscanned:1 keyUpdates:0 locks(micros) r:86 nreturned:1 reslen:112 0ms
Checking read operations, should work
 m31100| Sun Jul  7 16:26:38.734 [conn48] runQuery called admin.$cmd { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|1, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs0", shardHost: "test-rs0/localhost:31100,localhost:31101,localhost:31102" }
 m31100| Sun Jul  7 16:26:38.734 [conn48] run command admin.$cmd { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|1, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs0", shardHost: "test-rs0/localhost:31100,localhost:31101,localhost:31102" }
 m31100| Sun Jul  7 16:26:38.734 [conn48] command: { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|1, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs0", shardHost: "test-rs0/localhost:31100,localhost:31101,localhost:31102" }
 m31100| Sun Jul  7 16:26:38.734 [conn48] command admin.$cmd command: { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|1, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs0", shardHost: "test-rs0/localhost:31100,localhost:31101,localhost:31102" } ntoreturn:1 keyUpdates:0  reslen:86 0ms
 m31100| Sun Jul  7 16:26:38.734 [conn48] runQuery called test.foo {}
 m31100| Sun Jul  7 16:26:38.736 [conn48] query test.foo cursorid:371782904856783 ntoreturn:0 ntoskip:0 nscanned:102 keyUpdates:0 locks(micros) r:1641 nreturned:101 reslen:832866 1ms
 m31200| Sun Jul  7 16:26:38.736 [conn20] runQuery called admin.$cmd { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|0, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs1", shardHost: "test-rs1/localhost:31200,localhost:31201,localhost:31202" }
 m31200| Sun Jul  7 16:26:38.736 [conn20] run command admin.$cmd { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|0, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs1", shardHost: "test-rs1/localhost:31200,localhost:31201,localhost:31202" }
 m31200| Sun Jul  7 16:26:38.736 [conn20] command: { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|0, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs1", shardHost: "test-rs1/localhost:31200,localhost:31201,localhost:31202" }
 m31200| Sun Jul  7 16:26:38.737 [conn20] command admin.$cmd command: { setShardVersion: "test.foo", configdb: "localhost:29000,localhost:29001,localhost:29002", version: Timestamp 7000|0, versionEpoch: ObjectId('51d9ce3b617f2a4917421570'), serverID: ObjectId('51d9ce11617f2a4917421569'), shard: "test-rs1", shardHost: "test-rs1/localhost:31200,localhost:31201,localhost:31202" } ntoreturn:1 keyUpdates:0  reslen:86 0ms
 m31200| Sun Jul  7 16:26:38.737 [conn20] runQuery called test.foo {}
 m31200| Sun Jul  7 16:26:38.739 [conn20] query test.foo cursorid:460043853176823 ntoreturn:0 ntoskip:0 nscanned:102 keyUpdates:0 locks(micros) r:1662 nreturned:101 reslen:832866 1ms
 m31200| Sun Jul  7 16:26:38.762 [conn40] getmore test.foo cursorid:460043853176823 ntoreturn:0 keyUpdates:0 locks(micros) r:6947 nreturned:448 reslen:3694228 6ms
 m31100| Sun Jul  7 16:26:38.770 [conn78] getmore test.foo cursorid:371782904856783 ntoreturn:0 keyUpdates:0 locks(micros) r:4025 nreturned:339 reslen:2795414 4ms
assert: [1000] != [989] are not equal : undefined
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.eq (src/mongo/shell/assert.js:32:5)
    at checkReadOps (/data/buildslaves/Linux_RHEL_64bit/mongo/jstests/sharding/authCommands.js:107:16)
    at /data/buildslaves/Linux_RHEL_64bit/mongo/jstests/sharding/authCommands.js:281:1
Sun Jul  7 16:26:38.875 [1000] != [989] are not equal : undefined at src/mongo/shell/assert.js:7
failed to load: /data/buildslaves/Linux_RHEL_64bit/mongo/jstests/sharding/authCommands.js

This build was green as of July 06 (rev bdf9a87b13e3)



 Comments   
Comment by Matt Kangas [ 13/Aug/13 ]

This failure continues to occur on Linux RHEL 64-bit...

Linux RHEL 64-bit Build #1481 Fri Aug 9
http://buildlogs.mongodb.org/Linux%20RHEL%2064-bit/builds/1481/test/sharding/authCommands.js

... but so do a number of other failures. I think this box has problems. I'm going to resolve as "cannot reproduce" until we see this issue crop up elsewhere.

Comment by auto [ 12/Jul/13 ]

Author:

{u'username': u'stbrody', u'name': u'Spencer T Brody', u'email': u'spencer@10gen.com'}

Message: SERVER-10127 Add extra assert to catch incomplete initial data earlier in sharding/authCommands.js
Branch: master
https://github.com/mongodb/mongo/commit/458cfbf633d94357212d03587e6544d296df58d5

Comment by Andy Schwerin [ 12/Jul/13 ]

kangas, spencer and I are pretty sure this isn't auth related. Not really sure where to take it from here. I can try to dig into it Monday, if nobody better suited is available.

Comment by Andy Schwerin [ 12/Jul/13 ]

I think the SocketException may be a red herring. Successful runs are littered with them.

Comment by Spencer Brody (Inactive) [ 11/Jul/13 ]

There seems to have been a SocketException shortly after the test printed "Creating initial data":

m31202| Sun Jul  7 16:23:12.014 [conn6] SocketException: remote: 127.0.0.1:58976 error: 9001 socket exception [CLOSED] server [127.0.0.1:58976] 

perhaps it wasn't done inserting the initial data before that socket exception happened? Though I still have no idea what caused the socket exception in the first place...

Comment by Spencer Brody (Inactive) [ 11/Jul/13 ]

Looks like something may have gone wrong when creating the initial data at the beginning of the test? The test expects there to be 1000 documents but it's only finding 989...

Perhaps we should add an assert earlier on, right after setting up the initial data, to make sure the count is what we expect? This would at least let us catch the problem sooner, though I'm still not sure why 989 documents could get inserted properly but not the last 11...

Comment by Matt Kangas [ 11/Jul/13 ]

This has now morphed into a new error.

Linux RHEL 64-bit Build #1367 July 11

http://buildbot.mongodb.org/builders/Linux%20RHEL%2064-bit/builds/1367/steps/test_11/logs/stdio
http://buildlogs.mongodb.org/Linux%20RHEL%2064-bit/builds/1367/test/sharding/authCommands.js

Thu Jul 11 06:12:48.452 assert.soon failed: function (){
    print( "Waiting for migration cleanup to occur..." )
    return testDB.foo.find().itcount() == testDB.foo.count();
}, msg:undefined at src/mongo/shell/assert.js:7
failed to load: /data/buildslaves/Linux_RHEL_64bit/mongo/jstests/sharding/authCommands.js

Generated at Thu Feb 08 03:22:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.