[SERVER-16015] benchRun no longer working after switching PRIMARY replicaset member to a new node Created: 07/Nov/14  Updated: 15/Nov/21  Resolved: 10/Apr/15

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 2.6.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nikolaos Vyzas Assignee: Ramon Fernandez Marina
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Tested
Operating System: ALL
Participants:

 Description   

Running the benchRun command below worked both via mongos and directly on the replica set, until the primary node was replaced with a newly built node.

The new node has a higher spec than the one it replaced.

Initially we had:
Replicaset-node1 [PRIMARY]
Replicaset-node2
Replicaset-node3 [ARBITER]

Then added Replicaset-node4 and removed Replicaset-node1 (a shell sketch of these steps follows the final config below).

Final config:
Replicaset-node2
Replicaset-node3 [ARBITER]
Replicaset-node4 [PRIMARY]
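
For reference, a minimal sketch of this membership change from the mongo shell, assuming default ports; the host names here are placeholders for the actual cluster addresses:

    // On the original primary (node1): add the newly built node
    rs.add("Replicaset-node4:27017")
    // After node4 reaches SECONDARY state, step node1 down
    rs.stepDown()
    // Reconnect to the new primary and remove the old node
    rs.remove("Replicaset-node1:27017")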

After that, benchRun started reporting: Error: invalid parameter: expected an object ()

replica1:PRIMARY> 
replica1:PRIMARY> use test;
switched to db test
replica1:PRIMARY> db.foo.drop()
true
replica1:PRIMARY> db.foo.insert({_id:1})
WriteResult({ "nInserted" : 1 })
replica1:PRIMARY> ops = [{op: "findOne", ns: "test.foo", query: {_id:1}},
...        {op: "update",ns: "test.foo", query: {_id:1}, update: {$inc: {x:1}}}]
[
	{
		"op" : "findOne",
		"ns" : "test.foo",
		"query" : {
			"_id" : 1
		}
	},
	{
		"op" : "update",
		"ns" : "test.foo",
		"query" : {
			"_id" : 1
		},
		"update" : {
			"$inc" : {
				"x" : 1
			}
		}
	}
]
replica1:PRIMARY> for (var x = 1; x<=256; x*=2) {
...     res = benchRun({
...         parallel : x, 
... seconds: 10, 
... ops:ops
...     }); 
...     print ("threads: " + x + "\t queries/sec: " + res.query);
... }
2014-11-07T15:37:43.481+0000 Error: invalid parameter: expected an object ()



 Comments   
Comment by Ramon Fernandez Marina [ 20/Mar/15 ]

Hi vyzas, apologies for the long delay. I finally got around to testing this scenario using sharding and, to the best of my understanding, with a configuration analogous to yours. This is what I did:

  • Sharded cluster with one mongos, 3 config servers, and two shards. Each shard has 2 data-bearing nodes and 1 arbiter

    PROCESS          PORT     STATUS
     
    mongos           27017    running
     
    config server    27024    running
    config server    27025    running
    config server    27026    running
     
    shard01
        primary      27018    running
        secondary    27019    running
        arbiter      27020    running
     
    shard02
        primary      27021    running
        secondary    27022    running
        arbiter      27023    running
    

  • Connected to mongos and ran

    sh.enableSharding("test")
    sh.shardCollection("test.foo", {_id:1})
    

  • Ran your benchRun test above.
  • While the test was running, I configured a new server on port 30000, stepped down shard01.primary, added the new server, and forced it to become primary (a sketch of these steps follows the table below), resulting in the following sharding configuration:

    PROCESS          PORT     STATUS
     
    mongos           27017    running
     
    config server    27024    running
    config server    27025    running
    config server    27026    running
     
    shard01
        secondary    27019    running
        primary      30000    running
        arbiter      27020    running
     
    shard02
        primary      27021    running
        secondary    27022    running
        arbiter      27023    running
    
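A rough sketch of that reconfiguration from the mongo shell, assuming a connection to shard01's current primary; the host name is a placeholder:

    // Add the new server to shard01's replica set
    rs.add("localhost:30000")
    // Once it is a healthy SECONDARY, raise its priority so it wins the next election
    cfg = rs.conf()
    cfg.members[cfg.members.length - 1].priority = 10
    rs.reconfig(cfg)
    // Step down the current primary; the node on port 30000 takes over
    rs.stepDown()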

I was not able to observe the error you describe. I also tried replacing the primary while benchRun was not running, but got the same result.

If this is still an issue for you, can you please send us the logs for the mongos router as well as for all the data-bearing nodes in your cluster?

Thanks,
Ramón.

Comment by Nikolaos Vyzas [ 09/Dec/14 ]

Please add 3x config servers and 1x mongos router to your configuration, then enable sharding and run the test via the mongos service rather than directly on the replica set.
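
A minimal sketch of what I mean, assuming mongos listens on localhost:27017 and the ops array from the description is already defined; benchRun accepts a host option, so the workload can be pointed at the router:

    // Run the same workload through the mongos router rather than the replica set
    res = benchRun({
        parallel: 8,
        seconds: 10,
        ops: ops,
        host: "localhost:27017"
    });
    print("threads: 8\t queries/sec: " + res.query);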

Comment by Ramon Fernandez Marina [ 04/Dec/14 ]

vyzas, I'm not able to reproduce the "invalid parameter" error message. This is what I did:

  • Started a 2-node + 1-arbiter replica set and ran the code you uploaded
  • Stepped down the primary (node1), and from the new primary (node2) removed node1 from the replica set
  • Launched a new node (node4), and stepped down node2 so node4 would become primary (sketched below)
    The code you posted continued to run fine.
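
A condensed shell sketch of those steps, with placeholder host:port values:

    // On node1 (the original primary): hand off the primary role
    rs.stepDown()
    // On node2 (the new primary): remove node1 and add node4
    rs.remove("node1:27017")
    rs.add("node4:27017")
    // Once node4 is a healthy SECONDARY, step node2 down so node4 takes over
    rs.stepDown()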

If you're still seeing this issue, could you post more detailed instructions on your setup so we can try to match it as closely as possible?

Thanks,
Ramón.

Comment by Alvin Richards (Inactive) [ 10/Nov/14 ]

vyzas@pythian.com, the "cap" flag is just used for internal audit and verification.

Comment by Nikolaos Vyzas [ 10/Nov/14 ]

What is required for CAP verification?

The cluster is running in a single shard configuration with:

3x config servers
2x replicaset members (default write concern)
1x arbiter
6+ mongos routers
