[SERVER-7580] PHP getLastError wrong when Sharding Created: 07/Nov/12  Updated: 15/Feb/13  Resolved: 19/Nov/12

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Dwayne Bull Assignee: Kristina Chodorow (Inactive)
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu


Issue Links:
Duplicate
duplicates SERVER-4532 GetLastError on sharded cluster can r... Closed
Operating System: Linux
Participants:

 Description   

getLastError returning old/wrong/no data. The code that we are running works 100% when not sharded.

We have a 3-shard setup, each shard a replica set of 3 servers, plus 3 config servers and 1 mongos running on the application server.

All queries use safe mode.

After doing an update with a query that matches 0 documents, getLastError() returns the following:

"singleShard":"Set_01\/db1.test.co.uk,db5.test.co.uk,db9.test.co.uk",
"updatedExisting":true,
"n":1,
"lastOp":{ "sec":1352302481, "inc":2 },
"connectionId":1392,
"err":null,
"ok":1

However, no documents were updated, so both n=1 and updatedExisting=true are wrong.

When using a query that does match a document (and does successfully update), we occasionally get the following from getLastError():

"singleShard":"Set_02\/db2.test.co.uk,db6.test.co.uk,db7.test.co.uk",
"n":0,
"lastOp":{ "sec":1352295232, "inc":31 },
"connectionId":239,
"err":null,
"ok":1,
"writeback":{ "$id":"509a633f0000000000000048" },
"instanceIdent":"mongo-01",
"updatedExisting":true,
"writebackGLE":{
"singleShard":"Set_02\/db2.test.co.uk,db6.test.co.uk,db7.test.co.uk",
"n":0,
"lastOp":{ "sec":1352295232, "inc":31 },
"connectionId":239,
"err":null,
"ok":1
},
"initialGLEHost":"Set_01\/db1.test.co.uk,db5.test.co.uk,db9.test.co.uk"

"lastOp" is sometimes the same, if that means anything?
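
The inconsistencies visible in the second result can be checked mechanically. The following is a minimal sketch, not part of the original report: the function name gle_anomalies is hypothetical, and it assumes the getLastError document has already been decoded into a Python dict with the field names shown above.

```python
def gle_anomalies(gle):
    """Return a list of inconsistencies found in a getLastError document.

    `gle` is assumed to be the getLastError result decoded into a dict.
    The checks only cover patterns visible in the results quoted above.
    """
    problems = []

    # updatedExisting=true claims a document was modified, so n should not be 0.
    if gle.get("updatedExisting") and gle.get("n", 0) == 0:
        problems.append("updatedExisting is true but n is 0")

    # A writeback field shows the write did not take the plain single-shard path.
    if "writeback" in gle:
        problems.append("writeback field present")

    # The host first asked for getLastError differs from the reporting shard.
    initial = gle.get("initialGLEHost")
    if initial is not None and initial != gle.get("singleShard"):
        problems.append("initialGLEHost differs from singleShard")

    return problems
```

A check like this cannot detect the first case (n=1 when nothing matched), because that result is internally consistent; only the application knows the query should have matched nothing.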



 Comments   
Comment by Kristina Chodorow (Inactive) [ 19/Nov/12 ]

Looks like this is a known issue: https://jira.mongodb.org/browse/SERVER-4532, please track/comment there, so we can keep all the info together in one place.

Comment by Dwayne Bull [ 09/Nov/12 ]

I've set up the server to log all queries, so I should have more data on the 0 docs n=1 case soon.

Comment by Kristina Chodorow (Inactive) [ 08/Nov/12 ]

Thanks! You can't reproduce #1 (the update matching 0 documents returning n=1) consistently, though, right? (If I'm correct about that, it would really be helpful to get an explain right after you get an unexpected result, although I understand that can be hard to set up. If not, that's fantastic, it should be fairly easy to track down!)

Still looking into 2.

Comment by Dwayne Bull [ 08/Nov/12 ]

1) Here is an explain based on your point 1. I've used the same query as in point 2 (further down) but replaced the r.m.c value so that it will not match any documents (nor does it match anything when connected directly to the shard):

"needle" : {
	"_id" : "be158d7949b6c9683722da04f109cfe0",
	"r.m.c" : {
		"$gte" : 999999999
	},
	"r.g.c" : {
		"$gte" : 800
	},
	"r.c.c" : {
		"$gte" : 0
	},
	"r.n.c" : {
		"$gte" : 0
	}
}
 
result:{
	"clusteredType" : "ParallelSort",
	"shards" : {
		"Set_03/db3.test.com,db4.test.com,db8.test.com" : [
			{
				"cursor" : "BtreeCursor _id_",
				"isMultiKey" : false,
				"n" : 0,
				"nscannedObjects" : 1,
				"nscanned" : 1,
				"nscannedObjectsAllPlans" : 2,
				"nscannedAllPlans" : 2,
				"scanAndOrder" : false,
				"indexOnly" : false,
				"nYields" : 0,
				"nChunkSkips" : 0,
				"millis" : 0,
				"indexBounds" : {
					"_id" : [
						[
							"be158d7949b6c9683722da04f109cfe0",
							"be158d7949b6c9683722da04f109cfe0"
						]
					]
				},
				"server" : "mongo-02"
			}
		]
	},
	"cursor" : "BtreeCursor _id_",
	"n" : 0,
	"nChunkSkips" : 0,
	"nYields" : 0,
	"nscanned" : 1,
	"nscannedAllPlans" : 2,
	"nscannedObjects" : 1,
	"nscannedObjectsAllPlans" : 2,
	"millisShardTotal" : 0,
	"millisShardAvg" : 0,
	"numQueries" : 1,
	"numShards" : 1,
	"indexBounds" : {
		"_id" : [
			[
				"be158d7949b6c9683722da04f109cfe0",
				"be158d7949b6c9683722da04f109cfe0"
			]
		]
	},
	"millis" : 17
}
 

2) Here is a result from a successful update but with the wrong error (result is the call to getLastError):

"needle" : {
	"_id" : "be158d7949b6c9683722da04f109cfe0",
	"r.m.c" : {
		"$gte" : 1000
	},
	"r.g.c" : {
		"$gte" : 800
	},
	"r.c.c" : {
		"$gte" : 0
	},
	"r.n.c" : {
		"$gte" : 0
	}
},
"update" : {
	"$inc" : {
		"r.m.c" : -1000,
		"r.g.c" : -800,
		"r.c.c" : 0,
		"r.n.c" : 0
	}
},
"result" : {
	"singleShard" : "Set_03/db3.test.com,db4.test.com,db8.test.com",
	"n" : 0,
	"lastOp" : Timestamp(1352369274000, 421),
	"connectionId" : 153,
	"err" : null,
	"ok" : 1,
	"writeback" : ObjectId("509b8477000000000000005a"),
	"instanceIdent" : "mongo-02:30003",
	"updatedExisting" : true,
	"writebackGLE" : {
		"singleShard" : "Set_03/db3.test.com,db4.test.com,db8.test.com",
		"n" : 0,
		"lastOp" : Timestamp(1352369274000, 421),
		"connectionId" : 153,
		"err" : null,
		"ok" : 1
	},
	"initialGLEHost" : "Set_03/db3.test.com,db4.test.com,db8.test.com"
}
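
To read the per-shard match counts out of a sharded explain result like the one in point 1, something along these lines could be used. This is a sketch under assumptions: per_shard_n is a hypothetical helper, and it assumes the explain output has been decoded into a Python dict with the "shards" layout shown above (shard name mapped to a list of per-plan documents, each carrying "n").

```python
def per_shard_n(explain):
    """Map each shard name to the total "n" reported by its explain entries.

    Assumes the sharded explain layout quoted above:
    {"shards": {"<shard name>": [{"n": ...}, ...]}, ...}
    """
    counts = {}
    for shard, plans in explain.get("shards", {}).items():
        # Sum "n" over every plan document listed for this shard.
        counts[shard] = sum(plan.get("n", 0) for plan in plans)
    return counts
```

For the explain in point 1 this yields a count of 0 for Set_03, which is what the non-matching criteria should produce. Note that an explain taken after an update reflects the document's state after the write, so its count is only directly comparable to the getLastError n when the update itself does not change whether the document matches.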

Comment by Kristina Chodorow (Inactive) [ 08/Nov/12 ]

1. When you get n=1 on something you don't expect to exist, can you run an explain on the query (db.someColl.find(criteria).explain())? If you connect directly to the shard and repeat the query, does it still exist?

2. What does the update look like for the second example (n=0)? I can't actually find a code path that gives that result.
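
One way to capture an explain immediately after an unexpected result, as requested in point 1, is sketched below. This is an illustration only and not the reporter's PHP code: update_and_capture and its expect_match parameter are hypothetical, and coll is assumed to behave like a PyMongo Collection, of which only update_one(...).matched_count and find(...).explain() are used.

```python
def update_and_capture(coll, criteria, update, expect_match):
    """Run an update and capture diagnostics when the result is unexpected.

    `coll` is assumed to be a PyMongo-style collection. `expect_match` is the
    caller's own knowledge of whether `criteria` should match a document.
    Returns None when the reported match count agrees with the expectation,
    otherwise a dict holding the criteria, the count, and an explain.
    """
    result = coll.update_one(criteria, update)
    reported_match = result.matched_count > 0
    if reported_match == expect_match:
        return None

    # Grab the explain straight away, while the unexpected state may still hold.
    return {
        "criteria": criteria,
        "matched_count": result.matched_count,
        "explain": coll.find(criteria).explain(),
    }
```

Logging the returned dict whenever it is not None would give the explain output alongside the exact criteria that produced the unexpected count.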

Comment by Dwayne Bull [ 08/Nov/12 ]

I'm afraid not, the issue only appeared once we started hammering the database with real data on our production setup, so we can't isolate all the code needed to reproduce it.

Comment by Eliot Horowitz (Inactive) [ 08/Nov/12 ]

Do you have a test program that can cause this?
