[SERVER-7885] First update on just migrated document succeeds but gets GLE with n=0 Created: 09/Dec/12  Updated: 07/Mar/14  Resolved: 09/Dec/12

Status: Closed
Project: Core Server
Component/s: Sharding, Write Ops
Affects Version/s: 2.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-4532 GetLastError on sharded cluster can r... Closed
Operating System: ALL
Steps To Reproduce:

See above.

Participants:

 Description   

I was able to reproduce this with both Java and Python drivers (as well as shell). It requires starting a new connection/process.

To reproduce, create two shards. Insert some data into a collection sharded on _id. Split into two chunks. Move chunk containing _id: X to a different shard through mongos on 57018 (different than what test program uses).

Now an update to ({_id:X}, {$set:{whatever}}) will return GLE:

(in Java):

{
	"serverUsed" : "localhost:57017",
	"shards" : [
		"asya/localhost:27017,localhost:27018,localhost:27019",
		"localhost:37017"
	],
	"shardRawGLE" : {
		"asya/localhost:27017,localhost:27018,localhost:27019" : {
			"writeback" : {
				"$oid" : "50c3f952000000000000011f"
			},
			"instanceIdent" : "Asyas-MacBook-Air.local:27018",
			"n" : 0,
			"lastOp" : {
				"$ts" : 1355020595,
				"$inc" : 1
			},
			"connectionId" : 33220,
			"wtime" : 0,
			"err" : null,
			"ok" : 1
		},
		"localhost:37017" : {
			"writeback" : {
				"$oid" : "50c3f87100000000000005ba"
			},
			"instanceIdent" : "Asyas-MacBook-Air.local:37017",
			"n" : 0,
			"connectionId" : 62,
			"wtime" : 0,
			"err" : null,
			"ok" : 1
		}
	},
	"n" : 0,
	"err" : null,
	"ok" : 1
}

In Python I got the same thing. Basically, do a moveChunk (on a different mongos) to the other shard and run the little test program again - first time after chunk move it always returns this sort of document, n=0 and no updatedExisting even though the record gets updated.



 Comments   
Comment by Asya Kamsky [ 09/Dec/12 ]

Seems to be a duplicate of SERVER-4532 - closing out and will add a simple shell test case there.

Comment by Asya Kamsky [ 09/Dec/12 ]

I got debug output from mongos that gives n:0. Here is the log (normal verbosity) on 57018 where I moved chunk:

Sun Dec  9 15:09:36 [conn2] CMD: movechunk: { moveChunk: "test2.coll", find: { _id: 1001.0 }, to: "asya" }
Sun Dec  9 15:09:36 [conn2] moving chunk ns: test2.coll moving ( ns:test2.coll at: shard0000:localhost:37017 lastmod: 75|0||000000000000000000000000 min: { _id: 1001.0 } max: { _id: 1002.0 }) shard0000:localhost:37017 -> asya:asya/localhost:27017,localhost:27018,localhost:27019

And here is verbose output from mongos on 57017 when I immediately ran an update (which returned n:0):

Sun Dec  9 15:09:41 [mongosMain] connection accepted from 127.0.0.1:49768 #123 (2 connections now open)
Sun Dec  9 15:09:41 [conn123] Request::process begin ns: admin.$cmd msg id: 16807 op: 2004 attempt: 0
Sun Dec  9 15:09:41 [conn123] single query: admin.$cmd  { ismaster: 1 }  ntoreturn: -1 options : 0
Sun Dec  9 15:09:41 [conn123] Request::process end ns: admin.$cmd msg id: 16807 op: 2004 attempt: 0 0ms
Sun Dec  9 15:09:41 [conn123] Request::process begin ns: test2.coll msg id: 282475249 op: 2001 attempt: 0
Sun Dec  9 15:09:41 [conn123] write: test2.coll
Sun Dec  9 15:09:41 [conn123] Request::process end ns: test2.coll msg id: 282475249 op: 2001 attempt: 0 0ms
Sun Dec  9 15:09:41 [conn123] Request::process begin ns: test2.$cmd msg id: 282475249 op: 2004 attempt: 0
Sun Dec  9 15:09:41 [conn123] single query: test2.$cmd  { getlasterror: 1 }  ntoreturn: -1 options : 0
Sun Dec  9 15:09:41 [conn123] gathering a response for gle from: asya/localhost:27017,localhost:27018,localhost:27019
Sun Dec  9 15:09:41 [conn123] gathering a response for gle from: localhost:37017
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] writebacklisten result: { data: { writeBack: true, ns: "test2.coll", id: ObjectId('50c40f0500000000000005c2'), connectionId: 66, instanceIdent: "Asyas-MacBook-Air.local:37017", version: Timestamp 76000|0, versionEpoch: ObjectId('50c3c09f7286e6599f60afb1'), yourVersion: Timestamp 75000|0, yourVersionEpoch: ObjectId('50c3c09f7286e6599f60afb1'), msg: BinData }, ok: 1.0 }
Sun Dec  9 15:09:41 [conn123] gathering writebacks from 2 hosts for gle (2 shards)
Sun Dec  9 15:09:41 [conn123] checking 2 writebacks for gle (2 shards)
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] connectionId: Asyas-MacBook-Air.local:37017:66 writebackId: 50c40f0500000000000005c2 needVersion : 76|0||50c3c09f7286e6599f60afb1 mine : 75|1||50c3c09f7286e6599f60afb1
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] op: update len: 80 ns: test2.coll flags: 0 query: { _id: 1001 } update: { $set: { a: "0000000" } }
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] new version change detected to 76|0||50c3c09f7286e6599f60afb1, 1 writebacks processed at 74|0||50c3c09f7286e6599f60afb1
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] loading chunk manager for collection test2.coll using old chunk manager w/ version 75|1||50c3c09f7286e6599f60afb1 and 6 chunks
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] found 3 new chunks for collection test2.coll (tracking 3), new version is 0x100c105f0
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] loaded 3 chunks into new chunk manager for test2.coll with version 76|1||50c3c09f7286e6599f60afb1
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] ChunkManager: time to load chunks for test2.coll: 0ms sequenceNumber: 88 version: 76|1||50c3c09f7286e6599f60afb1 based on: 75|1||50c3c09f7286e6599f60afb1
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] Request::process begin ns: test2.coll msg id: 19352745 op: 2001 attempt: 1
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] write: test2.coll
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017]  have to set shard version for conn: localhost:27018 ns:test2.coll my last seq: 86  current: 88 version: 76|0||50c3c09f7286e6599f60afb1 manager: 0x100c104b0
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017]     setShardVersion  asya localhost:27018  test2.coll  { setShardVersion: "test2.coll", configdb: "localhost:50001,localhost:50005,localhost:50006", version: Timestamp 76000|0, versionEpoch: ObjectId('50c3c09f7286e6599f60afb1'), serverID: ObjectId('50bbdfa27286e6599f5ff01e'), shard: "asya", shardHost: "asya/localhost:27017,localhost:27018,localhost:27019" } 0x100b02f90
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] Sending command { setShardVersion: "test2.coll", configdb: "localhost:50001,localhost:50005,localhost:50006", version: Timestamp 76000|0, versionEpoch: ObjectId('50c3c09f7286e6599f60afb1'), serverID: ObjectId('50bbdfa27286e6599f5ff01e'), shard: "asya", shardHost: "asya/localhost:27017,localhost:27018,localhost:27019" } to localhost:27018 with $auth: {}
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017]       setShardVersion success: { oldVersion: Timestamp 74000|0, oldVersionEpoch: ObjectId('50c3c09f7286e6599f60afb1'), ok: 1.0 }
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] Request::process end ns: test2.coll msg id: 19352745 op: 2001 attempt: 1 0ms
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] gathering response for gle from: asya/localhost:27017,localhost:27018,localhost:27019
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] gathering writebacks from 1 hosts for gle (asya/localhost:27017,localhost:27018,localhost:27019)
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] checking 0 writebacks for gle (asya/localhost:27017,localhost:27018,localhost:27019)
Sun Dec  9 15:09:41 [WriteBackListener-localhost:37017] GLE is { singleShard: "asya/localhost:27017,localhost:27018,localhost:27019", updatedExisting: true, n: 1, lastOp: Timestamp 1355026181000|1, connectionId: 32370, err: null, ok: 1.0 }
Sun Dec  9 15:09:41 [conn123] Request::process end ns: test2.$cmd msg id: 282475249 op: 2004 attempt: 0 11ms
Sun Dec  9 15:09:41 [conn123] Socket recv() conn closed? 127.0.0.1:49768
Sun Dec  9 15:09:41 [conn123] SocketException: remote: 127.0.0.1:49768 error: 9001 socket exception [0] server [127.0.0.1:49768] 
Sun Dec  9 15:09:41 [conn123] end connection 127.0.0.1:49768 (1 connection now open)

Yet what it printed as res was:
N is 0

{u'err': None,
u'n': 0,
u'ok': 1.0,
u'shardRawGLE': {u'asya/localhost:27017,localhost:27018,localhost:27019':

{u'connectionId': 33512, u'err': None, u'instanceIdent': u'Asyas-MacBook-Air.local:27018', u'lastOp': Timestamp(1355024887, 1), u'n': 0, u'ok': 1.0, u'writeback': ObjectId('50c40cbe0000000000000126')}

,
u'localhost:37017': {u'connectionId': 66,
u'err': None,
u'instanceIdent': u'Asyas-MacBook-Air.local:37017',
u'n': 0,
u'ok': 1.0,
u'writeback': ObjectId('50c40f0500000000000005c2')}},
u'shards': [u'asya/localhost:27017,localhost:27018,localhost:27019',
u'localhost:37017']}

It's possible this should be driver bug - but given I see the same thing happening in multiple drivers ... so maybe mongos... and what's in the log is what mongod returns.

Comment by Asya Kamsky [ 09/Dec/12 ]

Here is the Python script I used and its output:

import pymongo
import pprint
conn=pymongo.MongoClient("localhost",57017)
db=conn.test2
coll=db.coll
res=coll.update({"_id":1001}, {"$set":{"a":"0000000"}} )
print "N is ", res["n"]
print "After:"
print coll.find_one({"_id":1001})
print ""
pp = pprint.PrettyPrinter(indent=1)
pp.pprint(res)

In the shell connected to 57018 (different mongos) type:

> db.coll.find({_id:1001})
{ "_id" : 1001, "a" : "1111111" }
Fetched 1 record(s) in 1ms
> sh.moveChunk("test2.coll",{_id:1001},"asya")
{ "millis" : 2524, "ok" : 1 }

Then run Python script. Output:

 
N is  0
After:
{u'a': u'0000000', u'_id': 1001.0}
 
{u'err': None,
 u'n': 0,
 u'ok': 1.0,
 u'shardRawGLE': {u'asya/localhost:27017,localhost:27018,localhost:27019': {u'connectionId': 33512,
                                                                            u'err': None,
                                                                            u'instanceIdent': u'Asyas-MacBook-Air.local:27018',
                                                                            u'lastOp': Timestamp(1355024246, 1),
                                                                            u'n': 0,
                                                                            u'ok': 1.0,
                                                                            u'writeback': ObjectId('50c4079c0000000000000125')},
                  u'localhost:37017': {u'connectionId': 66,
                                       u'err': None,
                                       u'instanceIdent': u'Asyas-MacBook-Air.local:37017',
                                       u'n': 0,
                                       u'ok': 1.0,
                                       u'writeback':ObjectId('50c407f400000000000005bf')}},
 u'shards': [u'asya/localhost:27017,localhost:27018,localhost:27019',
             u'localhost:37017']}

Generated at Thu Feb 08 03:15:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.