[SERVER-1793] getLastError(2) hangs/times out about every N inserts into replica set shard Created: 14/Sep/10  Updated: 12/Jul/16  Resolved: 27/Dec/10

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.7.0
Fix Version/s: 1.7.5

Type: Bug
Priority: Major - P3
Reporter: Tony Hannan
Assignee: Kristina Chodorow (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

db version v1.7.1-pre-, pdfile version 4.5
Tue Sep 14 13:19:38 git version: 524e633748a24b5a1e753373ba63e5c267964576


Operating System: ALL
Participants:

 Description   

1. Create a replica set of 3 servers.
2. Add data to it.
3. Create 3 config servers for sharding.
4. Create one router (mongos).
5. Add the replica set as the only shard.
6. Enable sharding on the db and the collection that already contain the data.
7. Repeatedly insert a record into the sharded collection and check that each insert has reached 2 replicas (getLastError(2)).

There is a jstest for this test case at jstests/grid/shard_insert_getlasterror_w2.js.
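
For illustration only, the insert-and-check loop from step 7 could be sketched as below. This is not the contents of the jstest. The function name countReplicationTimeouts is hypothetical, and coll and db are assumed to be objects shaped like the legacy mongo shell's collection and database handles; only insert() and getLastErrorObj(w, wtimeout) are called on them.

```javascript
// Sketch under assumptions: `coll` and `db` stand in for legacy mongo shell
// handles. Nothing here is taken from the actual jstest.
function countReplicationTimeouts(coll, db, inserts, w, wtimeoutMs) {
  const failedInserts = [];
  for (let i = 0; i < inserts; i++) {
    coll.insert({ i: i });
    const gle = db.getLastErrorObj(w, wtimeoutMs);
    // Through mongos, per-shard results may be nested under "errObjects"
    // (as in the timed-out result in this ticket); otherwise inspect the
    // top-level document itself.
    const parts = Array.isArray(gle.errObjects) ? gle.errObjects : [gle];
    if (parts.some((p) => p.wtimeout === true)) {
      failedInserts.push(i); // remember which insert was not replicated in time
    }
  }
  return failedInserts;
}
```

Against a live cluster this might be called as countReplicationTimeouts(db.foo, db, 1000, 2, 30000), where db.foo is a placeholder for the sharded collection.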

Problem: About every 170th insert fails (getLastError times out) on my MacBook.
The result of a timed-out getLastErrorObj(2,30000) looks like:
{
    "shards" : [
        "127.0.0.1:31003,127.0.0.1:31004,127.0.0.1:31005",
        "repset1/127.0.0.1:31000,127.0.0.1:31001,127.0.0.1:31002"
    ],
    "n" : 0,
    "err" : "",
    "errs" : [
        ""
    ],
    "errObjects" : [
        { "err" : null, "n" : 0, "wtimeout" : true, "waited" : 30000, "errmsg" : "timed out waiting for slaves", "ok" : 0 }
    ],
    "ok" : 1
}

Result of a successful getLastErrorObj(2,30000) looks like:
{
    "theshard" : "repset1/127.0.0.1:31000,127.0.0.1:31001,127.0.0.1:31002",
    "err" : null,
    "n" : 0,
    "lastOp" : NumberLong("5516791406158413840"),
    "wtime" : 2,
    "ok" : 1,
    "singleShard" : "repset1/127.0.0.1:31000,127.0.0.1:31001,127.0.0.1:31002"
}
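
Note that in the timed-out result above the top-level "ok" is 1 and "err" is an empty string; the timeout shows up only inside "errObjects". A minimal sketch of a check that handles both result shapes follows. The helper name summarizeGetLastError is hypothetical and not part of the shell or the server, and the sample documents are trimmed copies of the two results in this ticket.

```javascript
// Hypothetical helper: reports whether a getLastError result document
// indicates a replication timeout, for either shape shown in this ticket.
function summarizeGetLastError(gle) {
  // Use nested per-shard results when present, else the document itself.
  const parts = Array.isArray(gle.errObjects) ? gle.errObjects : [gle];
  const timedOut = parts.filter((p) => p.wtimeout === true);
  return {
    replicated: timedOut.length === 0,
    waitedMs: timedOut.map((p) => p.waited),
    messages: timedOut.map((p) => p.errmsg),
  };
}

// Trimmed copies of the two results reported above.
const timedOutResult = {
  n: 0,
  err: "",
  errObjects: [
    { err: null, n: 0, wtimeout: true, waited: 30000,
      errmsg: "timed out waiting for slaves", ok: 0 },
  ],
  ok: 1,
};
const successfulResult = { err: null, n: 0, wtime: 2, ok: 1 };

console.log(summarizeGetLastError(timedOutResult).replicated);   // false
console.log(summarizeGetLastError(successfulResult).replicated); // true
```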



 Comments   
Comment by auto [ 25/Dec/10 ]

Author: Kristina (kchodorow) <kristina@10gen.com>

Message: Revert "don't forward getlasterror to config servers SERVER-1793"

This reverts commit 9fbb4f81f42a50d88bb520af307cb237a48146dc.
https://github.com/mongodb/mongo/commit/60814a460faa902b75b3dc0b209e89c52512c8d6

Comment by auto [ 23/Dec/10 ]

Author: Kristina (kchodorow) <kristina@10gen.com>

Message: don't forward getlasterror to config servers SERVER-1793
https://github.com/mongodb/mongo/commit/9fbb4f81f42a50d88bb520af307cb237a48146dc

Comment by Kristina Chodorow (Inactive) [ 17/Nov/10 ]

Tony already wrote one (jstests/grid/shard_insert_getlasterror_w2.js). Do you want another? I added a line to his so that it asserts when the failure occurs.

Comment by auto [ 17/Nov/10 ]

Author: Kristina Chodorow (kchodorow) <kristina@10gen.com>

Message: make test assert SERVER-1793
/mongodb/mongo/commit/ec5f5befbecf4285dd5324225efa08a7baa6129c

Comment by Eliot Horowitz (Inactive) [ 15/Nov/10 ]

Kristina, can you try to make a test case that shows this?

Generated at Thu Feb 08 02:58:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.