[SERVER-1768] getLastError(2) hangs forever when "replSet error RS102 too stale to catch up" Created: 09/Sep/10 Updated: 19/Sep/15 Resolved: 21/Apr/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tony Hannan | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
db version v1.7.1-pre-, pdfile version 4.5 |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
1. Set up a replica set of 3 servers. Problem: the call never returns. However, it works when inserting only 1000 docs. Server log during the problem: |
| Comments |
| Comment by Githook User [ 02/Sep/15 ] |
|
Author: Benety Goh (benety) <benety@mongodb.com> Message: |
| Comment by Eliot Horowitz (Inactive) [ 21/Apr/11 ] |
|
There is no default timeout, so waiting forever is the correct behavior. |
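
For illustration, a caller who does not want to block indefinitely can pass an explicit timeout along with `w`. The sketch below only builds the command document. `buildGetLastError` is a hypothetical helper name, not part of any driver, and it assumes a server version whose `getlasterror` command accepts the `wtimeout` option (in milliseconds).

```javascript
// Hypothetical helper (not a MongoDB API): builds a getlasterror command document.
// `w` is the number of members that must have the write; `wtimeoutMs` is optional.
function buildGetLastError(w, wtimeoutMs) {
  const cmd = { getlasterror: 1, w: w };
  // With no wtimeout there is no default, so the call waits until
  // `w` members have replicated the write.
  if (wtimeoutMs !== undefined) {
    cmd.wtimeout = wtimeoutMs;
  }
  return cmd;
}

// In the mongo shell the result could be passed to db.runCommand(...).
console.log(JSON.stringify(buildGetLastError(2, 5000)));
```

With a timeout set, a secondary that can never catch up should cause the call to return with a timeout indication rather than hang.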
| Comment by Tony Hannan [ 10/Sep/10 ] |
|
Not what I observed. It just kept repeating "replSet error RS102 too stale to catch up" |
| Comment by Eliot Horowitz (Inactive) [ 10/Sep/10 ] |
|
Doesn't the secondary then start a resync? |
| Comment by Tony Hannan [ 10/Sep/10 ] |
|
According to Dwight, this happens when the oplog rolls over before the secondary can read it. |
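
The condition described above can be sketched as a simple comparison. This is an illustration only, not the server's actual check: `isTooStale` is a hypothetical name, and timestamps are simplified to plain numbers rather than real oplog timestamp values.

```javascript
// Illustrative sketch, not server code. Timestamps are plain numbers here.
// A secondary is "too stale" when the oldest entry still present in the
// primary's oplog is newer than the last operation the secondary applied,
// meaning the entries it needs next have already been overwritten.
function isTooStale(secondaryLastAppliedTs, primaryOldestOplogTs) {
  return secondaryLastAppliedTs < primaryOldestOplogTs;
}

console.log(isTooStale(100, 150)); // oplog rolled past the secondary's position
console.log(isTooStale(200, 150)); // secondary's position is still in the oplog
```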
| Comment by Tony Hannan [ 10/Sep/10 ] |
|
I just sent a pull request from TonyGen containing my test case. |
| Comment by Dwight Merriman [ 10/Sep/10 ] |
|
see replset2 or replset5.js |
| Comment by Dwight Merriman [ 10/Sep/10 ] |
|
Tony, please take one of the jstests/replsets/ tests and modify it so that it reproduces this. Then I can reproduce it, and we will also have a regression test forever. I would take one of the existing getlasterror tests there and modify it. A good place to start is to see whether those tests pass for you on your box. Once that is done, assign it back to me. |
| Comment by Tony Hannan [ 10/Sep/10 ] |
|
Still hung after 45 minutes. |
| Comment by Eliot Horowitz (Inactive) [ 09/Sep/10 ] |
|
This should recover... |