[SERVER-4781] replica set initial sync failure when update cannot be applied to a future version of an object received via clone Created: 26/Jan/12 Updated: 04/Jun/21 Resolved: 23/Oct/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 2.2.1, 2.3.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aaron Staple | Assignee: | Alberto Lerner |
| Resolution: | Done | Votes: | 3 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
Some modifier operations may uassert when they cannot be applied to a selected document. An update can uassert when it is applied to a future version of a document during the initial sync phase, and this aborts the initial sync. It appears that the initial sync is then retried, but if new, bad updates arrive, I imagine the initial sync failure could keep repeating.
Here is a test I wrote for this. The first attempt at initial sync fails, but the test then stops sending updates, so the retried initial sync succeeds. As a result, the current test passes, so right now it is only good as a demonstration.
Here is some relevant logging from the test
Also, I wrote a test for master/slave mode, and that seems to work OK.
I can push the tests when the bb code freeze is over. |
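The failure mode described above can be sketched in a few lines of Python. This is a toy model rather than server code: `apply_inc`, `apply_set`, `initial_sync`, and `UpdateError` are hypothetical names, `UpdateError` stands in for a uassert, and the `$inc`-like type check is an assumed example of a modifier that rejects some documents.

```python
import copy


class UpdateError(Exception):
    """Stands in for a uassert raised by a modifier (hypothetical)."""


def apply_inc(doc, field, amount):
    # Simplified $inc-like modifier: refuses non-numeric targets.
    current = doc.get(field, 0)
    if isinstance(current, bool) or not isinstance(current, (int, float)):
        raise UpdateError(f"cannot $inc non-numeric field {field!r}")
    doc[field] = current + amount


def apply_set(doc, field, value):
    # Simplified $set-like modifier: always succeeds.
    doc[field] = value


def initial_sync(cloned_doc, oplog):
    """Replay every oplog entry on the cloned document.

    Any UpdateError propagates, which models the sync being aborted.
    """
    local = copy.deepcopy(cloned_doc)
    for op, field, arg in oplog:
        op(local, field, arg)
    return local


# History on the primary: {a: 1}, then $inc a by 1, then $set a to "done".
oplog = [(apply_inc, "a", 1), (apply_set, "a", "done")]

# The clone reads the document after both operations have already run,
# so it holds a "future" version relative to the start of the replay.
cloned = {"_id": 1, "a": "done"}

try:
    initial_sync(cloned, oplog)
    outcome = "synced"
except UpdateError as exc:
    outcome = f"aborted: {exc}"

print(outcome)
```

Replaying the same oplog against the older version `{"_id": 1, "a": 1}` goes through, which is why the problem only shows up when the clone races ahead of the replay start point.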
| Comments |
| Comment by auto [ 17/Oct/12 ] |
|
Author: Alberto Lerner <alerner@10gen.com>, 2012-10-16T19:13:54-07:00 Message: |
| Comment by auto [ 15/Oct/12 ] |
|
Author: Alberto Lerner <alerner@10gen.com>, 2012-10-15T14:17:29-07:00 Message: |
| Comment by auto [ 15/Oct/12 ] |
|
Author: Alberto Lerner <alerner@10gen.com>, 2012-10-15T07:19:02-07:00 Message: Revert " This reverts commit 77bf5afd2483098c11b0286d7882d15b9351fbee. |
| Comment by auto [ 14/Oct/12 ] |
|
Author: Alberto Lerner <alerner@10gen.com>, 2012-10-14T12:29:57-07:00 Message: |
| Comment by Tad Marshall [ 24/Apr/12 ] |
|
Log of master/slave on Windows showing assertion failures running repl13.js |
| Comment by Aaron Staple [ 10/Feb/12 ] |
|
Another of the many update validation assertions where this can occur is triggered by updates like: c.save( {} ), since the push will be replicated as update( {_id:x,a:null}, {$push:{a:1}} ) |
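To illustrate why a `$push` is sensitive to the version of the document it lands on, here is a small Python sketch. It assumes a simplified rule (append to an array, create the field if it is missing, reject any other type); `apply_push`, `try_push`, and `UpdateError` are hypothetical names, and this is not the server's actual validation logic.

```python
import copy


class UpdateError(Exception):
    """Stands in for an update validation assertion (hypothetical)."""


def apply_push(doc, field, value):
    # Simplified $push-like rule, assumed for illustration only.
    if field not in doc:
        doc[field] = [value]
    elif isinstance(doc[field], list):
        doc[field].append(value)
    else:
        raise UpdateError(f"cannot $push to non-array field {field!r}")
    return doc


def try_push(doc):
    """Apply the push to a copy and report the result or the failure."""
    try:
        return apply_push(copy.deepcopy(doc), "a", 1)
    except UpdateError:
        return "assertion"


# The same replicated push meets three possible versions of the document.
results = {
    "field missing": try_push({"_id": 1}),
    "field is array": try_push({"_id": 1, "a": [0]}),
    "field is string": try_push({"_id": 1, "a": "text"}),
}

for state, result in results.items():
    print(state, "->", result)
```

Only the version in which the field later became a non-array value rejects the push, so whether the replay succeeds depends on which version the clone happened to copy.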
| Comment by Aaron Staple [ 07/Feb/12 ] |
|
Hi dustin, just to confirm: is the assertion happening during initial sync of a new slave? |
| Comment by dustin norlander [ 07/Feb/12 ] |
|
GAH! We just got this in production. The slave cannot get past the assertion and it is getting staler by the minute. What can we do? |
| Comment by auto [ 07/Feb/12 ] |
|
Author: Aaron (astaple) <aaron@10gen.com> Message: |