[SERVER-7854] Potential bug in rollbacks when running with auth Created: 06/Dec/12  Updated: 11/Jul/16  Resolved: 03/Jan/13

Status: Closed
Project: Core Server
Component/s: Replication, Security
Affects Version/s: 2.3.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-7759 Don't purge bgbuffer after SECONDARY-... Closed
Operating System: ALL
Participants:

 Description   

jstests/replsets/rollback2.js and rollback3.js are failing when run with authentication enabled.

From a git bisect, I found the first commit where the test is failing to be:

commit 07a6fd4726a8e876266319cd8d22d64111cf8688
Author: Kristina <kristina@10gen.com>
Date:   Thu Oct 4 15:18:29 2012 -0400
 
    SERVER-1929 Remove unused heartbeat options from stepdown logic
    
    Fixed test because stepdown is so much faster that the connection is dead
    by the time ismaster is called.

That commit probably just exposed the bug by changing the timing of events, rather than actually introducing it.

The basic summary of what the test is doing is this:
1) Set up a replica set with 2 nodes, A and B, and an arbiter. Do some writes and wait for them to be replicated to the secondary.
2) Isolate node A (using the replSetTest command with the blind option) and wait for node B to become primary.
3) Do some writes to node B. These writes should later be rolled back.
4) Blind node B, then unblind node A. Node A will take over as primary, but won't see the writes from step 3.
5) Do some new writes to node A.
6) Unblind node B. This should cause it to rollback the writes from step 3.
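The rollback expected in step 6 can be illustrated with a toy model. This is a hypothetical, simplified sketch, not MongoDB's actual rollback code: each oplog entry is modeled as a unique made-up optime string, and the "common point" is taken to be the last entry in B's oplog that also appears in A's. Everything after that point on B is what B should undo before syncing from A.

```python
# Hypothetical, simplified model of rollback; NOT MongoDB's implementation.
# Each oplog entry is represented by a unique (made-up) optime string.


def find_common_point(local, remote):
    """Return the index of the last local entry also present in remote, or -1."""
    remote_set = set(remote)
    for i in range(len(local) - 1, -1, -1):
        if local[i] in remote_set:
            return i
    return -1


def ops_to_roll_back(local, remote):
    """Entries the local node must undo before it can sync from remote."""
    return local[find_common_point(local, remote) + 1:]


# The test scenario, with invented optimes:
original = ["t1", "t2"]  # step 1: replicated to both A and B
b_only = ["t3", "t4"]    # step 3: written to B while A was blinded
a_new = ["t5", "t6"]     # step 5: written to A after it took over

a_oplog = original + a_new
b_oplog = original + b_only

print(ops_to_roll_back(b_oplog, a_oplog))  # ['t3', 't4']
```

In this model, step 6 should leave B with exactly A's oplog once `t3` and `t4` are undone.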

The bug is that no rollback happens in step 6. I changed the test to print the oplogs of both nodes after step 6: node B has all the original writes, then the writes it received in step 3, and THEN the writes that node A received in step 5. B synced from A without first rolling back to a common point.
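The symptom can be stated as a simple invariant, sketched here under the same hypothetical assumptions (entries modeled as invented optime strings; this is not a real server check): after syncing, a secondary's oplog should be a prefix of the primary's. The oplog B printed in this bug, with the step 3 writes sitting between the original writes and A's step 5 writes, violates it.

```python
# Hypothetical invariant check on toy oplogs; NOT a real MongoDB API.


def is_consistent(primary_oplog, secondary_oplog):
    """A correctly synced secondary's oplog is a prefix of the primary's."""
    return secondary_oplog == primary_oplog[:len(secondary_oplog)]


original = ["t1", "t2"]  # replicated to both nodes in step 1
b_only = ["t3", "t4"]    # step 3 writes on B
a_new = ["t5", "t6"]     # step 5 writes on A
a_oplog = original + a_new

expected_b = original + a_new           # B after rolling back t3, t4
observed_b = original + b_only + a_new  # B's oplog as described in this bug

print(is_consistent(a_oplog, expected_b))  # True
print(is_consistent(a_oplog, observed_b))  # False
```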

Please let me know if there's anything I can do to help debug.



 Comments   
Comment by Spencer Brody (Inactive) [ 03/Jan/13 ]

Works for me.

Comment by Kristina Chodorow (Inactive) [ 03/Jan/13 ]

These failures should be fixed by SERVER-7759's fix and now pass for me. spencer, does it work for you?

Comment by Spencer Brody (Inactive) [ 06/Dec/12 ]

To run this test with auth, run:

buildscripts/smoke.py --nojournal --auth jstests/replsets/rollback2.js

Generated at Thu Feb 08 03:15:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.