[SERVER-44837] Collection validation can cause inaccurate fast count when run against a node with a prepared transaction
Created: 26/Nov/19  Updated: 18/Jun/20  Resolved: 21/May/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 4.2.1, 4.3.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Henrik Edin
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-41888 Shutting down with prepared transacti... Closed
related to SERVER-48981 Implement local isolation for fastcou... Closed
Operating System: ALL
Steps To Reproduce:

load("jstests/core/txns/libs/prepare_helpers.js");
 
const name = "prepare_counts";
const rst = new ReplSetTest({nodes: 1});
const nodes = rst.startSet();
rst.initiate();
const dbName = "test";
const collName = name;
 
const primary = rst.getPrimary();
const testDB = primary.getDB(dbName);
 
assert.commandWorked(testDB.runCommand({create: collName, writeConcern: {w: "majority"}}));
 
const session = primary.startSession({causalConsistency: false});
const sessionDB = session.getDatabase(dbName);
const sessionColl = sessionDB.getCollection(collName);
 
jsTestLog("Inserting 9 documents");
for (let i = 0; i < 9; i++) {
    assert.commandWorked(testDB.getCollection(collName).insert({x: i}));
}
 
jsTestLog("Starting transaction.");
session.startTransaction();
var txnNum = session.getTxnNumber_forTesting();
assert.commandWorked(sessionColl.insert({_id:1}));
assert.commandWorked(sessionColl.insert({_id:2}));
jsTestLog("Preparing transaction.");
PrepareHelpers.prepareTransaction(session);
 
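jsTestLog("Validating collection.");
// Force the only node to step down. On step-down, prepared transactions yield
// their locks, so validate can run while the transaction remains prepared.
assert.commandWorked(primary.adminCommand({replSetStepDown: 5, force: true}));
assert.commandWorked(testDB.runCommand({validate: collName, full: true}));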
 
jsTestLog("Done validating collections");
var session1 = PrepareHelpers.createSessionWithGivenId(rst.getPrimary(), session.getSessionId());
var session1DB = session1.getDatabase(dbName);
 
jsTestLog("Aborting transaction.");
assert.commandWorked(session1DB.adminCommand(
    {abortTransaction: 1, txnNumber: txnNum, stmtId: NumberInt(3), autocommit: false}));
 
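// Expected to report a mismatch: per the description, the fast count should end
// at 7 (9 + 2, reset to 9 by validate, minus 2 on abort) while 9 documents remain.
rst.checkCollectionCounts();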
rst.stopSet();

Sprint: Execution Team 2019-12-30, Execution Team 2020-05-04, Execution Team 2020-05-18, Execution Team 2020-06-01
Participants:
Linked BF Score: 0

 Description   

When we run collection validation, we update the collection's "fast count" (its cached document count) to match the number of documents we observe while scanning the collection. If we run validation against a node that has an open, prepared transaction that has written to the collection being validated, we may set the fast count incorrectly: validation cannot observe the effects of the uncommitted transaction, even though those effects are already reflected in the fast count. For example, suppose 10 documents have been inserted into collection A and committed, and a prepared transaction has inserted another 10 documents into A, so the fast count is 20. If we then run validate against A, we reset the fast count to 10, since validation does not see the transaction's inserts. If the transaction later aborts, we decrement the fast count by 10, leaving it at 0 indefinitely even though A contains 10 documents.
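The arithmetic in the example above can be replayed with a small toy model. This is a sketch under stated assumptions, not MongoDB source code: every name is hypothetical, and it assumes, as the description implies, that a prepared transaction's inserts are added to the fast count when performed and subtracted on abort, while validate overwrites the fast count with the number of committed documents it scans.

```javascript
// Toy model of a collection's fast count. All names are hypothetical; this is
// not MongoDB source code. Assumption (from the description): a prepared
// transaction's inserts are added to the fast count when performed and
// subtracted again on abort, while validate resets the fast count to the
// number of committed documents its scan observes.
class ToyCollection {
  constructor() {
    this.committedDocs = 0;   // documents visible to a scan such as validate
    this.fastCount = 0;       // cached count, adjusted eagerly on every write
    this.preparedInserts = 0; // inserts made by the open, prepared transaction
  }

  insertCommitted(n) {
    this.committedDocs += n;
    this.fastCount += n;
  }

  // The transaction's inserts bump the fast count but stay invisible to scans.
  prepareInserts(n) {
    this.preparedInserts += n;
    this.fastCount += n;
  }

  // validate overwrites the fast count with what it sees, missing prepared writes.
  validate() {
    this.fastCount = this.committedDocs;
  }

  // Abort rolls back the transaction's earlier increments.
  abortPrepared() {
    this.fastCount -= this.preparedInserts;
    this.preparedInserts = 0;
  }
}

function simulateEagerFastCount(committed, prepared) {
  const coll = new ToyCollection();
  coll.insertCommitted(committed);
  coll.prepareInserts(prepared);
  coll.validate();
  coll.abortPrepared();
  return { fastCount: coll.fastCount, actual: coll.committedDocs };
}

console.log(simulateEagerFastCount(10, 10)); // { fastCount: 0, actual: 10 }
```

With the repro's numbers (9 committed inserts, 2 prepared inserts), the same model ends with a fast count of 7 against 9 actual documents.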



 Comments   
Comment by Henrik Edin [ 21/May/20 ]

The root cause of this problem is that validate needs to repair the fast count, which it must do because the fast count can become incorrect after an unclean shutdown. We should address that root cause instead, which is tracked by PM-1820.

Once that is fixed, implementing correct isolation for fast count becomes less important and should be handled in a separate ticket, for example if we want to add fast count support to multi-document transactions.

Comment by Eric Milkie [ 07/Jan/20 ]

After some discussion, we believe the correct solution is to implement correct isolation for collection count and size, similar to the approach we used to isolate collection creation in multi-document transactions. We can keep a transaction-local map from collection name to count and size for each collection the transaction opens, and adjust those numbers locally as the transaction proceeds. At commit time, we would then apply the adjustments to the actual collection counts and sizes.
This would fix the problem with validate, since prepared transactions would have no effect on collection counts until they actually commit.
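A minimal sketch of the transaction-local bookkeeping proposed above, assuming a single shared count and size per collection. Every name here is hypothetical, the 16-byte document size is arbitrary, and this is a toy model rather than MongoDB's implementation. It replays the description's example: because prepared inserts never touch the shared count, validate followed by an abort leaves the count at 10, and a commit after validate yields 20.

```javascript
// Sketch of transaction-local count/size deltas. All names are hypothetical;
// this is a toy model, not MongoDB's implementation.
class ToyCatalog {
  constructor() {
    this.counters = new Map(); // collection name -> shared { count, size }
  }

  counter(name) {
    if (!this.counters.has(name)) {
      this.counters.set(name, { count: 0, size: 0 });
    }
    return this.counters.get(name);
  }
}

class ToyTransaction {
  constructor(catalog) {
    this.catalog = catalog;
    this.deltas = new Map(); // collection name -> { count, size } local to this txn
  }

  // Writes only adjust the transaction's local deltas.
  recordInsert(name, bytes) {
    let d = this.deltas.get(name);
    if (d === undefined) {
      d = { count: 0, size: 0 };
      this.deltas.set(name, d);
    }
    d.count += 1;
    d.size += bytes;
  }

  // Commit is the only point where the shared counters change.
  commit() {
    for (const [name, d] of this.deltas) {
      const c = this.catalog.counter(name);
      c.count += d.count;
      c.size += d.size;
    }
    this.deltas.clear();
  }

  // Nothing was applied to the shared counters, so abort just drops the deltas.
  abort() {
    this.deltas.clear();
  }
}

function simulateTxnLocalFastCount(committed, prepared, outcome) {
  const catalog = new ToyCatalog();
  const setup = new ToyTransaction(catalog);
  for (let i = 0; i < committed; i++) setup.recordInsert("A", 16);
  setup.commit();

  // A second transaction inserts documents and is left prepared.
  const txn = new ToyTransaction(catalog);
  for (let i = 0; i < prepared; i++) txn.recordInsert("A", 16);

  // validate scans the committed documents and rewrites the shared count.
  catalog.counter("A").count = committed;

  if (outcome === "commit") txn.commit();
  else txn.abort();
  return catalog.counter("A").count;
}

console.log(simulateTxnLocalFastCount(10, 10, "abort"));  // 10
console.log(simulateTxnLocalFastCount(10, 10, "commit")); // 20
```

The design choice is that abort needs no compensating decrement at all, which is what removes the drift shown in the description.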

Comment by William Schultz (Inactive) [ 26/Nov/19 ]

Ok, I spoke too soon. I can repro this on master (403f3b000), using the attached repro, so this doesn't seem limited to 4.2.

Comment by William Schultz (Inactive) [ 26/Nov/19 ]

This may only have become possible when we allowed validate to ignore prepare conflicts (SERVER-41888). milkie pointed out that PM-822 recently changed the behavior of validation more significantly, which may explain why this only appears on 4.2.

Generated at Thu Feb 08 05:07:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.