[SERVER-77326] Change usage of .count() to .itCount() in transaction_reaper.js Created: 19/May/23  Updated: 16/Jun/23  Resolved: 16/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sanika Phanse (Inactive) Assignee: Max Hirschhorn
Resolution: Won't Do Votes: 0
Labels: intern, new-eng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Sharding NYC
Operating System: ALL
Participants:
Linked BF Score: 5

 Description   

From the docs: When you call count() without a query predicate, you may receive inaccurate document counts. Without a query predicate, count() methods return results based on the collection's metadata, which may result in an approximate count.

Change usage of .count() to .itCount() for all usages in transaction_reaper.js.



 Comments   
Comment by Sanika Phanse (Inactive) [ 16/Jun/23 ]

Hey max.hirschhorn@mongodb.com, thanks for detailing the cases in which fast count is inaccurate. I was not aware of the cases that cause it to be inaccurate, just that it can. For the related BF, I had seen in the datafiles that the 1500 documents were written to `test.test`, but given this two hour delay, it makes sense that the session records would've been reaped. I agree with your analysis.

Comment by Max Hirschhorn [ 13/Jun/23 ]

Hey sanika.phanse@mongodb.com, I wanted to check in about the ways fast-count can be inaccurate because I wonder if there is a different issue happening in the Evergreen failure.

  • If the fast-count is queried in the midst of concurrent writes then the returned count is approximate.
  • The fast-count value is accurate after all writes to the collection has quiesced.
  • The fast-count value is accurate across replication rollback and clean shutdown.
  • If there is an unclean shutdown then the fast-count value will be permanently diverge from the true value until the validate command is run for the collection.

The transaction_reaper.js test would have previously asserted the fast-count for the config.system.sessions collection is equal to nSessions on line 94 (the failing assertion was line 104). This suggests the documents being absent from the config.system.sessions collection is because they were deleted.

92
this.refresh();
93
this.assertOutstandingTransactions(0);
94
this.assertOutstandingSessions(nSessions);
95
 
96
for (var i = 0; i < nSessions; i++) {
97
    // make a session and get it to the collection
98
    var session = this.sessions[i];
99
    assert.commandWorked(session.getDatabase("test").test.save({a: 1}));
100
}
101
 
102
// Ensure a write flushes a transaction
103
this.assertOutstandingTransactions(nSessions);
104
this.assertOutstandingSessions(nSessions);

I think the underlying issue may be hardware related (it is known to happen with the MacStadium machines). I see a total of 1500 (== nSessions) "New transaction started" log messages. One notable detail is there is a 2-hour gap from 14:25:15 to 16:26:21. The mongod process was still active as seen from it taking new checkpoints each minute for the 2 hours. I don't know why the mongo shell process seemingly paused in running the contents of the test. Since the config.system.sessions documents expire after 30 minutes by default (== logicalSessionTimeoutMinutes) the previously started logical sessions would have long had their documents deleted from config.system.sessions collection. Additionally, the later executed retryable writes wouldn't have had a document written to the config.system.sessions collection yet because the refreshLogicalSessionCacheNow command to flush the in-memory LogicalSessionCache the config.system.sessions collection isn't called again (via Fixture.prototype.refresh()) between lines 94 and 104.

[js_test:transaction_reaper] d21270| 2023-05-04T14:25:15.830-04:00 D4 TXN      23984   [conn1] "New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"fc94114b-3407-4f1a-9dd0-eaf57c8da265"}},"apiParameters":{}}
[js_test:transaction_reaper] d21270| 2023-05-04T14:25:15.840-04:00 D4 TXN      23984   [conn1] "New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"a762a664-3b3b-4385-8da8-767b378632a0"}},"apiParameters":{}}
[js_test:transaction_reaper] d21270| 2023-05-04T14:25:15.847-04:00 D4 TXN      23984   [conn1] "New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"c1a4977e-b4ff-4604-95c5-c9fc8797194b"}},"apiParameters":{}}
...
[js_test:transaction_reaper] d21270| 2023-05-04T16:26:21.077-04:00 D4 TXN      23984   [conn2] "New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"3d08ad51-da56-4dda-b797-f993f2daa570"}},"apiParameters":{}}
[js_test:transaction_reaper] d21270| 2023-05-04T16:26:21.086-04:00 D4 TXN      23984   [conn2] "New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"64692630-df28-4e80-8d1c-de4a2e9e4311"}},"apiParameters":{}}
[js_test:transaction_reaper] d21270| 2023-05-04T16:26:21.095-04:00 D4 TXN      23984   [conn2] "New transaction started","attr":{"txnNumber":0,"lsid":{"uuid":{"$uuid":"4dafdb1f-a93c-4756-b431-4c446acefc73"}},"apiParameters":{}}

https://parsley.mongodb.com/resmoke/7144f76be8a1d36fa21ead32f9ed72cb/test/175c03f1298a4f2f4d5f3090f2b2e4de?bookmarks=0,31,1774,3159,3405,3522,4161&shareLine=0

Generated at Thu Feb 08 06:35:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.