[COMPASS-5353] Compass not showing full count on Online Archive Created: 03/Dec/21  Updated: 31/May/23

Status: Waiting (Blocked)
Project: Compass
Component/s: Count
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Yuta Arai Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on COMPASS-5495 Display some visual indications that ... Open
Related
related to COMPASS-4387 Report document count and total size ... Closed
Sprint: Up for triaging

 Description   

Problem Statement/Rationale

What is going wrong? What action would you like the Engineering team to take?
In HELP-29658, we found that the document count Compass displays for an Online Archive was not accurate. Roughly 1 million documents should have been archived, but only 22,000 were displayed. We narrowed this down to a Compass issue by having the customer run .count() in the shell, which returned the expected 1 million.

Steps to Reproduce

How could an engineer replicate the issue you’re reporting?
1. Create M10 cluster
2. Load sample data set
3. Create online archive on sample_airbnb.listingsAndReviews with last_review as date field
4. After a few minutes, connect to the Online Archive via Compass and via the shell, and verify that the count shown in Compass differs from the count returned in the shell
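
The comparison in step 4 could be sketched in mongosh roughly as follows. This is a minimal sketch, not the exact commands used in HELP-29658: `compareCounts` is a hypothetical helper, it assumes the collStats response exposes a `count` field, and it uses `countDocuments({})` for the exact count where the customer ran `.count()`.

```javascript
// Sketch for mongosh: compare the collStats-based count with an exact count.
// `compareCounts` is a hypothetical helper, not part of mongosh or Compass.
function compareCounts(database, collectionName) {
  // Assumption: the collStats response carries the document count in `count`.
  const stats = database.runCommand({ collStats: collectionName });
  const statsCount = stats.count;

  // countDocuments({}) counts the documents instead of reading collection metadata.
  const exactCount = database.getCollection(collectionName).countDocuments({});

  return { statsCount, exactCount, matches: statsCount === exactCount };
}

// Usage in mongosh, connected with the Online Archive connection string:
//   compareCounts(db.getSiblingDB("sample_airbnb"), "listingsAndReviews")
```

If `matches` is false, the collStats-based figure and the exact count disagree, which is the discrepancy this ticket reports.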

Expected Results

What do you expect to happen?
Count from Compass is the same as count in the shell

Actual Results

What do you observe is happening?
They are not the same.

Additional Notes

Any additional information that may be useful to include.



 Comments   
Comment by Rhys Howell [ 22/Mar/22 ]

yuta.arai
What's happening here is that Online Archive caches responses to the collStats command. Compass uses collStats to show the estimated document count and storage size you're seeing on the collections page. So this behavior is expected since the TTL of the cache is 7 days.
If you'd like to fetch the collStats information and bypass the cache, you can append the sync: true option to the function call in mongosh, MongoDB's shell. Doing this can have fairly significant costs: it ends up doing a full collection scan of the data lake, which can rack up AWS fees depending on the structure and size of the data.
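
As a rough illustration of the cache bypass, the sketch below builds a collStats command with the sync option set. The comment does not show the exact call, so this assumes sync is passed as a top-level field of the command document sent through `db.runCommand`; the helper names are hypothetical, and the exact syntax should be checked against the collStats documentation for Online Archive.

```javascript
// Hypothetical helpers for mongosh; neither is part of mongosh itself.

// Build the collStats command document, optionally asking to bypass the cache.
// Assumption: `sync: true` belongs at the top level of the command document.
function buildCollStatsCommand(collectionName, bypassCache) {
  const command = { collStats: collectionName };
  if (bypassCache) {
    command.sync = true;
  }
  return command;
}

// Run collStats with sync: true and return the reported document count.
// Warning: per the comment above, this can trigger a full collection scan
// of the archive and incur AWS costs.
function fetchFreshCount(database, collectionName) {
  const stats = database.runCommand(buildCollStatsCommand(collectionName, true));
  return stats.count;
}

// Usage in mongosh, connected to the Online Archive:
//   fetchFreshCount(db.getSiblingDB("sample_airbnb"), "listingsAndReviews")
```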
We're currently discussing with the team whether to show users more information about this behavior when they are connected to Atlas Online Archive, and maybe even offer actions for refreshing the cache. It's definitely not transparent currently, sorry for any confusion.
More info on the collStats behavior: https://docs.mongodb.com/datalake/supported-unsupported/diagnostic-commands/#collstats

Generated at Wed Feb 07 22:39:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.