Secondary nodes will have incorrect fast size and count for collections that were populated on startup


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 8.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • Fully Compatible
    • ALL
    • Storage Execution 2026-03-02

      Take a replica set that has started up before (a warm boot) and has some populated collections. Say collection A is one such collection, with a starting size and count of (size: 100, count: 10).

      Suppose node0 steps up as primary: it sees that collection A has (size: 100, count: 10) and seeds its own in-memory _metadata map to reflect this. We then delete a document from collection A, committing a change of (sizeDelta: -10, countDelta: -1) and bringing the committed value to (size: 90, count: 9).

      Now suppose node1 is a secondary in the same replica set. It sees the write to collection A and applies the change of (sizeDelta: -10, countDelta: -1). However, because it did not perform the same seeding step as the primary, its in-memory _metadata map holds the default value of (size: 0, count: 0) for collection A. Applying the delta brings node1's committed values for collection A to (size: -10, count: -1), which trips the invariant that size and count are non-negative.

      More importantly, the broader implication of this problem is that any collection seeded at startup will have an incorrect size and count on secondary nodes, even in cases where the values stay non-negative and the invariant never fires.

      We need secondaries to account for these collections in their own _metadata values as well.

            Assignee:
            Damian Wasilewicz
            Reporter:
            Damian Wasilewicz
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: