  1. WiredTiger
  2. WT-25

generational number overflow



    • Type: Task
    • Resolution: Done
    • WT1.0


      What are your thoughts on 32-bit generational numbers and incremental overflow?

      Obviously, 64 bits is safe forever, we'll never overflow a 64-bit counter, but 64-bit values are not necessarily written atomically.

      32 bits isn't safe forever: at one update a second, it overflows in 136 years; at 100 updates a second, in about 16 months; and at 1000 updates a second, in about 7 weeks (just under 50 days).
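
      The wrap times above follow from dividing 2^32 by the update rate. A minimal sketch of that arithmetic, for checking other rates; the function names are made up for illustration and are not part of WiredTiger:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until an unsigned 32-bit counter wraps at a steady update rate. */
static double
seconds_to_wrap32(double updates_per_second)
{
    assert(updates_per_second > 0.0);
    /* A 32-bit counter has 2^32 distinct values before it wraps to 0. */
    return (((double)UINT32_MAX + 1.0) / updates_per_second);
}

/* The same interval expressed in days. */
static double
days_to_wrap32(double updates_per_second)
{
    return (seconds_to_wrap32(updates_per_second) / 86400.0);
}

/* The same interval expressed in years of 365.25 days. */
static double
years_to_wrap32(double updates_per_second)
{
    return (days_to_wrap32(updates_per_second) / 365.25);
}
```

      For example, years_to_wrap32(1.0) gives the one-update-a-second case and days_to_wrap32(1000.0) the 1000-updates-a-second case.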

      Right now I'm thinking about the cache page LRU value.

      A few solutions come to mind:

      (1) Stay with 32 bits – I think this is probably safe. Yeah, 100 page reads a second isn't unreasonable, but sustaining 100 page reads a second for 16 months is vanishingly unlikely.

      (2) Go to 64 bits – the page's LRU value doesn't have to be exact: if we read garbage, it just means we try to evict a page we might not otherwise evict, no harm done. It does cost 4 more bytes on every page, which makes me sad.

      (3) Build in some kind of "every thread out of the library, reset all the counters" functionality that fires once a year (and, obviously, gets tested regularly). This may be necessary for other reasons, but it's not going to be possible to review every page in the cache to reset its LRU – so this doesn't fix the cache LRU problem.

      (4) Do resets on an ad-hoc basis, testing the 32-bit counters for overflow on every operation and resetting them as necessary – I think this is what BDB does. This may be necessary for other reasons, but again, it's not going to work for the cache LRU value, there are too many pages to reset.

      (5) Slow the counter increments. Imagine that we add a thread of control that wakes periodically and performs certain housekeeping tasks. If that thread woke once a second and incremented the cache's LRU value, which is then copied into each page's LRU counter when the page is accessed, then 32 bits is effectively safe forever (136 years at one increment a second). The downside is that our LRU granularity goes down, and all pages accessed within the same second have the same LRU value, but I don't see much of a problem with that.
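
      Option (5) could be sketched roughly as follows, assuming C11 atomics are available. Every name here (lru_clock, struct page, read_gen, page_touch, lru_victim) is hypothetical and chosen for illustration; this is not WiredTiger's actual cache code, and the once-a-second thread that would call lru_clock_tick is left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page header: one 32-bit LRU value per page. */
struct page {
    _Atomic uint32_t read_gen; /* value of lru_clock at the last access */
};

/* Hypothetical cache-wide LRU value, advanced only by a housekeeping thread. */
static _Atomic uint32_t lru_clock;

/* Called by the housekeeping thread, assumed to wake about once a second. */
static void
lru_clock_tick(void)
{
    atomic_fetch_add_explicit(&lru_clock, 1, memory_order_relaxed);
}

/* Called on every page access: copies the clock, never increments it. */
static void
page_touch(struct page *p)
{
    uint32_t now = atomic_load_explicit(&lru_clock, memory_order_relaxed);

    atomic_store_explicit(&p->read_gen, now, memory_order_relaxed);
}

/*
 * Return the page with the smallest LRU value, or NULL if there are no pages.
 * Pages touched within the same tick tie; the first one found wins.
 */
static struct page *
lru_victim(struct page *pages, size_t n)
{
    struct page *victim = NULL;
    uint32_t gen, victim_gen = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        gen = atomic_load_explicit(&pages[i].read_gen, memory_order_relaxed);
        if (victim == NULL || gen < victim_gen) {
            victim = &pages[i];
            victim_gen = gen;
        }
    }
    return (victim);
}
```

      Because the per-access path only copies a 32-bit value, the counter advances at the tick rate rather than the access rate, which is what stretches the overflow horizon. The cost is the one described above: pages touched between two ticks are indistinguishable to eviction.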




            keith.bostic@mongodb.com Keith Bostic (Inactive)
            wiredtiger WiredTiger