[DOCS-15586] [Server] Improve NaN-handling for expireAfterSeconds TTL index parameter Created: 30/Aug/22  Updated: 13/Nov/23  Resolved: 30/Nov/22

Status: Closed
Project: Documentation
Component/s: manual
Affects Version/s: None
Fix Version/s: 6.0.2, 6.2.0-rc0, 6.1.0-rc1, 5.0.14, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Dave Cuthbert (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
backported by DOCS-15623 [BACKPORT] [v6.0] Improve NaN-handlin... Backlog
Depends
is depended on by DOCS-15676 [Server] [BACKPORT] [v5.0] Improve Na... Backlog
Documented
documents SERVER-68477 Improve NaN-handling for expireAfterS... Closed
Participants:
Days since reply: 1 year, 8 weeks, 2 days ago
Epic Link: DOCSP-22091

 Description   

From: Investigate changes in SERVER-68477:


Original Downstream Change Summary

This is a good opportunity to review/clarify expireAfterSeconds in the docs:

In MongoDB 4.4 and earlier, TTL indexes with an expireAfterSeconds value of NaN (not-a-number) worked as indexes for queries but did not expire any documents. With such a configuration in place, primary replica set members running MongoDB 4.2 and 4.4 log the following error:

2022-08-05T21:00:00.000+0000 E  INDEX    [TTLMonitor] Error processing ttl index: { v: 2, key: { created: 1 }, name: "created_1", ns: "dbname.collname", expireAfterSeconds: nan.0, background: true } -- DurationOverflow: Cannot negate the minimum duration

In MongoDB 5.0 and 6.0 (specifically 5.0.0 through 5.0.13 and 6.0.0 through 6.0.1), as part of SERVER-56676, TTL indexes began treating NaN as 0, and the listIndexes command, which initial sync and mongodump backups rely on, began returning 0 instead of NaN.

Because this unspecified behavior changed, the following operations can suddenly expire the TTL-indexed documents in a collection whose TTL index has this improper configuration:

  • On MongoDB 4.4, when:
    • upgrading to MongoDB 5.0
    • initially syncing from a 5.0 or 6.0 node
  • On MongoDB 4.2, when initially syncing from a 5.0 or 6.0 node
  • On MongoDB 5.0 or 6.0, when:
    • restoring from a mongodump of a 4.2 or 4.4 collection that has a TTL index configured with expireAfterSeconds: NaN
    • initially syncing from a version 4.2 or 4.4 node that has a TTL index configured with expireAfterSeconds: NaN
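Why reinterpreting NaN as 0 causes sudden deletions can be seen from how TTL expiry works: the TTL monitor periodically removes documents whose indexed date is more than expireAfterSeconds seconds in the past. The sketch below is a simplified model of that rule, not server code; the function name isEligibleForExpiry and the sample timestamps are hypothetical.

```javascript
// Simplified model of TTL eligibility (illustration only, not server code):
// a document becomes eligible for deletion once the current time passes
// its indexed date plus expireAfterSeconds.
function isEligibleForExpiry(indexedDate, expireAfterSeconds, now = new Date()) {
  const expiresAtMs = indexedDate.getTime() + expireAfterSeconds * 1000;
  return now.getTime() > expiresAtMs;
}

const now = new Date("2022-08-05T21:00:00Z");
const created = new Date("2022-08-05T20:50:00Z"); // ten minutes earlier

// With a deliberate one-hour TTL, the document is kept...
console.log(isEligibleForExpiry(created, 3600, now)); // → false
// ...but once NaN is read as 0, every document with a past date qualifies.
console.log(isEligibleForExpiry(created, 0, now)); // → true
```

With a value of 0, every existing document whose indexed date is in the past becomes eligible at once, which is why the scenarios above can empty a collection on the next TTL monitor pass.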

WORKAROUNDS AND REMEDIATION

In general, avoid this issue by never configuring expireAfterSeconds: NaN, and correct this configuration anywhere it exists.
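One way to keep NaN out of new configurations is to validate the value before passing it to createIndex() or collMod. The following is a minimal sketch under that assumption; assertValidExpireAfterSeconds is a hypothetical helper, not part of any MongoDB API, and it deliberately accepts only finite, non-negative numbers.

```javascript
// Hypothetical pre-flight check (not a MongoDB API): reject any
// expireAfterSeconds value that is not a finite, non-negative number.
function assertValidExpireAfterSeconds(value) {
  // typeof NaN === "number", so a type check alone would let NaN through;
  // Number.isFinite() rejects NaN and Infinity.
  if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
    throw new RangeError(
      `expireAfterSeconds must be a finite, non-negative number; got ${value}`
    );
  }
  return value;
}

console.log(assertValidExpireAfterSeconds(3600)); // → 3600
// assertValidExpireAfterSeconds(NaN) throws a RangeError.
// assertValidExpireAfterSeconds(0 / 0) also throws, because 0 / 0 is NaN.
```

In mongosh, the check could wrap the option when creating an index, for example db.collname.createIndex({ created: 1 }, { expireAfterSeconds: assertValidExpireAfterSeconds(86400) }).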

The following script for the mongosh shell reports any TTL indexes with an expireAfterSeconds: NaN configuration:

Note: Do not use the legacy mongo shell for this operation.

function getNaNIndexes() {
  const nanIndexes = [];

  const dbs = db.adminCommand({ listDatabases: 1 }).databases;

  dbs.forEach((d) => {
    const database = db.getSiblingDB(d.name);

    // getCollectionInfos() reads the whole listCollections cursor, not only
    // the first batch. Views are skipped because getIndexes() fails on them.
    database
      .getCollectionInfos({ type: { $ne: "view" } })
      .forEach((c) =>
        database
          .getCollection(c.name)
          .getIndexes()
          .forEach((entry) => {
            // NaN never equals itself, so use Object.is() to detect it.
            if (Object.is(entry.expireAfterSeconds, NaN)) {
              nanIndexes.push({ ns: `${d.name}.${c.name}`, index: entry });
            }
          })
      );
  });

  return nanIndexes;
}
getNaNIndexes();
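The script relies on Object.is() because NaN is the only JavaScript value that is not equal to itself, so a check such as entry.expireAfterSeconds === NaN would never match. A minimal sketch of the same filter, runnable in Node.js without a server; findNaNTtlIndexes and the sample index specs (shaped like getIndexes() output) are hypothetical.

```javascript
// Same NaN test as the mongosh script, applied to plain index specs.
function findNaNTtlIndexes(indexSpecs) {
  // Object.is(x, NaN) is true only for NaN; a missing field (undefined)
  // or a normal number is not reported.
  return indexSpecs.filter((spec) => Object.is(spec.expireAfterSeconds, NaN));
}

const sampleSpecs = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { created: 1 }, name: "created_1", expireAfterSeconds: NaN },
  { v: 2, key: { updated: 1 }, name: "updated_1", expireAfterSeconds: 3600 },
];

console.log(NaN === NaN); // → false
console.log(findNaNTtlIndexes(sampleSpecs).map((s) => s.name)); // → [ 'created_1' ]
```

Number.isNaN() would work equally well here; both avoid the global isNaN(), which coerces its argument and returns true for undefined.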

After you identify them, correct any TTL indexes that have the expireAfterSeconds: NaN configuration by setting an unambiguous, valid value with specified behavior. The collMod command lets you modify the expireAfterSeconds value of an existing index.
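As a sketch of that remediation step, the hypothetical helper below turns one entry reported by getNaNIndexes() into a collMod command document that identifies the index by name and sets an explicit expireAfterSeconds. The helper name and the one-day value are assumptions for illustration; the command itself would then be run in mongosh.

```javascript
// Hypothetical helper (not a MongoDB API): builds a collMod command that
// replaces a NaN expireAfterSeconds with an explicit, chosen value.
function buildTtlCollMod(report, expireAfterSeconds) {
  if (!Number.isFinite(expireAfterSeconds) || expireAfterSeconds < 0) {
    throw new RangeError("choose a finite, non-negative expireAfterSeconds");
  }
  // ns has the form "<db>.<collection>". Database names cannot contain dots,
  // but collection names can, so split on the first dot only.
  const dot = report.ns.indexOf(".");
  return {
    dbName: report.ns.slice(0, dot),
    command: {
      collMod: report.ns.slice(dot + 1),
      index: { name: report.index.name, expireAfterSeconds },
    },
  };
}

const { dbName, command } = buildTtlCollMod(
  { ns: "dbname.collname", index: { name: "created_1" } },
  86400 // example value only: keep documents for one day
);
// In mongosh: db.getSiblingDB(dbName).runCommand(command);
```

Choose the value deliberately: any valid expireAfterSeconds lets the TTL monitor start deleting documents older than that threshold as soon as the change takes effect.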

MongoDB intends to help protect against this behavior change by:

  • Fixing SERVER-68477 in MongoDB 5.0.14 and 6.0.2, so that these and later releases render badly configured TTL indexes ineffective rather than applying a meaning of expireAfterSeconds: 0 to them.
  • Releasing SERVER-68522 in MongoDB 5.0.11. With this fix, MongoDB 5.0.11+ will refuse to start if a TTL index with expireAfterSeconds: NaN exists, to ensure that the normal upgrade path from 4.4 is protected from this unexpected change in behavior. See SERVER-68522 for additional details.
Original description

Currently listIndexes, and subsequently initial sync, do not properly handle NaN values for expireAfterSeconds. This can result in unexpected TTL behavior, especially when upgrading from MongoDB 4.4 to MongoDB 5.0 or when migrating earlier index definitions to MongoDB 5.0 or 6.0.



 Comments   
Comment by Githook User [ 12/Dec/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-15586 expireAfterSeconds (#2136)

  • Review feedback
  • Review feedback
  • Upstream doc updated
  • Staging updates
Comment by Githook User [ 29/Nov/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-15586 Backport (#2204)

  • Review feedback
  • Review feedback
  • Upstream doc updated
  • Staging updates
  • Review feedback
Comment by Githook User [ 29/Nov/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-15586 expireAfterSeconds (#2136) (#2203)

  • Review feedback
  • Review feedback
  • Upstream doc updated
  • Staging updates
Comment by Githook User [ 28/Nov/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-15586 expireAfterSeconds (#2136)

  • Review feedback
  • Review feedback
  • Upstream doc updated
  • Staging updates
Comment by Eric Sedor [ 21/Nov/22 ]

Thanks for catching this, dave.cuthbert@mongodb.com:

  • The missing phrase in my comment above was "initial sync"; that is present on SERVER-68477
  • The "exception" addition in SERVER-68477 should indeed have included 4.4 and I've added that.

This exception refers to a "non-standard" downgrade path where you start a new node on 4.2/4.4 and add it to an otherwise 5.0/6.0 replica set (rather than downgrading binaries in place on existing replica set nodes). Does that clarify? FWIW I would suggest against documenting this path in detail as it's not our recommended downgrade method even in the absence of bugs.

Comment by Eric Sedor [ 18/Nov/22 ]

Commenting here at dave.cuthbert@mongodb.com's request:

SERVER-68477 is the authoritative source of the current state of the system around this bug. The changes we made recently as part of STAR-3043 relate to quantifying the exact versions on which to expect different behaviors. Generally, where we said "5.0 or 6.0" before we knew when SERVER-68477 would land, we now say "5.0.0-5.0.13 or 6.0.0-6.0.1". This is because of how SERVER-68477 changed behavior in 5.0.14 and 6.0.2.

The notable exception to this is that we also added the "Now that this issue is addressed, the following case may still present concern..." section in the updated green summary box on SERVER-68477. This covers what happens when you initial sync a 4.2 node from a 5.0.14+/6.0.2+ node.

Hopefully this makes sense!

Comment by Education Bot [ 12/Oct/22 ]

Fix Version updated for upstream SERVER-68477:
6.0.2, 6.1.0-rc1, 6.2.0-rc0, 5.0.14

Comment by Education Bot [ 12/Sep/22 ]

Fix Version updated for upstream SERVER-68477:
6.0.2, 6.1.0-rc1, 6.2.0-rc0

Comment by Education Bot [ 31/Aug/22 ]

Fix Version updated for upstream SERVER-68477:
6.1.0-rc1, 6.2.0-rc0

Comment by Education Bot [ 30/Aug/22 ]

Fix Version updated for upstream SERVER-68477:
6.2.0-rc0

Generated at Thu Feb 08 08:13:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.