[SERVER-48050] FCV should be initialized before attempting to restart in-progress index builds Created: 08/May/20  Updated: 29/Oct/23  Resolved: 20/Jul/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug
Priority: Major - P3
Reporter: Bernard Gorman
Assignee: Louis Williams
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Gantt Dependency
has to be done before SERVER-49301 remove startingAfterUncleanShutdown d... Closed
Related
related to SERVER-48044 FCV::isVersion should invariant if FC... Closed
is related to SERVER-49775 Fix uninitialized FCV check when star... Closed
is related to SERVER-50139 Skip level upgrade with --repair leav... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2020-07-27
Participants:
Linked BF Score: 6

 Description   

When a mongod restarts and attempts to rebuild its unfinished indexes, it does so before the FCV has been initialized. As a result, when the in-progress index specs are validated, the index comparison defaults to the FCV 4.4 behaviour and does not distinguish indexes that differ only in their partialFilterExpression. This causes an fassert if any indexes exist that conflict under the 4.4 rules.

Similar issues have occurred before; see e.g. SERVER-45374. After speaking with louis.williams and daniel.gottlieb, the consensus is that the proper solution would be to ensure that the FCV is always initialized before attempting any index validation.
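
The ordering problem described above can be sketched as a small model. This is an illustration only, not the server's code: Node, signature, FatalAssertion, and the other names are hypothetical, and the fallback of an unset FCV to the 4.4 comparison is modelled from the description in this ticket.

```python
class FatalAssertion(Exception):
    """Stands in for the server's fassert, which aborts the process."""


def signature(spec, fcv):
    # Assumption: an uninitialized FCV (None) falls back to the 4.4
    # comparison, which ignores partialFilterExpression.
    key = tuple(spec["keyPattern"].items())
    if fcv == "4.6":
        return (key, repr(spec.get("partialFilterExpression")))
    return (key,)


class Node:
    def __init__(self, persisted_fcv, ready, unfinished):
        self.persisted_fcv = persisted_fcv  # value stored on disk
        self.fcv = None                     # in-memory value, unset at startup
        self.ready = list(ready)
        self.unfinished = list(unfinished)
        self.resumed = []

    def initialize_fcv(self):
        self.fcv = self.persisted_fcv

    def restart_index_builds(self):
        # Re-check each unfinished build against the existing indexes.
        for spec in self.unfinished:
            for existing in self.ready:
                if signature(spec, self.fcv) == signature(existing, self.fcv):
                    raise FatalAssertion("conflicting index on restart")
            self.resumed.append(spec)


def make_node():
    ready = [{"keyPattern": {"a": 1}, "partialFilterExpression": {"a": {"$gte": 0}}}]
    unfinished = [{"keyPattern": {"a": 1}, "partialFilterExpression": {"a": {"$lt": 0}}}]
    return Node("4.6", ready, unfinished)


# Old order: index builds are restarted before the FCV is known.
old = make_node()
try:
    old.restart_index_builds()
    old_order_crashed = False
except FatalAssertion:
    old_order_crashed = True

# Proposed order: initialize the FCV first, then restart the builds.
new = make_node()
new.initialize_fcv()
new.restart_index_builds()
new_order_resumed = len(new.resumed)
```

In this model the old order hits the fatal assertion even though the persisted FCV is 4.6, while initializing the FCV first lets the unfinished build resume.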



 Comments   
Comment by Githook User [ 20/Jul/20 ]

Author: Louis Williams <louis.williams@mongodb.com> (username: louiswilliams)

Message: SERVER-48050 FCV should be initialized before attempting to restart in-progress index builds

  • Refactored repairDatabaseAndCheckVersion by moving FCV logic into
    the FeatureCompatibilityVersion class
  • Refactored repairDatabaseAndCheckVersion by separating procedures of
    regular recovery, read-only mode, and repair
  • Renamed repairDatabaseAndCheckVersion to repairAndRecoverDatabases
  • Moved startup recovery/repair free function into a 'startup_recovery' namespace
  • Moved repair free functions into a 'repair' namespace.

renamed: src/mongo/db/repair_database_and_check_version.cpp -> src/mongo/db/startup_recovery.cpp
renamed: src/mongo/db/repair_database_and_check_version.h -> src/mongo/db/startup_recovery.h
renamed: src/mongo/db/repair_database.cpp -> src/mongo/db/repair.cpp
renamed: src/mongo/db/repair_database.h -> src/mongo/db/repair.h
Branch: master
https://github.com/mongodb/mongo/commit/d0e0220787fa127e0fae9d0c9f691316bd1eb6db

Comment by Siyuan Zhou [ 05/Jul/20 ]

Thanks bernard.gorman for the explanation of the use case. It is very helpful!

Comment by Bernard Gorman [ 04/Jul/20 ]

siyuan.zhou, I think the comment you linked has caused some confusion:

// The partialFilterExpression is only part of the index signature if FCV has been set to 4.6.

This does not mean that the partialFilterExpression parameter is only available in FCV 4.6 - partial indexes were added in 3.2. Rather, it means that in FCV 4.6 partialFilterExpression has been made part of the index signature; that is, the subset of parameters that uniquely identify an index.

Prior to 4.6, only the keyPattern and collation were part of the index signature. As a result, the following two indexes could not coexist, as they were considered to conflict with each other:

{keyPattern: {a: 1}, partialFilterExpression: {a: {$gte: 0}}}
{keyPattern: {a: 1}, partialFilterExpression: {a: {$lt: 0}}}
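
As a rough sketch of what "index signature" means here, the function below builds a comparable signature from an index spec. It is a simplified model, not the server's implementation: index_signature is a hypothetical name, and serializing each field with json.dumps is an assumed stand-in for the real comparison.

```python
import json


def index_signature(spec, fcv):
    """Return the fields that identify an index under the given FCV."""
    parts = [
        json.dumps(spec["keyPattern"]),
        json.dumps(spec.get("collation")),
    ]
    # Only in FCV 4.6 does the partialFilterExpression join the signature.
    if fcv == "4.6":
        parts.append(json.dumps(spec.get("partialFilterExpression")))
    return tuple(parts)


non_negative = {"keyPattern": {"a": 1}, "partialFilterExpression": {"a": {"$gte": 0}}}
negative = {"keyPattern": {"a": 1}, "partialFilterExpression": {"a": {"$lt": 0}}}

# Same signature means the two specs are treated as the same index.
conflict_in_44 = index_signature(non_negative, "4.4") == index_signature(negative, "4.4")
conflict_in_46 = index_signature(non_negative, "4.6") == index_signature(negative, "4.6")
```

Under the 4.4 rules both specs reduce to the same keyPattern and collation, so they conflict; under the 4.6 rules the differing filters keep them apart.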

In 4.6, multiple indexes which differ only by partialFilterExpression are allowed to coexist. The FCV is actually used to enforce exactly the behaviour you mentioned above:

  • You can only create indexes that differ by partialFilterExpression in FCV 4.6, since on 4.4 these would be considered invalid duplicate indexes.
  • But if you downgrade the FCV to 4.4 while such an index is present, we will continue to support it. The planner can still use these indexes to answer queries that fall within the partialFilterExpression.
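
The two rules above can be modelled with a toy catalog. This is a sketch under stated assumptions: Catalog, DuplicateIndexError, and signature are hypothetical names, and the check is a simplification of whatever the server actually does when an index is created.

```python
class DuplicateIndexError(Exception):
    pass


def signature(spec, fcv):
    key = tuple(spec["keyPattern"].items())
    if fcv == "4.6":
        return (key, repr(spec.get("partialFilterExpression")))
    return (key,)  # 4.4 rules: the filter is not part of the signature


class Catalog:
    def __init__(self):
        self.indexes = []

    def create_index(self, spec, fcv):
        # The FCV is consulted only when a new index is requested.
        for existing in self.indexes:
            if signature(existing, fcv) == signature(spec, fcv):
                raise DuplicateIndexError(spec)
        self.indexes.append(spec)


catalog = Catalog()
catalog.create_index({"keyPattern": {"a": 1}, "partialFilterExpression": {"a": {"$gte": 0}}}, "4.6")
catalog.create_index({"keyPattern": {"a": 1}, "partialFilterExpression": {"a": {"$lt": 0}}}, "4.6")

# Downgrading the FCV does not touch the catalog; both indexes remain.
indexes_after_downgrade = len(catalog.indexes)

# A further index differing only by filter is rejected while in FCV 4.4.
try:
    catalog.create_index({"keyPattern": {"a": 1}, "partialFilterExpression": {"a": {"$eq": 5}}}, "4.4")
    rejected_in_44 = False
except DuplicateIndexError:
    rejected_in_44 = True
```

The point of the model is that the FCV gates creation only: existing indexes survive a downgrade, but no new index that differs solely by filter can be added until the FCV is raised again.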

See this test which exercises the behaviour described above. The problem identified in this ticket is that, if the node restarts while an index is mid-build, it will re-check its signature against all existing indexes before resuming the build - but the FCV has not been set at this point, and so we interpret this as an attempt to create a conflicting index while in FCV 4.4 and throw.

Comment by Siyuan Zhou [ 02/Jul/20 ]

bernard.gorman, as discussed with lingzhi.deng and jason.chan in SERVER-48044, I'm curious why FCV is needed to decide whether to support a feature. As the comment says, when the index isPartial(), the FCV must have been set to 4.6. Is it possible to support the feature without checking FCV?

I'm attempting to understand the best practice of using FCV. My understanding is that upgrade / downgrade is done in two phases.

  1. Support both old and new behaviors by upgrading the binary.
  2. Enable the new behavior actively by setting a new FCV.

As an example of (1), in replication, we tend to support a feature whenever it's requested by another node regardless of FCV. For example, a newly introduced field of a request in 4.4 should be respected even if the 4.4 node is on 4.2 FCV. This mixed-version scenario is inevitable on upgrade of replset members. As an example of (2), user-requested behavior only enables a feature on FCV 4.4.

The usage of FCV in this ticket seems different from that. I'd really appreciate your insights on that. With that being said, I totally agree FCV should be initialized as early as possible regardless.
