  1. Core Server
  2. SERVER-50697

Do not enforce fast count on 'config.system.indexBuilds'

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.9.0
    • Affects Version/s: None
    • Component/s: Index Maintenance, Storage
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Execution Team 2020-11-16
    • 70

      This is related to a build failure on a two-node replica set where we build an index and then restart each node individually in standalone mode and encounter an incorrect fast count.

       

      More precisely, this is the timeline of events leading up to the incorrect fast count:

      1. The primary node starts an index build {x: 1} with UUID f50b6510-8a48-4143-b8f4-fc318d8cbd2a.

      2. The secondary starts building the index too, and eventually, the commit quorum is satisfied.

      3. Both nodes finish the index build and the commit timestamp for both the primary and secondary is Timestamp(1596309184, 9).

      4. Both nodes are shut down so that they can be restarted in standalone mode.

      5. The primary node starts up in standalone mode with recoveryTimestamp Timestamp(1596309184, 8), which is earlier than the commit timestamp from step 3, and it sees a single unfinished index build in the 'config.system.indexBuilds' collection: "Found index from unfinished build".

      However, during shutdown the collections are validated, and the fast count for 'config.system.indexBuilds' is incorrect:
      "fast count (0) does not match number of records (1) for collection 'config.system.indexBuilds'"

      with the contents of the collection being:

      {
      	"_id" : UUID("f50b6510-8a48-4143-b8f4-fc318d8cbd2a"),
      	"collectionUUID" : UUID("afdc996d-0602-47e5-adbb-ee3e02809050"),
      	"commitQuorum" : "votingMembers",
      	"indexNames" : [
      		"x_1"
      	],
      	"commitReadyMembers" : [
      		"...:20270",
      		"...:20271"
      	]
      }
      

       

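      From what I can tell, the fsync performed when shutting down the nodes in step 4 synced the fast count data to disk "too early": the persisted fast count already reflects step 3 removing the index build entry from 'config.system.indexBuilds', but that removal was committed at Timestamp(1596309184, 9), after the recoveryTimestamp Timestamp(1596309184, 8). The standalone node therefore still sees the entry (1 record) while the fast count says 0.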
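
      The suspected mechanism can be illustrated with a toy model. This is only a sketch, not MongoDB or WiredTiger code: the class ToyCollection, its methods, and the insert at increment 1 are assumptions made for illustration. Timestamps are reduced to their increment part (8 and 9), since the seconds part (1596309184) is the same throughout. Writes are timestamped, while the fast count is a plain counter persisted without a timestamp.

```javascript
// Toy model (hypothetical, not MongoDB code): timestamped record writes plus
// an untimestamped "fast count" counter that is persisted as-is.
class ToyCollection {
    constructor() {
        this.history = [];   // writes in timestamp order: {ts, op, id}
        this.fastCount = 0;  // metadata counter, carries no timestamp
    }

    insert(ts, id) {
        this.history.push({ts, op: 'insert', id});
        this.fastCount += 1;
    }

    remove(ts, id) {
        this.history.push({ts, op: 'remove', id});
        this.fastCount -= 1;
    }

    // Records visible when the data is read as of timestamp `ts`.
    recordsAsOf(ts) {
        const live = new Set();
        for (const write of this.history) {
            if (write.ts > ts) continue;  // write is newer than the read point
            if (write.op === 'insert') live.add(write.id);
            else live.delete(write.id);
        }
        return [...live];
    }

    // Standalone startup: records are recovered to `recoveryTs`, but the
    // counter is taken exactly as it was persisted at shutdown.
    validateAsOf(recoveryTs) {
        const numRecords = this.recordsAsOf(recoveryTs).length;
        return {
            fastCount: this.fastCount,
            numRecords,
            valid: this.fastCount === numRecords,
        };
    }
}

const coll = new ToyCollection();
// Index build starts: entry is written (increment 1 is an assumed value).
coll.insert(1, 'f50b6510-8a48-4143-b8f4-fc318d8cbd2a');
// Index build commits at increment 9: entry is removed.
coll.remove(9, 'f50b6510-8a48-4143-b8f4-fc318d8cbd2a');

// Restart with recoveryTimestamp at increment 8.
const result = coll.validateAsOf(8);
console.log(result);
```

      In this model the removal at increment 9 is not visible at the recovery point, so one record is found, while the counter already includes the removal and reads 0. That matches the shape of the reported validation error.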

      Restarting replica set members in standalone mode can make the database contents appear inconsistent with the writes that were visible while the node was running as part of a replica set, so perhaps we can just turn off fast count validation for 'config.system.indexBuilds' in 'absent_ns_field_in_index_specs.js'.
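
      A minimal sketch of what skipping the check for one namespace could look like. The function fastCountErrors and the set SKIP_FAST_COUNT_NAMESPACES are hypothetical names invented for this illustration; they are not existing server or test-harness APIs, and the actual change may be implemented differently.

```javascript
// Hypothetical helper, not an existing MongoDB function: report a fast count
// mismatch unless the namespace is exempt from the check.
const SKIP_FAST_COUNT_NAMESPACES = new Set(['config.system.indexBuilds']);

function fastCountErrors(ns, fastCount, numRecords, skip = SKIP_FAST_COUNT_NAMESPACES) {
    // Exempt namespaces and matching counts produce no errors.
    if (skip.has(ns) || fastCount === numRecords) {
        return [];
    }
    return [
        `fast count (${fastCount}) does not match number of records ` +
        `(${numRecords}) for collection '${ns}'`,
    ];
}

// The mismatch from this ticket is tolerated for the exempt namespace...
console.log(fastCountErrors('config.system.indexBuilds', 0, 1));
// ...but still reported for any other collection ('test.coll' is a made-up name).
console.log(fastCountErrors('test.coll', 0, 1));
```

      Keeping the exemption to a single namespace leaves the fast count check in force for every other collection.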

            Assignee:
            gregory.wlodarek@mongodb.com Gregory Wlodarek
            Reporter:
            gregory.wlodarek@mongodb.com Gregory Wlodarek