  1. Core Server
  2. SERVER-66588

In catalog shard POC, the config secondary should be prevented from writing

    • Sharding NYC
    • 4

      The quick fix I made for SERVER-66224 fixed the tests, but in the general case it is wrong. Max said:
       
      "Brett and I have been debugging an issue and learned that secondaries in the config server replica set attempt to extend the lease of the distributed lock. The writes the secondaries do go through ShardLocal, so they end up failing with NotWritablePrimary (https://github.com/mongodb/mongo/blob/3805148358ae9b82e5f3b9307bd25fbf7a4dd4b5/src/mongo/db/s/dist_lock_catalog_replset.cpp#L206-L215). I haven't been following the ShardLocal / ShardRemote / ShardConfig work, but I would like to make certain we forbid secondaries from contacting the config server primary and extending the lease. Only the primary of the replica set should ever be doing the distributed lock pinging, so fixing that may be the ultimate preferred solution.
       
      dist_lock_catalog_replset.cpp
      Status DistLockCatalogImpl::ping(OperationContext* opCtx, StringData processID, Date_t ping) {
         auto request = write_ops::FindAndModifyCommandRequest(_lockPingNS);
         request.setQuery(BSON(LockpingsType::process() << processID));
         request.setUpdate(write_ops::UpdateModification::parseFromClassicUpdate(
             BSON("$set" << BSON(LockpingsType::ping(ping)))));

      I'm saying we should ideally prevent secondaries from pinging the distributed lock. Secondaries aren't authoritative.
       
      At minimum, the writes the secondaries attempt today must still happen locally (and thus fail with NotWritablePrimary) if they are going to happen at all.
       
      Yes, normally https://github.com/mongodb/mongo/blob/5dff90ff1e8a672a8716f0c9c936f8f50e56fd0b/src/mongo/db/repl/oplog.cpp#L367 would abort the local storage transaction on the secondary, because config.lockpings is a replicated collection.
       
      oplog.cpp
             uasserted(ErrorCodes::NotWritablePrimary, ss);
      The specific case I'm worried about is a secondary node in the catalog shard that wants to ping the distributed lock and so contacts the current primary of the catalog shard. Instead, it should be the exclusive responsibility of the primary of the shards to do that pinging.
       
      Today, a secondary node in the CSRS that wants to ping the distributed lock tries to write to config.lockpings locally and gets a NotWritablePrimary error."
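
The two behaviors discussed above can be illustrated with a minimal, standalone sketch. It is not MongoDB code: `Role`, `PingResult`, `localPingWrite`, and `pingIfPrimary` are hypothetical names invented for illustration, and the real change would presumably live near `DistLockCatalogImpl::ping`. The sketch assumes the node knows its own replica set role at the time of the ping.

```cpp
#include <functional>

// Hypothetical stand-ins; none of these names exist in the MongoDB code base.
enum class Role { Primary, Secondary };
enum class PingResult { Ok, SkippedNotPrimary, NotWritablePrimary };

// Models today's CSRS behavior: a local write to a replicated collection
// (such as config.lockpings) succeeds only on the primary and otherwise
// fails with NotWritablePrimary.
PingResult localPingWrite(Role role) {
    return role == Role::Primary ? PingResult::Ok
                                 : PingResult::NotWritablePrimary;
}

// Models the preferred behavior: a non-primary node never attempts the ping,
// neither locally nor by contacting the current primary.
PingResult pingIfPrimary(Role role,
                         const std::function<PingResult(Role)>& write) {
    if (role != Role::Primary) {
        return PingResult::SkippedNotPrimary;  // the write is never invoked
    }
    return write(role);
}
```

With this shape, a secondary never reaches the write path at all, so whether that path is local (ShardLocal) or remote no longer matters for correctness.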
       

            Assignee:
            backlog-server-sharding-nyc [DO NOT USE] Backlog - Sharding NYC
            Reporter:
            andrew.shuvalov@mongodb.com Andrew Shuvalov (Inactive)
            Votes:
            0
            Watchers:
            3
