[SERVER-41410] Prototype drain mode concepts Created: 30/May/19  Updated: 08/Jan/24  Resolved: 03/Aug/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Benjamin Caimano (Inactive) Assignee: Backlog - Service Architecture
Resolution: Won't Fix Votes: 0
Labels: mongos-drain-mode-fallout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Sprint: Service Arch 2019-06-17
Participants:

 Comments   
Comment by Benjamin Caimano (Inactive) [ 20/Jun/19 ]

So far, I've been able to produce a Drain Mode concept that rejects user CRUD operations but accepts getMores with the following implementation steps:

  • Add InDrainMode to the set of error codes. It is in the NotMasterError error class and results in the InDrainMode error label.
  • Add a DrainModeSupervisor class that lives as a ServiceContext decoration. This class keeps a simple synchronized state and provides helper methods to verify CommandInvocation pointers. It is what is used to check if drain mode is active.
  • Add enterDrainMode and exitDrainMode commands that modify drain mode. These commands had to be added to way too many lists in our JS test suite.
  • Call a DrainModeSupervisor helper method in src/mongo/db/service_entry_point_common.cpp. This uasserts and fails out user commands with non-local read or write concern when drain mode is active.
  • Check drain mode in a few places in the repl code so that the isMaster response reports status code InDrainMode with the right error label.
  • Permanently set drain mode on as the first action of shutdown tasks in mongod/mongos.
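
To make the steps above concrete, here is a minimal standalone sketch of the supervisor idea. It is not the prototype's code and does not use the real ServiceContext, CommandInvocation, or uassert machinery. CommandSketch, InDrainModeError, checkCommand and enterDrainModePermanently are hypothetical names invented for illustration, and the two concern flags are an assumed simplification of "non-local read or write concern".

```cpp
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the InDrainMode error code; not a real MongoDB type.
struct InDrainModeError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical, simplified view of a command invocation (assumption: the real
// check inspects a CommandInvocation pointer instead of plain flags).
struct CommandSketch {
    std::string name;
    bool nonLocalReadConcern;
    bool nonLocalWriteConcern;
};

// Simple synchronized state, in the spirit of the DrainModeSupervisor above.
class DrainModeSupervisor {
public:
    void enterDrainMode() {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = true;
    }

    // Returns false when drain mode was made permanent (e.g. by shutdown).
    bool exitDrainMode() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_permanent) {
            return false;
        }
        _active = false;
        return true;
    }

    // Intended as the first action of shutdown: drain mode can no longer be exited.
    void enterDrainModePermanently() {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = true;
        _permanent = true;
    }

    bool isActive() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _active;
    }

    // Throws for commands with a non-local read or write concern while drain
    // mode is active; all other commands pass through.
    void checkCommand(const CommandSketch& cmd) const {
        if (!isActive()) {
            return;
        }
        if (cmd.nonLocalReadConcern || cmd.nonLocalWriteConcern) {
            throw InDrainModeError(cmd.name + " rejected: node is in drain mode");
        }
    }

private:
    mutable std::mutex _mutex;
    bool _active = false;
    bool _permanent = false;
};
```

In this sketch a caller at the service entry point would invoke checkCommand before running a command, and the shutdown path would call enterDrainModePermanently so that a later exitDrainMode cannot turn drain mode off again.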

Surprisingly, this has worked for normal patches and for hand testing of CRUD. I suspect that I'll need to design a more thorough test suite for the non-shutdown CRUD behavior.
